facebookresearch / ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

About keeping token order #9

Closed Chenfeng1271 closed 1 year ago

Chenfeng1271 commented 1 year ago

Hi, thank you for your work. I am now trying to apply it to BERT for token merging. I want to keep the token order during merging, but I have run into a problem since my programming skill is poor... Can you give me some hints on how to do this? Thank you.

dbolya commented 1 year ago

The tokens being out of order is actually kind of important for performance:

[image]

Do you actually need the tokens to be in order during the execution of the network, or do you just need them in order at the end of the network? If you only need them in order at the end, you can use the source tracing feature (see the timm implementation) to get a mapping of which input tokens ended up where. Then you can use that to sort the output tokens.
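For reference, here is a minimal sketch of how that could look with the timm patch. It assumes `tome.patch.timm` accepts a `trace_source` flag, that the resulting source matrix ends up in `model._tome_info["source"]` with shape `[B, N_final, N_initial]`, and that `forward_features` returns the merged token sequence; treat it as a starting point rather than a drop-in snippet:

```python
import timm
import torch
import tome

# Patch a timm ViT with ToMe and turn on source tracing.
model = timm.create_model("vit_base_patch16_224", pretrained=True)
tome.patch.timm(model, trace_source=True)
model.r = 16  # tokens merged per layer

x = torch.randn(1, 3, 224, 224)
tokens = model.forward_features(x)       # [B, N_final, C]: merged (out-of-order) tokens
source = model._tome_info["source"]      # [B, N_final, N_initial]: nonzero where input j ended up in output i

# For each original input position, find the merged token that now contains it...
input_to_output = source.argmax(dim=1)   # [B, N_initial]

# ...and gather the final tokens back into original order (merged positions share a token).
ordered = tokens.gather(
    1, input_to_output.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
)                                        # [B, N_initial, C]
```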

Chenfeng1271 commented 1 year ago

Thank you for your reply. I think span-based merging may make sense, since my downstream task is named entity recognition (NER), which detects entity spans in a sentence: for example, 'Bill Gates' and 'Microsoft' are PERSON and ORGANIZATION in 'Bill Gates founded Microsoft'. In that way, BERT becomes a kind of FCN. I can unmerge to get an ordered sequence at the end of the network, but I think keeping the order during execution may be necessary, since that matches how NER is formulated. Therefore, I would like to only merge nearby tokens, i.e. along the diagonal. I tested ToMe in BERT+CRF, and the result is similar to that on ImageNet, with a small drop (0.3% F1 score), so I want to test whether it needs some specific partition style, and I am also considering whether contrastive learning with augmentation may help.
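In case it is useful, here is a rough sketch (not part of the ToMe API) of one way to keep merges local: mask the similarity scores used by the bipartite matching so a token can only be merged with partners within a small window of its original position, before the usual top-r selection. The helper name `local_bipartite_matching_scores` and the `window` parameter are made up for illustration:

```python
import torch

def local_bipartite_matching_scores(metric: torch.Tensor, window: int = 2) -> torch.Tensor:
    """Cosine-similarity scores between the two alternating token sets used by
    ToMe-style bipartite matching, masked so a token may only merge with partners
    within `window` positions of itself (keeps merges local, roughly preserving
    span structure for tasks like NER)."""
    metric = metric / metric.norm(dim=-1, keepdim=True)
    a, b = metric[..., ::2, :], metric[..., 1::2, :]   # alternating sets, as in ToMe
    scores = a @ b.transpose(-1, -2)                   # [B, Na, Nb]

    # Original sequence positions of each set.
    pos_a = torch.arange(0, metric.shape[-2], 2, device=metric.device)
    pos_b = torch.arange(1, metric.shape[-2], 2, device=metric.device)
    dist = (pos_a[:, None] - pos_b[None, :]).abs()     # [Na, Nb]

    # Disallow merges between tokens that are far apart in the sequence.
    return scores.masked_fill(dist > window, float("-inf"))
```

These masked scores would then replace the unconstrained scores in a `bipartite_soft_matching`-style merge, so the top-r merged pairs are always within `window` positions of each other.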