Closed: Chenfeng1271 closed this issue 1 year ago
The tokens being out of order is actually kind of important for performance:
Do you actually need the tokens to be in order during the execution of the network, or do you just need them in order at the end of the network? If you only need them in order at the end, you can use the source tracing feature (see the timm implementation) to get a mapping of which input tokens ended up where. Then you can use that to sort the output tokens.
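To make the source-tracing idea concrete, here is a minimal sketch of unmerging at the end of the network. It assumes you already have a dense 0/1 source matrix mapping original token positions to merged output rows, like the one source tracing produces; the function name and array shapes are illustrative, not the actual ToMe API:

```python
import numpy as np

def unmerge_in_order(outputs, source):
    """Scatter merged output tokens back to their original positions.

    outputs: (M, D) array of merged token features (M <= N).
    source:  (M, N) 0/1 matrix; source[i, j] == 1 means original
             token j ended up in output token i.
    Returns an (N, D) array in the original token order, where tokens
    that were merged together share the same feature vector.
    """
    # For every original position j, find which output row owns it.
    owner = source.argmax(axis=0)   # shape (N,)
    # Gather the outputs in original input order.
    return outputs[owner]

# Toy example: 4 original tokens; tokens 1 and 2 were merged into row 1.
outputs = np.array([[0.0], [1.5], [3.0]])
source = np.array([[1, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 1]])
restored = unmerge_in_order(outputs, source)
# restored is (4, 1), with positions 1 and 2 sharing the merged feature.
```

For NER this gives you one feature vector per original token position again, so a per-token classifier or CRF head can run unchanged on top.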
Thank you for your reply. I think span merging may make sense, since my downstream task is named entity recognition (NER), which detects entity spans in a sentence; for example, 'Bill Gates' and 'Microsoft' are PERSON and ORGANIZATION in 'Bill Gates founded Microsoft'. In that way, BERT becomes a kind of FCN. I can unmerge to get an ordered sequence at the end of the network, but I think keeping the order during execution may be necessary, since that matches what NER needs. Therefore, I would like to restrict merging to nearby tokens along the diagonal. I tested ToMe in BERT+CRF, and the result is similar to that on ImageNet, with a small drop (0.3% F1 score), so I want to test whether it needs a specific partition style, and whether contrastive learning with augmentation may help.
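The "merge nearby tokens along the diagonal" idea above could be sketched as merging only adjacent pairs, which keeps the sequence order by construction. This is a toy illustration of that restriction, not ToMe's bipartite soft matching; the greedy pair selection and averaging are assumptions for the sketch:

```python
import numpy as np

def merge_adjacent(tokens, r):
    """Order-preserving merge: average the r most similar adjacent
    pairs, so the output sequence keeps the original token order.

    tokens: (N, D) array of token features.
    r: number of adjacent pairs to merge.
    Returns an (N - r, D) array.
    """
    # Cosine similarity between each token and its right neighbor.
    a, b = tokens[:-1], tokens[1:]
    sim = (a * b).sum(-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8)

    # Greedily pick the r most similar non-overlapping adjacent pairs.
    chosen = []
    for i in np.argsort(-sim):
        if all(abs(int(i) - j) > 1 for j in chosen):
            chosen.append(int(i))
        if len(chosen) == r:
            break

    # Walk the sequence left to right, merging chosen pairs in place.
    merged, skip = [], set()
    for i in range(len(tokens)):
        if i in skip:
            continue
        if i in chosen:
            merged.append((tokens[i] + tokens[i + 1]) / 2)
            skip.add(i + 1)
        else:
            merged.append(tokens[i])
    return np.stack(merged)

# Toy example: tokens 0/1 and 2/3 are identical pairs, so they are
# the most similar adjacent candidates.
tokens = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
out = merge_adjacent(tokens, r=2)
```

Because merges never cross a non-adjacent boundary, a span that was merged corresponds to a contiguous run of original tokens, which is exactly the structure an NER span detector expects.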
Hi, thank you for your work. I am now trying to apply it to BERT as token merging. I want to keep the token order during merging, but I have run into a problem since my programming skills are limited... Can you give me some hints on how to do this? Thank you.