luisfredgs closed this issue 4 years ago
Yes, we used BERT only as the subword tokenizer. We will release the code and the labeled data to make a quick implementation easier, hopefully within the next week.
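For concreteness, a minimal sketch of what "BERT only as the subword tokenizer" could look like, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (both are my assumptions here, not confirmed details of our release):

```python
# Sketch only: WordPiece tokenization without any transformer forward pass.
# Assumes the Hugging Face `transformers` library and `bert-base-uncased`.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence = "The cat sat on the mat"
# Only the subword vocabulary of BERT is used; no contextual encoding happens.
subwords = tokenizer.tokenize(sentence)
print(subwords)  # e.g. ['the', 'cat', 'sat', 'on', 'the', 'mat']
```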
The way you provide rich semantic embeddings is a great insight. Thanks for sharing the source code.
Yes, we used BERT only as the subword tokenizer.
Hi, @cooelf. I think I am a bit confused by this. The paper seems to imply that BERT is used not just to tokenize words into subwords, but also to get contextualized representations for the subwords. For example, you have this figure, which shows the interactions between tokens:
Then there is also the following passage in the paper (I added bolding):
The raw text sequences and semantic role label sequences are firstly represented as embedding vectors to feed a pre-trained BERT. The input sentence X = {x_1, ..., x_n} is a sequence of words of length n, which is first tokenized to word pieces (subword tokens). Then the transformer encoder captures the contextual information for each token via self-attention and produces a sequence of contextual embeddings.
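To make my question concrete, here is a minimal sketch of what that passage seems to describe (subword tokenization followed by the BERT encoder producing one contextual embedding per token). This is just my reading, assuming the Hugging Face transformers library, PyTorch, and the bert-base-uncased checkpoint; it is not meant to represent your actual implementation:

```python
# Sketch of the "contextual embeddings" reading of the quoted passage.
# Assumes Hugging Face `transformers`, PyTorch, and `bert-base-uncased`.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The cat sat on the mat"
# Tokenize to word pieces and add the special [CLS]/[SEP] tokens.
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per subword token, produced by the self-attention layers.
contextual_embeddings = outputs.last_hidden_state
print(contextual_embeddings.shape)  # e.g. torch.Size([1, 8, 768])
```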
Can you clarify, @cooelf?
Your study is an interesting contribution. I have a shallow question: have you used BERT only as a tokenizer at the subword level? In any case, since the source code will give more details, I'd appreciate it if you made it available to us.
Thank you for sharing your work.