mboedigh closed this issue 3 years ago
Hi @mboedigh, the stop token is not included in the sequence representation. We take token representations until len(seq)+1 (and not +2) for that reason.
Sorry, there is something I still don't understand. If tokens are [0, 5, 5, 2] for a sequence 'AA', where 0 and 2 are the begin and end tokens, then tokens[len(seq)+1] will index the 2. This is the stop token, right? And tokens[len(seq)+2] is out of bounds.
In the code example you posted, seq_i corresponds to the original sequence (length x), but token_representations[i].size(0) == x+2.
Does this code example help to clear things up?
>>> begin, end = 0, 2
>>> seq = [5,5]
>>> tokens = [begin] + seq + [end]
>>> tokens[1 : len(seq) + 1]
[5, 5]
Yes, thanks! I guess Python's 1:x slicing is not like some other languages. I assumed too much, but I was also somehow getting out-of-bounds errors in my own tests before I posted. Thanks again.
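For anyone who hits the same confusion, the difference between indexing and slicing can be checked directly. This is a minimal sketch in plain Python that reuses the toy values from the example above; the begin and end values 0 and 2 are illustrative and are not claimed to be the real vocabulary indices.

```python
begin, end = 0, 2
seq = [5, 5]
tokens = [begin] + seq + [end]  # [0, 5, 5, 2]

# Indexing at len(seq) + 1 does land on the end token...
last_indexed = tokens[len(seq) + 1]

# ...but a slice excludes its stop index, so the end token is dropped.
residues = tokens[1 : len(seq) + 1]

# Indexing past the end raises IndexError; slicing past the end is clamped.
try:
    tokens[len(seq) + 2]
    index_error_raised = False
except IndexError:
    index_error_raised = True

clamped = tokens[1 : len(seq) + 10]  # no error, just stops at the end
```

So an out-of-bounds error points to an integer index somewhere rather than to the slice itself, since a slice never raises for an oversized stop value.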
The example code in the Quick Start section of the GitHub README shows this excerpt:
The sequence_representations will then include the last position of token_representations, which appears to be the stop token. Is this intended?
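To see concretely which positions end up in the average, here is a rough plain-Python stand-in for the tensor slicing. The vectors, the `mean_pool` helper, and the variable layout are all hypothetical and chosen only for illustration; this is not the library's code.

```python
# Hypothetical per-token vectors for one sequence 'AA' (2 residues):
# position 0 = begin token, positions 1-2 = residues, position 3 = end token.
seq = "AA"
token_representations = [
    [9.0, 9.0],  # begin token
    [1.0, 2.0],  # first residue
    [3.0, 4.0],  # second residue
    [7.0, 7.0],  # end token
]


def mean_pool(rows):
    """Average a list of equal-length vectors element-wise (illustrative helper)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


# The slice stops before index len(seq) + 1, so the end token is left out.
residue_rows = token_representations[1 : len(seq) + 1]
sequence_representation = mean_pool(residue_rows)
```

Under these assumptions only the two residue rows are averaged, so the begin and end token vectors do not contribute to `sequence_representation`.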