stedes opened this issue 6 years ago
I think the regular expression is wrong:
TOKENS_ALPHANUMERIC = '[A-Za-z0-9]+(?=\s+)'
Doesn't this mean that you only consider tokens that consist solely of alphanumeric characters and are followed by whitespace?
Example: WORD1,WORD2, WORD3, WORD4 Word5
In the sentence above, only WORD4 and Word5 would be considered tokens. The other words are immediately followed by a comma, so they are not valid tokens.
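One quick way to check this is to run the pattern over the example sentence with Python's `re.findall`. This assumes the pattern is applied by collecting all matches, because the calling code isn't shown in the thread. The lookahead `(?=\s+)` requires at least one whitespace character right after the match. WORD1, WORD2 and WORD3 fail because a comma follows them. Word5 also fails, because it sits at the end of the string with nothing after it.

```python
import re

# Pattern as quoted in the issue, written as a raw string so "\s" reaches
# the regex engine unchanged.
TOKENS_ALPHANUMERIC = r'[A-Za-z0-9]+(?=\s+)'

sentence = 'WORD1,WORD2, WORD3, WORD4 Word5'

# findall returns only the substrings the pattern matches; the lookahead
# demands whitespace immediately after each alphanumeric run.
tokens = re.findall(TOKENS_ALPHANUMERIC, sentence)
print(tokens)  # ['WORD4']
```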
I think all the words will be considered tokens: the first four will be split on the commas, and the fifth on the whitespace.
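Whether the words get "split" depends on how the pattern is used. When it is passed to something that collects matches, such as `re.findall`, it describes the tokens themselves rather than the separators between them. In that case a comma does not act as a delimiter. The sketch below contrasts three uses on the same sentence. `SEPARATORS` and `PLAIN_ALPHANUMERIC` are hypothetical patterns added only for comparison and do not come from the original code.

```python
import re

TOKENS_ALPHANUMERIC = r'[A-Za-z0-9]+(?=\s+)'
# Hypothetical, for comparison: any run of non-alphanumeric characters
# (commas, spaces, ...) is treated as a delimiter.
SEPARATORS = r'[^A-Za-z0-9]+'
# Hypothetical, for comparison: the token pattern without the lookahead.
PLAIN_ALPHANUMERIC = r'[A-Za-z0-9]+'

sentence = 'WORD1,WORD2, WORD3, WORD4 Word5'

# Used as a token pattern: only runs followed by whitespace are kept.
matched = re.findall(TOKENS_ALPHANUMERIC, sentence)

# Used as a splitter: every piece between delimiters becomes a token.
split = re.split(SEPARATORS, sentence)

# Token pattern without the lookahead: every alphanumeric run matches.
plain = re.findall(PLAIN_ALPHANUMERIC, sentence)

print(matched)  # ['WORD4']
print(split)    # ['WORD1', 'WORD2', 'WORD3', 'WORD4', 'Word5']
print(plain)    # ['WORD1', 'WORD2', 'WORD3', 'WORD4', 'Word5']
```

Only the splitting and the lookahead-free variants yield all five words. With the lookahead in place, the commas cause WORD1 to WORD3 to be dropped rather than split off.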