Maluuba / newsqa

Tools for using Maluuba's NewsQA Dataset (public version)

https://www.microsoft.com/en-us/research/project/newsqa-dataset/

Tokenized story text #38

Closed: ryanpram closed this issue 3 years ago

ryanpram commented 3 years ago

Hi, how can I get the tokenized story? newsqa-data-tokenized-v1.csv only has these fields: question, answer_char_ranges, is_answer_absent, is_question_bad, story_text, answer_token_ranges, sentence_starts.

I need something like tokenized_story_text.

Thanks

juharris commented 3 years ago

Thanks for trying out NewsQA! Sorry about that; it should be better documented. From the code, it looks like story_text already holds the tokens, separated by spaces.
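If that reading is right, recovering a token list should only need a whitespace split on story_text. Here is a minimal sketch, not code from the repo: the helper names `story_tokens` and `read_tokenized_stories` are made up for illustration, and the two-column sample stands in for newsqa-data-tokenized-v1.csv, which has more fields.

```python
import csv
import io


def story_tokens(story_text):
    # Assumption: tokens in story_text are separated by whitespace,
    # so a plain split recovers them.
    return story_text.split()


def read_tokenized_stories(csv_file):
    # Yield one token list per row, read from the story_text column.
    reader = csv.DictReader(csv_file)
    for row in reader:
        yield story_tokens(row["story_text"])


# Tiny in-memory stand-in for the real CSV (made-up content).
sample = io.StringIO(
    "question,story_text\n"
    '"Who spoke ?","The mayor spoke on Monday ."\n'
)
tokens = next(read_tokenized_stories(sample))
print(tokens)  # ['The', 'mayor', 'spoke', 'on', 'Monday', '.']
```

For the real file, you would pass an opened file handle, e.g. `open("newsqa-data-tokenized-v1.csv", newline="", encoding="utf-8")`, instead of `sample`. Whether the indices in answer_token_ranges line up with this split is something to check against the data rather than assume.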

ryanpram commented 3 years ago

Ahh, I see. Thank you, @juharris.