PolyAI-LDN / conversational-datasets

Large datasets for conversational AI
Apache License 2.0

Using pytorch #48

vmurahari3 closed this issue 5 years ago

vmurahari3 commented 5 years ago

I was wondering if you have any ideas on parsing these TFRecord files in PyTorch?

matthen commented 5 years ago

What would be a more convenient data format? We could add support for something more generic, like text files with one JSON object per line:

{"context/1": "Hello, how are you?", "context/0": "I am fine. And you?", "context": "Great. What do you think of the weather?", "response": "It doesn't feel like February."}
{"context/0": "I am Matt", "context": "Nice to meet you", "response": "Nice to meet you too."}

vmurahari3 commented 5 years ago

That would be wonderful. JSON files would be amazing :)

matthen commented 5 years ago

I introduced JSON format in this PR:

https://github.com/PolyAI-LDN/conversational-datasets/pull/49

Just add --dataset_format JSON when calling create_data.py
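
For anyone landing here from PyTorch, here is a minimal sketch of how the JSON output could be read, assuming the one-object-per-line layout shown earlier in this thread (keys `context`, `context/0`, `context/1`, ... and `response`). The class name `JsonLinesDataset`, the helper `ordered_turns`, and the file name in the usage note are hypothetical and not part of this repo. The class deliberately avoids importing torch: it only implements `__len__` and `__getitem__`, which is the map-style dataset interface that `torch.utils.data.DataLoader` expects.

```python
import json
import re
from typing import Dict, List

# Matches the numbered extra-context keys such as "context/0", "context/1".
_EXTRA_CONTEXT = re.compile(r"^context/(\d+)$")


def ordered_turns(example: Dict[str, str]) -> List[str]:
    """Return the context turns oldest-first, ending with 'context'.

    Assumption (taken from the example in this thread): a higher index in
    "context/N" means a turn further back in the conversation, and the
    plain "context" key is the most recent turn before the response.
    """
    numbered = []
    for key, value in example.items():
        match = _EXTRA_CONTEXT.match(key)
        if match:
            numbered.append((int(match.group(1)), value))
    # Sort descending so the oldest turn (highest index) comes first.
    numbered.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in numbered] + [example["context"]]


class JsonLinesDataset:
    """Hypothetical map-style dataset over a file with one JSON object per line."""

    def __init__(self, path: str) -> None:
        with open(path, encoding="utf-8") as handle:
            # Skip blank lines; every other line is parsed as one example.
            self.examples = [json.loads(line) for line in handle if line.strip()]

    def __len__(self) -> int:
        return len(self.examples)

    def __getitem__(self, index: int) -> Dict[str, object]:
        example = self.examples[index]
        return {"turns": ordered_turns(example), "response": example["response"]}
```

Usage would look something like `dataset = JsonLinesDataset("train.json")` (the path is a placeholder for whatever create_data.py wrote). Because the number of context turns varies per example, batching through a `DataLoader` would most likely need a custom `collate_fn` that tokenizes and pads the turns.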