awslabs / multi-domain-goal-oriented-dialogues-dataset

Data from the publication "Multi-Domain Goal-Oriented Dialogues (MultiDoGO): Strategies toward Curating and Annotating Large Scale Dialogue Data"

Conversation IDs do not match between unannotated and paper_splits #3

Open scottmackieverint opened 2 years ago

The conversationId column in the unannotated dataset does not match the conversationIds that appear in the paper_splits dataset.

In the unannotated dataset, every conversationId appears to end with a suffix of -1 or -2, and conversations ending in -2 appear to be duplicates of those ending in -1.

Ideally, the conversation ID would match between the unannotated and paper_splits datasets.
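
Until the IDs are aligned, a possible workaround is to strip the suffix before comparing the two datasets. The snippet below is only a minimal sketch: it assumes the paper_splits IDs equal the unannotated IDs with the trailing -1 or -2 removed, which is an unverified assumption, and the helper names (`base_id`, `group_by_base`, `exact_duplicates`) and sample rows are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: IDs in the unannotated data look like "<base>-1" or "<base>-2".
_SUFFIX = re.compile(r"-(1|2)$")


def base_id(conversation_id: str) -> str:
    """Strip a trailing -1 or -2, leaving any other ID unchanged."""
    return _SUFFIX.sub("", conversation_id)


def group_by_base(rows):
    """Group (conversation_id, text) pairs by their suffix-free ID."""
    groups = defaultdict(list)
    for conversation_id, text in rows:
        groups[base_id(conversation_id)].append((conversation_id, text))
    return dict(groups)


def exact_duplicates(groups):
    """Return base IDs whose suffixed variants all carry identical text."""
    return sorted(
        base
        for base, items in groups.items()
        if len(items) > 1 and len({text for _, text in items}) == 1
    )


# Made-up rows for illustration only; not taken from the dataset.
rows = [
    ("abc123-1", "hello"),
    ("abc123-2", "hello"),
    ("def456-1", "hi there"),
]
groups = group_by_base(rows)

# Hypothetical set of IDs as they might appear in paper_splits.
paper_split_ids = {"abc123", "def456"}
unmatched = sorted(set(groups) - paper_split_ids)
```

Running `exact_duplicates(groups)` on real data would show whether the -2 conversations really are copies of the -1 ones, and `unmatched` would list any base IDs that still have no counterpart in paper_splits after stripping.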