IBM / multidoc2dial

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents
Apache License 2.0

Question about data preprocessing #8

Open KristenZHANG opened 2 years ago

KristenZHANG commented 2 years ago

Hi, I have a question about the data preprocessing (data_preprocessor.py) and would appreciate some help: when rm_blank is called on the grounding text, why is is_shorten set to True? Thanks!

songfeng commented 2 years ago

Hi,

Thanks for the question!

In some rare cases, we found that certain grounding content could not be matched to a passage, even though we could find the right passage when searching manually. Shortening the grounding content slightly seemed to fix this odd issue, so it is a somewhat inelegant workaround that we applied.

Best, Song
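To illustrate why shortening can help, here is a minimal sketch of the workaround described in the reply above. It is not the code from data_preprocessor.py. The bodies of rm_blank and find_passage below are hypothetical, and they assume that rm_blank strips whitespace and that is_shorten trims a few characters from each end of the grounding text before it is substring-matched against the (untrimmed) passages. The passages and grounding string are made-up examples.

```python
import re
from typing import List, Optional


def rm_blank(text: str, is_shorten: bool = False, trim: int = 1) -> str:
    """Hypothetical stand-in for rm_blank: remove all whitespace and,
    if is_shorten is True, drop `trim` characters from each end."""
    compact = re.sub(r"\s+", "", text)
    # Only trim when enough text remains, so short spans are not emptied.
    if is_shorten and len(compact) > 2 * trim:
        compact = compact[trim:-trim]
    return compact


def find_passage(grounding: str, passages: List[str], is_shorten: bool = False) -> Optional[int]:
    """Hypothetical matcher: return the index of the first passage that
    contains the normalized grounding text, or None if none does."""
    needle = rm_blank(grounding, is_shorten=is_shorten)
    for i, passage in enumerate(passages):
        # Passages are normalized the same way but never shortened.
        if needle in rm_blank(passage):
            return i
    return None


if __name__ == "__main__":
    passages = [
        "You can renew your license online. Fees apply.",
        "Visit the DMV office in person.",
    ]
    # The grounding span ends with "!" where the passage has ".".
    grounding = "You can renew your license online!"
    print(find_passage(grounding, passages))                   # None
    print(find_passage(grounding, passages, is_shorten=True))  # 0
```

In this sketch a single mismatched boundary character is enough to make an exact substring match fail, and trimming the ends of the grounding text sidesteps it. The trade-off is that a shortened span is less specific, so for very short grounding text it could match the wrong passage; the length guard in rm_blank is one assumed way to limit that.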