DiscourseDB / discoursedb-core

DiscourseDB Core Repository
GNU General Public License v2.0

spurious discourse relation set on CSV import #31

Open cbogart opened 6 years ago

cbogart commented 6 years ago

One record had a "parent" contribution even though no parent column existed in the CSV file.

cbogart commented 6 years ago

Figured out why: by default, CSV import assumes that each sequential posting in the same forum is a reply to the previous one. (This is a good assumption in some cases, like the "crito" dataset, but a bad one in others.) In this case, a bunch of independent test answers were imported along with their scores, and whenever two consecutive answers were classified the same way, they were treated as a posting and its reply.
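To make the behavior concrete, here is a minimal Python sketch of the default as I understand it. It is not the actual DiscourseDB importer code: the `infer_parents` function and the `id` and `forum` column names are assumptions for illustration, and it takes "sequential" to mean adjacent rows in the file.

```python
def infer_parents(rows):
    """Map each row id to its inferred parent id (or None).

    Hypothetical model of the default described above, not DiscourseDB code:
    when a row has no 'replyto' key, the previous row is treated as its
    parent if both rows share the same 'forum' value.
    """
    parents = {}
    previous = None
    for row in rows:
        if "replyto" in row:
            # An explicit column wins, even when the cell is blank.
            parents[row["id"]] = row["replyto"] or None
        elif previous is not None and previous["forum"] == row["forum"]:
            # No column at all: fall back to "reply to the previous posting".
            parents[row["id"]] = previous["id"]
        else:
            parents[row["id"]] = None
        previous = row
    return parents


# Three independent test answers; the first two were classified the same way.
rows = [
    {"id": "a1", "forum": "correct"},
    {"id": "a2", "forum": "correct"},
    {"id": "a3", "forum": "incorrect"},
]
print(infer_parents(rows))  # {'a1': None, 'a2': 'a1', 'a3': None}
```

Under this model, `a2` picks up `a1` as a parent purely because the two rows are adjacent and share a forum value, which matches the spurious relation reported here.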

Workaround: add a blank 'replyto' column to any CSV you import when you do not want the default reply structure to be inferred.
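If the CSV is generated elsewhere, the blank column can be added with a small preprocessing step. The following is a sketch using only Python's standard `csv` module; the helper name `add_blank_replyto` and the sample column names are made up for illustration, and only the 'replyto' column name comes from this issue.

```python
import csv
import io


def add_blank_replyto(src, dst, column="replyto"):
    """Copy CSV data from src to dst, adding an empty column if it is missing.

    src and dst are text file objects. For real files, open them with
    newline="" as the csv module documentation recommends.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        fieldnames.append(column)
    # restval="" fills the new column with an empty cell on every row.
    writer = csv.DictWriter(dst, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# In-memory example with hypothetical columns.
src = io.StringIO(
    "id,forum,post\n"
    "a1,correct,First answer\n"
    "a2,correct,Second answer\n"
)
dst = io.StringIO()
add_blank_replyto(src, dst)
print(dst.getvalue())
```

The helper leaves an existing 'replyto' column untouched, so it should be safe to run on files that already carry explicit reply information.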