chen700564 / RGB

How you create the data #7

Open Linda230 opened 8 months ago

Linda230 commented 8 months ago

Excellent work. I would like to know how you created your data. For example, how was "en_fact.json" created? I noticed that it contains positive and negative samples. Were these samples created manually or automatically?

Looking forward to receiving your reply.

chen700564 commented 8 months ago

The queries and answers are generated by gpt-3.5-turbo and then manually filtered and adjusted. The documents are retrieved in two steps: the Google API is used to obtain the web pages, and a dense retriever then selects the top-30 passages across all of those pages.
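As a rough illustration of the dense-retrieval step only, the sketch below ranks passages by cosine similarity to a query vector and keeps the top k. It is not the code used to build the dataset: the function names `cosine` and `top_k_passages` are hypothetical, and the vectors are assumed to have been computed beforehand by some embedding model that this thread does not name.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_passages(query_vec, passage_vecs, passages, k=30):
    """Return the k passages whose vectors are most similar to query_vec.

    Assumption: passage_vecs[i] is the precomputed embedding of passages[i].
    """
    scored = sorted(
        zip(passages, passage_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [passage for passage, _ in scored[:k]]
```

The default k=30 mirrors the top-30 cutoff mentioned above; everything else is illustrative.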

In all data, negative documents are documents that do not contain the answer text, and positive documents are those that do contain it.
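The positive/negative rule above can be sketched as a plain substring check. This is a minimal sketch rather than the project's actual script: the function name `split_by_answer` is hypothetical, and it assumes exact, case-sensitive matching and that a query may have several acceptable answer strings.

```python
def split_by_answer(passages, answers):
    """Split passages into (positive, negative) lists.

    A passage is positive if it contains at least one answer string
    verbatim, and negative otherwise. Exact substring matching is an
    assumption made for this sketch.
    """
    positive, negative = [], []
    for passage in passages:
        if any(answer in passage for answer in answers):
            positive.append(passage)
        else:
            negative.append(passage)
    return positive, negative
```

For example, calling `split_by_answer(docs, ["Paris"])` would put every passage mentioning "Paris" in the first list and the rest in the second.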

For the counterfactual robustness data, such as zh_fact.json, we manually modify the answers and replace the answer text in the retrieved positive documents with the modified answer to construct the "positive_wrong" key.
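The replacement step for the counterfactual data might look roughly like the sketch below. Only the "positive_wrong" key comes from the description above; the function name `add_positive_wrong` and the "answer" and "positive" field names are assumptions made for illustration, and the wrong answer is assumed to be supplied by hand.

```python
def add_positive_wrong(record, wrong_answer):
    """Return a copy of record with a "positive_wrong" list added.

    Each positive document has the original answer text replaced by the
    manually chosen wrong answer. The "answer" and "positive" field names
    are assumptions for this sketch.
    """
    updated = dict(record)  # shallow copy so the input record is left unchanged
    updated["positive_wrong"] = [
        doc.replace(record["answer"], wrong_answer)
        for doc in record["positive"]
    ]
    return updated
```

Because the original record is copied rather than modified, the unaltered positive documents remain available next to the counterfactual ones.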

Linda230 commented 8 months ago

Thank you so much for your help.