IBM / multidoc2dial

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents
Apache License 2.0

About data download #12

Closed wutong4012 closed 2 years ago

wutong4012 commented 2 years ago

The download link returns a 404, so I cannot download the data. Where can I download it?

dp0d commented 2 years ago

Hello, are you looking for the dataset? You can try cloning this project and using its script to download it:

git clone https://github.com/IBM/multidoc2dial.git
cd multidoc2dial/scripts
./run_download.sh

Alternatively, I can send it to you for free; I downloaded it just last night. If you need it, let me know and give me your email address.

dp0d commented 2 years ago

I found some noise in this dataset, which could be why they removed the download link. The doc ids do not map to unique documents: two docs with different ids may have identical content, which is probably caused by their crawler.
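
For anyone who wants to check this on their own copy, here is a minimal sketch of one way to look for such duplicates. It assumes the documents have already been loaded into a plain mapping from doc id to text; the function name `find_duplicate_docs` and the `docs` mapping are hypothetical, and loading from the actual dataset files is left out because the file layout is not described in this thread.

```python
import hashlib
from collections import defaultdict


def find_duplicate_docs(docs):
    """Return groups of doc ids whose text is identical.

    `docs` is a hypothetical mapping {doc_id: text}; adapt the loading
    step to however your copy of the dataset stores its documents.
    """
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        # Collapse runs of whitespace so trivial spacing differences
        # do not hide otherwise identical documents.
        normalized = " ".join(text.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(doc_id)
    # Keep only the hashes shared by more than one doc id.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

Hashing the normalized text keeps memory use low and only catches exact matches after whitespace cleanup; near-duplicates with small wording differences would need a fuzzier comparison.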

wutong4012 commented 2 years ago

Thank you very much for your help. My email is wt1102310705@gmail.com. As for the noise in the data, I will look into it in detail later.

songfeng commented 2 years ago

Sorry, the website was temporarily down. It is up again now.

You can also find the dataset at https://github.com/doc2dial/multidoc2dial/tree/main/file .