I obtained the Robust04 dataset as the following files:
1. TREC-Disk-4.tar.gz
2. TREC-Disk-5.tar.gz
After extracting these archives, I get a lot of files with names like "LAL010289", each of which contains many documents marked up with HTML-like tags.
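To make the file layout concrete, here is a minimal Python sketch of how one such file could be split into individual documents. It assumes the usual TREC SGML layout, where each document is wrapped in <DOC>...</DOC> and carries a <DOCNO> identifier; I have not confirmed that every file in the archives follows it. The function name split_trec_file is only illustrative.

```python
import re

# Assumption: each document is wrapped in <DOC>...</DOC> and has a <DOCNO> field,
# as in the usual TREC SGML layout. Not verified against every file on the disks.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def split_trec_file(raw):
    """Return a list of (docno, plain_text) pairs found in one file's contents."""
    docs = []
    for match in DOC_RE.finditer(raw):
        body = match.group(1)
        docno_match = DOCNO_RE.search(body)
        docno = docno_match.group(1) if docno_match else None
        # Remove the DOCNO element so the identifier does not leak into the text.
        body = DOCNO_RE.sub(" ", body, count=1)
        # Replace every remaining tag with a space, then collapse whitespace.
        text = " ".join(TAG_RE.sub(" ", body).split())
        docs.append((docno, text))
    return docs
```

The function takes the contents of one file as a string, so the file has to be read first; the older TREC collections are often not valid UTF-8, so opening with encoding="latin-1" may be needed (again an assumption about these particular files).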
Could you give me some advice on what I should do next?
Should I move all the document files into a single folder and install the Indri engine to index them?
Thank you very much!