alicranck closed this issue 6 years ago
Hi @alicranck, thanks for your interest in our project!
Yes, you can, if you pre-process the data into the same input format.
Please take a look here as an example. The associated Python class to process this is at https://github.com/castorini/Castor/blob/master/datasets/sick.py.
MP-CNN takes pairs of texts as input. They are stored in `a.toks` and `b.toks` in the example above. You need an id and a label for each pair, which are stored in `id.txt` and `sim.txt` respectively.
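As a rough illustration of that layout, here is a minimal sketch that writes your own pairs into the four files. It assumes one example per line, with the same line number in each file referring to the same pair, and whitespace-separated tokens; `write_pair_files` and the sample pairs are hypothetical and not part of Castor, so check the output against the SICK example files before training.

```python
import tempfile
from pathlib import Path


def write_pair_files(pairs, out_dir):
    """Write (id, text_a, text_b, label) tuples as four line-aligned files.

    Hypothetical helper, not part of Castor. Assumes one pair per line and
    whitespace-separated tokens.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    columns = {
        "id.txt": [str(p[0]) for p in pairs],
        # Collapse any run of whitespace to a single space between tokens.
        "a.toks": [" ".join(p[1].split()) for p in pairs],
        "b.toks": [" ".join(p[2].split()) for p in pairs],
        "sim.txt": [str(p[3]) for p in pairs],
    }
    for name, lines in columns.items():
        (out / name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out


# Made-up sample data, only to show the shape of the input.
pairs = [
    (1, "A man is playing a guitar", "A person plays an instrument", 4.2),
    (2, "A dog runs in the park", "The stock market fell today", 1.0),
]
out_dir = write_pair_files(pairs, tempfile.mkdtemp())
```

If your data needs a real tokenizer (punctuation splitting, lowercasing), apply it before joining the tokens with spaces.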
You need to build the dataset reader and processor using torchtext for your own dataset. You can follow: https://github.com/castorini/Castor/blob/master/datasets/trecqa.py.
Then add that to https://github.com/castorini/Castor/blob/master/common/dataset.py.
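Before wiring up the torchtext reader, it may help to check that your files are line-aligned. The sketch below is a plain-Python stand-in, not the torchtext-based reader used in Castor: `read_pair_files` is a hypothetical helper, and it assumes whitespace-separated tokens and numeric labels in `sim.txt`.

```python
import tempfile
from pathlib import Path


def read_pair_files(data_dir):
    """Load id.txt, a.toks, b.toks and sim.txt into a list of records.

    Hypothetical sanity check, not Castor's reader. Raises ValueError if the
    four files do not have the same number of lines.
    """
    d = Path(data_dir)
    names = ("id.txt", "a.toks", "b.toks", "sim.txt")
    ids, a, b, sims = (
        (d / name).read_text(encoding="utf-8").splitlines() for name in names
    )
    if not (len(ids) == len(a) == len(b) == len(sims)):
        raise ValueError("id.txt, a.toks, b.toks and sim.txt must have equal line counts")
    return [
        {"id": i, "a": x.split(), "b": y.split(), "label": float(s)}
        for i, x, y, s in zip(ids, a, b, sims)
    ]


# Build a tiny made-up dataset on disk so the example runs on its own.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "id.txt").write_text("1\n2\n", encoding="utf-8")
(demo_dir / "a.toks").write_text("A man plays guitar\nA dog runs\n", encoding="utf-8")
(demo_dir / "b.toks").write_text("A person plays music\nStocks fell today\n", encoding="utf-8")
(demo_dir / "sim.txt").write_text("4.2\n1.0\n", encoding="utf-8")

records = read_pair_files(demo_dir)
```

A mismatch in line counts is the most likely formatting mistake, which is why the helper fails loudly on it rather than silently truncating.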
We will write documentation on adding a new dataset soon. Thanks!
Generating `a.toks` and `b.toks` files like the ones in the SICK dataset did the job. Thanks!
Hi,
Is there a possibility to use the models with your own data (specifically the mp-cnn)? I couldn't find anything in the documentation.
Thanks!