andreeaiana / newsreclib

PyTorch-Lightning Library for Neural News Recommendation
https://newsreclib.readthedocs.io/en/latest/
MIT License

How to use a customized dataset? #1

Closed: chiyuzhang94 closed this issue 5 months ago

chiyuzhang94 commented 9 months ago

Hi @andreeaiana ,

Thanks for this great work!

Can I run the models with a customized dataset? If so, is there any guidance on how to do it?

Best, Chiyu

andreeaiana commented 8 months ago

Hi Chiyu,

You can run the implemented models with a customized dataset. For this, you would have to do the following:

Let me know if you have any more questions.

Best, Andreea

chiyuzhang94 commented 8 months ago

Hi @andreeaiana ,

Thanks for the explanation. Could you provide a sample of the processed dataset?

andreeaiana commented 8 months ago

I sent you an email with samples of the processed dataset.

chiyuzhang94 commented 8 months ago

Hi @andreeaiana ,

How did you split the development and test sets, and where can I find the code for this? In particular, I want to use my own split of the MIND dataset instead of downloading the original files and splitting them. I can prepare a dataset that matches the original MIND format. How do I pass my prepared data and splits to your tool?

andreeaiana commented 8 months ago

I split the MIND dataset in the mind_dataframe.py script.
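
For readers preparing their own split, here is a minimal sketch of holding out part of a MIND-style behaviors file as a validation set. It is not the code in mind_dataframe.py: the random, seeded split, the function names, and the sample rows are assumptions made for illustration. Only the five tab-separated columns follow the published MIND behaviors.tsv format.

```python
import csv
import io
import random

# Columns of a MIND behaviors.tsv file (tab-separated, no header row).
COLUMNS = ["impression_id", "user_id", "time", "history", "impressions"]


def read_behaviors(handle):
    """Parse MIND-style behavior rows from an open text handle."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def split_behaviors(rows, val_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle with a fixed seed, hold out a share."""
    shuffled = list(rows)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Tiny in-memory stand-in for behaviors.tsv (made-up rows, not real data).
sample = io.StringIO(
    "1\tU1\t11/11/2019 9:05:58 AM\tN1 N2\tN3-1 N4-0\n"
    "2\tU2\t11/12/2019 6:11:30 PM\tN5\tN6-0 N7-1\n"
    "3\tU3\t11/13/2019 8:00:00 AM\tN1 N8\tN9-0 N2-1\n"
    "4\tU1\t11/14/2019 7:45:10 AM\tN1 N2 N3\tN5-1 N6-0\n"
    "5\tU4\t11/14/2019 9:30:00 PM\tN7\tN8-0 N9-1\n"
)
rows = read_behaviors(sample)
train_rows, val_rows = split_behaviors(rows, val_fraction=0.2)
print(len(train_rows), len(val_rows))
```

Each part can then be written back out in the same tab-separated layout, so that the files still match the original MIND format.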

chiyuzhang94 commented 8 months ago

Thanks.

Should I change these lines (1, 2) to data_dir=to_my_file_path?

Should I give the root path or the path to each tsv file?

andreeaiana commented 8 months ago

Yes, you can set data_dir=my_file_path in either the configuration of your datamodule or that of your experiment.
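
To illustrate why either place can work, here is a rough sketch that assumes a layered configuration in which experiment-level values override datamodule defaults. Plain dictionaries stand in for the config files. The merge_config helper, the data and batch_size keys, and the paths are hypothetical, and this is not the library's actual loading code.

```python
def merge_config(base, override):
    """Recursively merge override into base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical datamodule defaults (key names are assumptions).
datamodule_cfg = {"data": {"data_dir": "data/", "batch_size": 8}}

# Hypothetical experiment config that only changes the data location.
experiment_cfg = {"data": {"data_dir": "/path/to/my_mind_split"}}

final_cfg = merge_config(datamodule_cfg, experiment_cfg)
print(final_cfg["data"]["data_dir"])
```

Under this assumption, setting data_dir in the datamodule configuration changes the default for every run, while setting it in the experiment configuration changes it for that experiment only.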