Failure to Download the Correct ML_10M Dataset

zhiqiuyuan commented 3 months ago

Hi,

I'm trying to reproduce the results on the ML_10M dataset. I followed the link in README.md to download the dataset. However, I found that there is only one 'graphs.npz' file provided for this dataset. The file 'ia-movielens-user2tags-10m.edges' needed in all_data/movie/preprocess.py is missing.

Additionally, I tried commenting out the code dependent on this missing file and running the rest of the code in preprocess.py (which seems to depend only on the 'graphs.npz' file). However, I received the following error:

File "xxx/SimpleDyG/all_data/movie/preprocess.py", line 148, in <module>
    graphs = np.load("graphs.npz", allow_pickle=True)['graph']
...
UnicodeError: Unpickling a python object failed: UnicodeDecodeError('ascii', b'\x07\xd6\x02\x0f\x0b "\x00\x00\x00', 1, 2, 'ordinal not in range(128)')
You may need to pass the encoding= option to numpy.load

I tried all possible values for the encoding argument to numpy.load, but I still received the following error:

networkx.exception.NetworkXError: Input is not a correct NetworkX graph.

This indicates that the 'graphs.npz' file may not be in the correct format expected by the program.
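
A note on the first error: the byte string quoted in the UnicodeError is 10 bytes long, which matches the packed state of a datetime.datetime (here 2006-02-15 11:32:34) as Python 2 pickles it. That suggests, though it does not prove, that graphs.npz was written under Python 2 and is being read under Python 3. numpy.load passes its encoding argument on to the unpickler, so the standard-library sketch below reproduces the same failure without the dataset. The pickle payload is hand-assembled for illustration and assumed to mirror what Python 2 emits; try_encodings is a hypothetical helper, not part of SimpleDyG.

```python
import datetime
import pickle

# Hand-assembled protocol-2 pickle of datetime.datetime(2006, 2, 15, 11, 32, 34),
# assumed to mirror what Python 2 writes: the 10-byte state is stored as a
# Python 2 `str`. The middle line holds the bytes quoted in the error message.
PY2_DATETIME_PICKLE = (
    b"\x80\x02cdatetime\ndatetime\nq\x00U\n"
    b'\x07\xd6\x02\x0f\x0b "\x00\x00\x00'
    b"q\x01\x85q\x02Rq\x03."
)


def try_encodings(payload, encodings=("ASCII", "latin1", "bytes")):
    """Unpickle `payload` under each encoding and record the result or the error.

    Hypothetical helper for illustration. The three names are the values
    that numpy.load accepts for its `encoding` argument.
    """
    outcomes = {}
    for enc in encodings:
        try:
            outcomes[enc] = pickle.loads(payload, encoding=enc)
        except UnicodeDecodeError as exc:
            # Python 2 byte strings holding non-ASCII data cannot be
            # decoded as ASCII text, which is the default in Python 3.
            outcomes[enc] = exc
    return outcomes


results = try_encodings(PY2_DATETIME_PICKLE)
for enc, outcome in results.items():
    print(f"{enc:7s} -> {outcome!r}")
```

If 'latin1' or 'bytes' gets past the UnicodeError but networkx still rejects the loaded object, the remaining mismatch may lie elsewhere, for example in the networkx version that pickled the graphs. That is a guess; nothing in this thread confirms it.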

Could you upload a version of the ML_10M dataset that all_data/movie/preprocess.py accepts, or update the download link? Thank you very much!

YuxiaWu commented 3 months ago

@zhiqiuyuan Hi, thanks for your interest. I can load the graphs.npz file on my machine. It might be an environment issue, since others have run into this problem as well.

I appreciate your reminder. I have uploaded the preprocessed data files for all datasets to all_data/. The processed folders (./resources, ./tokenizers, and ./vocabs) for all datasets are also available.

If you encounter any other issues while running the code, please feel free to reach out.

zhiqiuyuan commented 3 months ago

Thanks! That's so kind of you!

YuxiaWu / SimpleDyG

Failure to Download the Correct ML_10M Dataset #5