Open xuziweiwh opened 1 month ago
Thank you for your interest in our work. The code repository contains only the Cora dataset.
NeutronOrch needs 4 dataset files to run:
- your_dataset.edge: a binary edge list file that stores the graph structure.
- your_dataset.feat: contains the features of each node. The first number on each line is the node number, followed by that node's features.
- your_dataset.label: contains the label of each node. The first number on each line is the node number, followed by that node's class number.
- your_dataset.mask: contains the mask of each node. The first number on each line is the node number, followed by that node's mask (train, val, test).
We provide a Python script to convert some commonly used datasets; please refer to data/generate_nts_dataset.py for details.
If you have any other questions please let us know.
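To make the three text formats described above concrete, here is a minimal Python sketch that writes and reads back a toy dataset. It is not taken from generate_nts_dataset.py: the helper names, the space separator, and node numbers starting at 0 are assumptions for illustration, and the binary .edge file is left out because its exact layout is not described in this thread.

```python
import os
import tempfile


def write_text_files(prefix, features, labels, masks):
    """Write <prefix>.feat, <prefix>.label and <prefix>.mask, one node per line.

    Assumption: fields are separated by single spaces and node numbers start at 0.
    """
    with open(prefix + ".feat", "w") as f:
        for node_id, row in enumerate(features):
            # Node number first, then that node's feature values.
            f.write(" ".join([str(node_id)] + [str(x) for x in row]) + "\n")
    with open(prefix + ".label", "w") as f:
        for node_id, label in enumerate(labels):
            f.write(f"{node_id} {label}\n")
    with open(prefix + ".mask", "w") as f:
        for node_id, mask in enumerate(masks):
            f.write(f"{node_id} {mask}\n")


def read_mask(path):
    """Parse a .mask file into a {node_number: mask} dictionary."""
    result = {}
    with open(path) as f:
        for line in f:
            node_id, mask = line.split()
            result[int(node_id)] = mask
    return result


# Toy data: three nodes with two features each.
features = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
labels = [0, 1, 0]
masks = ["train", "val", "test"]

prefix = os.path.join(tempfile.mkdtemp(), "toy")
write_text_files(prefix, features, labels, masks)
toy_masks = read_mask(prefix + ".mask")
```

Reading the files back like this is a quick way to check a converted dataset before pointing NeutronOrch at it.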
Thank you for your help. I successfully downloaded several datasets using your method. However, when I checked their contents, I found some errors. For example, when I re-downloaded the Cora dataset, "unknown" entries appeared in its mask file. Normally, a mask file should not contain "unknown" entries. I re-downloaded the dataset several times, and the problem persisted. Could this be an error in the generate_nts_dataset.py script? The entries are as follows:
640 unknown 641 unknown......1707 unknown.
I look forward to your reply!😊😊😊
Hi, this is normal behavior and does not affect the program’s execution. We use DGL and OGB to download GNN datasets and convert them into the format required by NeutronOrch using generate_nts_dataset.py.
Not all vertices in the Cora dataset have a train/val/test mask label. Please refer to the dataset description here: CoraGraphDataset (Train: 140, Valid: 500, Test: 1000).
For vertices without such a label, we manually mark them as “Unknown”. You can check the specific code here: code link
If you have any further questions, feel free to ask!
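As an illustration of the marking described above, here is a minimal Python sketch of how per-node mask entries with an "unknown" fallback could be produced. It is not the code from generate_nts_dataset.py. The function name is hypothetical, and the index ranges are an assumption chosen to match the split sizes (140/500/1000) and the reported "640 unknown ... 1707 unknown" entries, with 2,708 as Cora's vertex count.

```python
from collections import Counter


def mask_entries(num_nodes, train_ids, val_ids, test_ids):
    """Return (node_number, tag) pairs; nodes in no split are tagged 'unknown'."""
    train, val, test = set(train_ids), set(val_ids), set(test_ids)
    entries = []
    for node_id in range(num_nodes):
        if node_id in train:
            tag = "train"
        elif node_id in val:
            tag = "val"
        elif node_id in test:
            tag = "test"
        else:
            # The node belongs to no split, so it gets the fallback tag.
            tag = "unknown"
        entries.append((node_id, tag))
    return entries


# Assumed layout: the first 140 nodes train, the next 500 validate,
# and the last 1000 test.
entries = mask_entries(2708, range(0, 140), range(140, 640), range(1708, 2708))
counts = Counter(tag for _, tag in entries)
```

Under these assumptions, 1,068 of the 2,708 vertices fall outside every split, which is exactly the 640 to 1707 range reported in the mask file.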
Hello, I would like to ask: does the following output mean that the code ran successfully? Will a comparison chart be produced when the code runs? Looking forward to your reply.
Yes, NTO is working fine. More detailed output can be viewed in the log folder. You can adjust the hot vertices computation (i.e., "CACHE_RATE") to reduce CPU computation time and improve performance.
Okay, thank you very much for your help in successfully running multiple datasets. Wishing you a happy life and greater academic achievements!🥳🥳🥳
Hello, I have a question about the two example configurations mentioned in the README file, gcn_reddit_sample.cfg and gcn_cora_sample.cfg. I can run the latter normally, but the former fails with a path error. Should I download this dataset myself from the internet? Will there be graphs when running these examples? I apologize, I am new to this and have a lot of questions. Looking forward to your reply.🤓🤓🤓 The error is as follows:
nts:/home/xxx/desktop/Sample-based-GNN-main/dep/gemini/filesystem.hpp:34: long int file_size(std::string): Assertion failed.stat(filename.c_str(),&st) == 0.
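The assertion above checks that stat() succeeded on a file path, and stat() typically fails when the file does not exist. Since the repository ships only the Cora dataset, one likely cause is that the Reddit files referenced by gcn_reddit_sample.cfg have not been generated yet. Below is a minimal Python sketch for checking this before a run. The helper name and the assumption that the files are named prefix plus .edge/.feat/.label/.mask are illustrative; the actual paths in the .cfg file may differ.

```python
import os
import tempfile

# The four file types NeutronOrch needs, per the description earlier in the thread.
REQUIRED_SUFFIXES = (".edge", ".feat", ".label", ".mask")


def missing_dataset_files(prefix):
    """Return the expected dataset files under this prefix that do not exist."""
    return [prefix + s for s in REQUIRED_SUFFIXES if not os.path.isfile(prefix + s)]


# Demo with a throwaway directory in which only two of the four files exist.
demo_prefix = os.path.join(tempfile.mkdtemp(), "toy")
for suffix in (".edge", ".feat"):
    open(demo_prefix + suffix, "w").close()

missing = missing_dataset_files(demo_prefix)
```

If the returned list is non-empty, the missing files would need to be generated first, for example with data/generate_nts_dataset.py.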