Thank you for your interest. Unfortunately, Box does not offer an official command-line downloader comparable to gdown for Google Drive, and my own Google Drive does not have enough space to host all the data. However, I've found an unofficial solution that might be useful for you: https://github.com/wuhanstudio/box-api-dl/tree/main. The author of that repository is also working on downloading large datasets from Box onto a server. I hope this helps.
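If you only need individual files, a minimal sketch using Box's standard Files API may also work; note that the file id and token below are placeholders you would need to obtain from your own Box account, and large shared folders may still require the approach in the linked repository:

```python
import requests

# Placeholders: take the numeric file id from the Box web URL, and create a
# developer token via a Box custom app (console.box.com).
FILE_ID = "1234567890"
TOKEN = "YOUR_DEVELOPER_TOKEN"

# Box Files API: GET /2.0/files/{id}/content streams the file bytes.
url = f"https://api.box.com/2.0/files/{FILE_ID}/content"
resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"}, stream=True)
resp.raise_for_status()

with open("dataset.zip", "wb") as f:
    for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
        f.write(chunk)
```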
Thanks for your reply. I have another question about the following code. Why did you only build the prompt for the products dataset? Where can I find the prompts for the other datasets, such as arxiv, cora, and pubmed?
Thank you very much
The node classification prompts for the other three datasets can be found in the loaded "sampled{hop}{size}_train.jsonl" file, specifically in the l["conversations"][0]["value"] field. Because these datasets are relatively small, we stored the prompt for each sample directly in the file. The prompts are very similar to those for products.
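For example, a minimal way to inspect these prompts (the hop and size values here are hypothetical; use whatever your config specifies):

```python
import json

hop, size = 2, 10  # placeholder values; substitute your actual settings
path = f"sampled{hop}{size}_train.jsonl"

with open(path) as f:
    for line in f:
        l = json.loads(line)
        prompt = l["conversations"][0]["value"]  # per-sample classification prompt
        print(prompt)
        break  # inspect just the first sample
```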
Thanks for your prompt reply. My issues are solved.
Hi Runjin,
Thank you very much for your help. I have successfully trained the model, but I noticed that inference is very slow, possibly because you run inference on one sample at a time. Could you please tell me how to improve the inference speed?
The reason we conduct inference on a single sample at a time is to emulate an interactive system where users submit one question per interaction.
To enhance inference speed, batch processing can be utilized. Moreover, in our testing tasks, since all samples within a single inference task share a common prompt, we have the opportunity to tokenize the prompt just once. I have prepared a naive version of batch processing for you in the 'naive_branch' branch, which you are welcome to use. Thank you.
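As a rough illustration (this is not the exact code in 'naive_branch'; the checkpoint path, prompt, and generation settings below are placeholders), batched generation with Hugging Face transformers might look like:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder checkpoint path; left padding is required for batched decoding.
tokenizer = AutoTokenizer.from_pretrained("path/to/checkpoint", padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("path/to/checkpoint").eval()

# All samples in one inference task share this prompt, so it could also be
# tokenized once and its ids reused instead of re-encoding it per sample.
shared_prompt = "Classify the following node: "
samples = ["<node 1 description>", "<node 2 description>"]

batch = [shared_prompt + s for s in samples]
inputs = tokenizer(batch, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)

# With left padding, generated tokens start right after the input length,
# so slicing there keeps only the answers.
answers = tokenizer.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                                 skip_special_tokens=True)
print(answers)
```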
Thanks!
Hi authors,
Thank you for your work.
Could you please tell me how to download the data directly on a Linux server? Currently, I have to first download the dataset from Box to my local computer, which is very time-consuming.