amazon-science / tgl

Apache License 2.0

Question about the int_roll in edges.csv #3

Closed: RManLuo closed this issue 2 years ago.

RManLuo commented 2 years ago

First, thanks for the wonderful work. I have a question about the data format that I hope you can help with.

As described in the README, edges.csv should have the header `,src,dst,time,ext_roll`. However, the edges.csv files for LastFM and MOOC have the header `,src,dst,time,int_roll,ext_roll`. What does the int_roll column mean?
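The leading comma in both headers is what pandas writes for an unnamed index column. A minimal sketch, assuming edges.csv is produced with `DataFrame.to_csv` (the toy edges and roll values below are made up for illustration):

```python
import io

import pandas as pd

# Toy edges; the values are illustrative only.
df = pd.DataFrame({
    "src": [0, 1, 2],
    "dst": [3, 4, 5],
    "time": [1.0, 2.0, 3.0],
    "ext_roll": [0, 1, 2],
})

buf = io.StringIO()
df.to_csv(buf)  # index=True by default, so the header starts with an empty cell
header = buf.getvalue().splitlines()[0]
print(header)  # ,src,dst,time,ext_roll

# Reading it back with index_col=0 turns that unnamed first column into the index.
loaded = pd.read_csv(io.StringIO(buf.getvalue()), index_col=0)
print(list(loaded.columns))  # ['src', 'dst', 'time', 'ext_roll']
```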

tedzhouhk commented 2 years ago

Thanks for your interest in our work. The int_roll column is for the interpolation setting (identifying missing edges within the same time period as the training set), which is not used in this work.
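To make the distinction concrete, here is a toy sketch of the two settings; it is not TGL's split code. Extrapolation holds out the edges with the latest timestamps, while interpolation holds out a random subset of edges from the same period the model trains on. The 80/20 ratio, the seed, and the function names are hypothetical, and the boolean masks do not reproduce the integer encoding used in the real int_roll column.

```python
import numpy as np


def extrapolation_split(times, train_frac=0.8):
    """Extrapolation: train on the earliest edges, evaluate on strictly later ones."""
    cutoff = np.quantile(times, train_frac)
    return times <= cutoff  # True marks a training edge


def interpolation_split(times, holdout_frac=0.2, seed=0):
    """Interpolation: hide a random subset of edges drawn from the whole time span."""
    rng = np.random.default_rng(seed)
    held_out = rng.random(len(times)) < holdout_frac
    return ~held_out  # True marks a training edge


times = np.arange(10, dtype=float)
ext_train = extrapolation_split(times)
int_train = interpolation_split(times)

# In the extrapolation split, every evaluation edge is later than every training edge.
print(times[ext_train].max() < times[~ext_train].min())  # True
# In the interpolation split, held-out edges can sit anywhere in the same period.
print(times[~int_train])
```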

RManLuo commented 2 years ago

Thanks for your answer. It is really helpful for using your framework on my custom dataset.

UCRajkumar commented 1 year ago

Hello, I am still a bit confused by the int_roll column. The README says we need only the src, dst, timestamp, and ext_roll columns. However, when I run gen_graph.py on an edges.csv file without the int_roll column, it throws an error. So is the int_roll column required?
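gen_graph.py reads `row['int_roll']` for every edge (see the lines quoted later in this thread), so the error is most likely a pandas `KeyError`: indexing a row Series by a missing label raises. A minimal sketch of that failure and of one possible workaround that avoids editing the script: add a placeholder int_roll column. The helper name and the placeholder value 0 are hypothetical choices, not something the maintainers prescribe, and this assumes the int_* outputs are simply ignored in the extrapolation setting.

```python
import pandas as pd

README_COLUMNS = ["src", "dst", "time", "ext_roll"]


def ensure_int_roll(df, fill_value=0):
    """Hypothetical helper: return a copy of df that has an int_roll column.

    fill_value=0 is an arbitrary placeholder so that code reading
    row['int_roll'] does not fail; it carries no interpolation meaning.
    """
    missing = [c for c in README_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"edges.csv is missing required columns: {missing}")
    out = df.copy()
    if "int_roll" not in out.columns:
        out["int_roll"] = fill_value
    return out


df = pd.DataFrame({"src": [0, 1], "dst": [2, 3], "time": [1.0, 2.0], "ext_roll": [0, 2]})

# Reproduce the failure mode: row-wise access to a missing label raises KeyError.
_, row = next(df.iterrows())
try:
    row["int_roll"]
except KeyError:
    print("KeyError: 'int_roll'")

fixed = ensure_int_roll(df)
print(list(fixed.columns))  # ['src', 'dst', 'time', 'ext_roll', 'int_roll']
```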

roman-4erkasov commented 1 year ago


I agree with @UCRajkumar .

Dear @tedzhouhk, I also noticed that gen_graph.py references int_roll. Can I simply delete the following lines so that gen_graph.py runs without that column?

```python
if row['int_roll'] == 0:
    int_train_indices[src].append(dst)
    int_train_ts[src].append(row['time'])
    int_train_eid[src].append(idx)
    if args.add_reverse:
        int_train_indices[dst].append(src)
        int_train_ts[dst].append(row['time'])
        int_train_eid[dst].append(idx)
    # int_train_indptr[src + 1:] += 1
if row['int_roll'] != 3:
    int_full_indices[src].append(dst)
    int_full_ts[src].append(row['time'])
    int_full_eid[src].append(idx)
    if args.add_reverse:
        int_full_indices[dst].append(src)
        int_full_ts[dst].append(row['time'])
        int_full_eid[dst].append(idx)
    # int_full_indptr[src + 1:] += 1
```
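Instead of deleting the block, one option is to make the interpolation branch conditional on the column being present. The sketch below is a hypothetical, self-contained reimplementation of the quoted logic (the function name and the dict-of-lists layout are assumptions, not the actual gen_graph.py code). It keeps the same two conditions: int_roll == 0 puts an edge in the interpolation training graph, and int_roll != 3 puts it in the interpolation full graph.

```python
from collections import defaultdict

import pandas as pd


def build_int_adjacency(df, add_reverse=False):
    """Per-node neighbor lists for the interpolation graphs, or None without int_roll."""
    if "int_roll" not in df.columns:
        return None  # extrapolation-only dataset: skip the int_* graphs entirely

    graphs = {
        name: {key: defaultdict(list) for key in ("indices", "ts", "eid")}
        for name in ("train", "full")
    }

    def add_edge(graph, u, v, t, eid):
        graph["indices"][u].append(v)
        graph["ts"][u].append(t)
        graph["eid"][u].append(eid)

    for idx, row in df.iterrows():
        src, dst, t = int(row["src"]), int(row["dst"]), row["time"]
        targets = []
        if row["int_roll"] == 0:
            targets.append(graphs["train"])
        if row["int_roll"] != 3:
            targets.append(graphs["full"])
        for graph in targets:
            add_edge(graph, src, dst, t, idx)
            if add_reverse:
                add_edge(graph, dst, src, t, idx)
    return graphs


edges = pd.DataFrame({"src": [0, 1, 0], "dst": [1, 2, 2], "time": [1.0, 2.0, 3.0],
                      "int_roll": [0, 1, 3], "ext_roll": [0, 0, 1]})
g = build_int_adjacency(edges)
print(dict(g["train"]["indices"]))  # {0: [1]}
print(dict(g["full"]["indices"]))   # {0: [1], 1: [2]}
print(build_int_adjacency(edges.drop(columns="int_roll")))  # None
```

Guarding on the column keeps one script usable for both dataset layouts, at the cost of slightly more code than deleting the block.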

Thank you!

tedzhouhk commented 1 year ago

Hi @roman-4erkasov. Yes, you can delete these lines (and also the later code that generates int_train.npz and int_full.npz). Sorry, I don't have permission to push to this repo; otherwise I would push a fix.
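The commented-out `int_train_indptr[src + 1:] += 1` lines in the quoted snippet hint that the per-node lists end up in a compressed, CSR-style layout before being saved. A hedged sketch of such a conversion, guarded so that nothing is written when there is no interpolation data. The helper names and the npz key names (indptr, indices, ts, eid) are assumptions for illustration and may not match the files TGL actually writes.

```python
import os
import tempfile

import numpy as np


def lists_to_csr(neighbors, num_nodes):
    """Flatten {node: [values]} so node u owns flat[indptr[u]:indptr[u + 1]]."""
    counts = np.array([len(neighbors.get(u, [])) for u in range(num_nodes)], dtype=np.int64)
    indptr = np.zeros(num_nodes + 1, dtype=np.int64)
    np.cumsum(counts, out=indptr[1:])  # indptr[u + 1] = total entries for nodes 0..u
    flat = [v for u in range(num_nodes) for v in neighbors.get(u, [])]
    return indptr, np.array(flat)


def save_int_graph(path, graph, num_nodes):
    """Write one interpolation graph; skip it when there is no int_roll data."""
    if graph is None:
        return False
    indptr, indices = lists_to_csr(graph["indices"], num_nodes)
    _, ts = lists_to_csr(graph["ts"], num_nodes)
    _, eid = lists_to_csr(graph["eid"], num_nodes)
    np.savez(path, indptr=indptr, indices=indices, ts=ts, eid=eid)
    return True


graph = {"indices": {0: [1], 1: [2]}, "ts": {0: [1.0], 1: [2.0]}, "eid": {0: [0], 1: [1]}}
indptr, indices = lists_to_csr(graph["indices"], num_nodes=3)
print(indptr.tolist(), indices.tolist())  # [0, 1, 2, 2] [1, 2]

with tempfile.TemporaryDirectory() as d:
    print(save_int_graph(os.path.join(d, "int_full.npz"), graph, num_nodes=3))   # True
    print(save_int_graph(os.path.join(d, "int_train.npz"), None, num_nodes=3))  # False
```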