IBM / Multi-GNN

Multi-GNN architectures for Anti-Money Laundering.
Apache License 2.0
36 stars 15 forks

Why is Test Set Assigned as the Whole Dataset? #9

Open Linnore opened 1 week ago

Linnore commented 1 week ago

I am organising this repo's data preprocessing code into a PyG Dataset object. In the data splitting part of data_loading.py, test_inds is never used, and the test set is assigned the entire dataset (lines 100-106):

    te_edge_index, te_edge_attr, te_y, te_edge_times = edge_index, edge_attr, y, timestamps

    ...

    te_data = GraphData(x=te_x, y=te_y, edge_index=te_edge_index, edge_attr=te_edge_attr, timestamps=te_edge_times)

Is this an intentional design choice for the data split? Shouldn't the dataset be split 0.6/0.2/0.2?
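
To make the question concrete, here is a minimal sketch of the kind of split I expected. It is not the repo's code: `split_edge_indices` is a hypothetical helper, the toy arrays are made up, and it assumes the edges are already sorted by timestamp so that contiguous index blocks correspond to time periods. The actual splitting logic in data_loading.py may differ.

```python
import numpy as np


def split_edge_indices(n_edges, ratios=(0.6, 0.2, 0.2)):
    """Hypothetical helper: split edge indices 0..n_edges-1 into
    contiguous train/val/test blocks (assumes time-sorted edges)."""
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    n_tr = int(round(ratios[0] * n_edges))
    n_val = int(round(ratios[1] * n_edges))
    inds = np.arange(n_edges)
    tr_inds = inds[:n_tr]
    val_inds = inds[n_tr:n_tr + n_val]
    te_inds = inds[n_tr + n_val:]  # whatever remains goes to the test block
    return tr_inds, val_inds, te_inds


# Toy stand-ins for the real tensors (10 edges on a ring of 10 nodes).
n_edges = 10
edge_index = np.stack([np.arange(n_edges), (np.arange(n_edges) + 1) % n_edges])
y = np.zeros(n_edges, dtype=np.int64)

tr_inds, val_inds, te_inds = split_edge_indices(n_edges)

# A test set restricted to the test indices, instead of the full dataset.
te_edge_index = edge_index[:, te_inds]
te_y = y[te_inds]
```

With this kind of split, the test arrays would hold only the edges in `te_inds` rather than every edge in the dataset.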

Linnore commented 1 week ago

Also note that the edges used for validation include both tr_inds and val_inds:


    e_val = np.concatenate([tr_inds, val_inds])
    ...
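
If the training edges are kept at validation time only so the model can pass messages over them, I would expect the metrics to be computed on `val_inds` alone. The sketch below shows that reading with toy data. It is my assumption about the intent, not something I have confirmed in the repo, and `val_mask` and the label array are made up for illustration.

```python
import numpy as np

# Toy index blocks standing in for the real tr_inds / val_inds.
tr_inds = np.arange(0, 6)
val_inds = np.arange(6, 8)

# Edges visible at validation time: train + val, as in the snippet above.
e_val = np.concatenate([tr_inds, val_inds])

# Boolean mask over e_val marking the edges that should actually be scored.
val_mask = np.isin(e_val, val_inds)

# Toy labels for all 10 edges in the dataset.
y = np.array([0, 0, 1, 0, 0, 0, 1, 0, 0, 1])

val_y_visible = y[e_val]                 # labels of every edge in the validation graph
val_y_scored = val_y_visible[val_mask]   # labels of the validation edges only
```

Under this reading, the training edges appear in the validation graph but do not contribute to the validation metrics.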