I noticed that "split_data()" in "hgcn/utils/data_utils.py" separates the dataset into "pos" and "neg" samples in order to sample the train/val/test splits uniformly. This is fine for the "DISEASE" dataset, which contains only 2 classes, but not for the "AIRPORT" dataset, which contains 4 classes. However, "load_data_nc()" in the same file uses "split_data()" to create the splits for both "DISEASE" and "AIRPORT", which may result in 1) repeated samples within each split and 2) overlap between splits.
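One possible way to avoid both problems is to split each class separately instead of assuming a binary pos/neg partition. The following is only a minimal sketch, not the repository's code: the function name "stratified_split" is hypothetical, and the argument names and the (val, test, train) return order are assumptions that would need to be matched to whatever "load_data_nc()" actually expects.

```python
import numpy as np


def stratified_split(labels, val_prop, test_prop, seed):
    """Hypothetical per-class split: returns disjoint (val, test, train) index lists.

    Assumes `labels` is a 1-D array of integer class ids (any number of classes).
    """
    rng = np.random.RandomState(seed)
    labels = np.asarray(labels)
    idx_val, idx_test, idx_train = [], [], []
    for c in np.unique(labels):
        # Indices of the nodes belonging to class c; each node has exactly one
        # class, so it lands in exactly one of these per-class index arrays.
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        nb_val = int(round(val_prop * len(idx)))
        nb_test = int(round(test_prop * len(idx)))
        # Consecutive, non-overlapping slices of the shuffled indices.
        idx_val += idx[:nb_val].tolist()
        idx_test += idx[nb_val:nb_val + nb_test].tolist()
        idx_train += idx[nb_val + nb_test:].tolist()
    return idx_val, idx_test, idx_train
```

Because every index is assigned to exactly one class and each class is cut into consecutive slices, no index can be repeated within a split or shared between splits. Note that this keeps the class proportions of the full dataset in every split rather than balancing the classes, so it is a design choice that may differ from what the original function intended.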