Closed dingquanyu closed 1 year ago
I just checked the data and cannot find the problem. train_multi_label.json contains:
"7a6o_AAA": [
    "7a6o_AAA",
    "7a6o_A"
],
"7a6o_BBB": [
    "7a6o_B",
    "7a6o_BBB"
]
In pdb_labels, it has:
7a6o_A.label.pkl.gz
7a6o_AAA.label.pkl.gz
7a6o_B.label.pkl.gz
7a6o_BBB.label.pkl.gz
In pdb_features, it has:
7a6o_AAA.feature.pkl.gz
7a6o_BBB.feature.pkl.gz
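For anyone who built the features themselves, the same check can be run over the whole file. The following is only a minimal sketch, not the script that produced the dataset. It assumes, based on the single 7a6o example above, that every key in train_multi_label.json should have a `<key>.feature.pkl.gz` file in pdb_features and that every chain listed under a key should have a `<chain>.label.pkl.gz` file in pdb_labels. The function name `find_missing_files` is hypothetical.

```python
import json
from pathlib import Path


def find_missing_files(label_json, labels_dir, features_dir):
    """Return (missing_features, missing_labels) for a multi-label JSON file.

    Assumed layout (inferred from the 7a6o example, not confirmed):
      features_dir/<key>.feature.pkl.gz   for every key in the JSON
      labels_dir/<chain>.label.pkl.gz     for every chain listed under a key
    """
    mapping = json.loads(Path(label_json).read_text())
    labels_dir = Path(labels_dir)
    features_dir = Path(features_dir)

    # Keys that have no feature file on disk.
    missing_features = [
        key for key in mapping
        if not (features_dir / f"{key}.feature.pkl.gz").exists()
    ]

    # Chains referenced anywhere in the JSON that have no label file on disk.
    missing_labels = sorted({
        chain
        for chains in mapping.values()
        for chain in chains
        if not (labels_dir / f"{chain}.label.pkl.gz").exists()
    })
    return missing_features, missing_labels
```

Calling `find_missing_files("train_multi_label.json", "pdb_labels", "pdb_features")` returns two lists. If features were generated under names such as 7a6o_A rather than 7a6o_AAA, the corresponding keys would be expected to show up in the first list.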
I see. Sorry, since downloading the full dataset never worked for us, I prepared all the features myself and extracted the chain names from the mmCIF files, which gave me 7a6o_A and 7a6o_B instead of 7a6o_AAA and 7a6o_BBB. Now it makes sense. Thanks for checking.
@henrywotton you can now download the full dataset from ByteDance-hosted storage; see the README.
Hi,
I wonder how you generated train_multi_label.json, because there seem to be errors in this file. For example, 7a6o has two chains, A and B, in the PDB, but in this file they are labeled 7a6o_AAA and 7a6o_BBB. This and other similar mislabelings have caused errors for me. Could you upload the script that generated this JSON file? Thanks.