Closed SakuraRiven closed 4 years ago
Hi. For question 1: yes, that's right. In this project drugA-drugB is the same as drugB-drugA, so we simply dropped half of the interactions to avoid duplicates. However, if you use a knowledge graph, the order matters: (drugA, relation, drugB) is different from (drugB, relation, drugA), and the order has to be determined from the dependency relationship.

For question 2: the original fingerprint has 881 dimensions. We encode the 881-dimension result a second time, which gives 583 dimensions. You can check the "smile" column of the "drug" table; the maximum value there is 881.
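To make the answer to question 1 concrete, here is a minimal sketch of collapsing mirrored pairs into one record each. It is not the repository's code: the function name `canonical_pairs` and the toy records are hypothetical, and it assumes each interaction is a `(drug_a, drug_b, label)` tuple where both directions carry the same label.

```python
def canonical_pairs(interactions):
    """Keep one record per unordered drug pair.

    `interactions` is an iterable of (drug_a, drug_b, label) tuples.
    The first label seen for a pair is the one that is kept.
    """
    seen = {}
    for a, b, label in interactions:
        key = tuple(sorted((a, b)))  # (A, B) and (B, A) map to the same key
        seen.setdefault(key, label)
    return [(a, b, label) for (a, b), label in seen.items()]


# Hypothetical toy data: every interaction is listed in both directions.
raw = [
    ("drugA", "drugB", 3),
    ("drugB", "drugA", 3),
    ("drugA", "drugC", 7),
    ("drugC", "drugA", 7),
]
unique = canonical_pairs(raw)
print(len(raw), len(unique))  # 4 2
```

The same halving relates the two figures in this thread: 74528 = 2 × 37264. For a knowledge graph, where direction matters, this collapsing step would not be appropriate.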
If we want to reproduce the results in your paper or develop our own method, should we keep the dataset at the 37264 interactions in this repo, or augment it to 74528? The latter may lead to a different split and test set ...
I think you should keep 37264. If (drugA,drugB) is already in the training set, it may be much easier for the model to predict (drugB,drugA), which would lead to falsely high accuracy.
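The leakage concern can be illustrated with a small sketch. Everything here is hypothetical (the helper names, the toy pairs, the 20% test fraction) and it is not the split used in the paper. It compares mirroring the pairs before a random split with splitting the unique pairs first and mirroring only the training part.

```python
import random


def leaked(train, test):
    """Count test records whose unordered drug pair also occurs in train."""
    train_keys = {frozenset((a, b)) for a, b, _ in train}
    return sum(frozenset((a, b)) in train_keys for a, b, _ in test)


def mirror(records):
    """Append the reversed (drug_b, drug_a, label) copy of every record."""
    return records + [(b, a, label) for a, b, label in records]


def random_split(records, test_fraction, seed):
    """Shuffle a copy of the records and cut off the first part as test."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: 45 unique unordered pairs with made-up labels.
pairs = [
    (f"drug{i}", f"drug{j}", (i + j) % 5)
    for i in range(10)
    for j in range(i + 1, 10)
]

# Risky: mirror everything, then split. Mirrored twins can straddle the split.
naive_train, naive_test = random_split(mirror(pairs), 0.2, seed=0)

# Safer: split the unique pairs first, then mirror only the training part.
train, test = random_split(pairs, 0.2, seed=0)
safe_train = mirror(train)

print("leaked test records, mirror then split:", leaked(naive_train, naive_test))
print("leaked test records, split then mirror:", leaked(safe_train, test))
```

With the second ordering the leak count is zero by construction, since no unordered pair can appear on both sides. Keeping the 37264 unique pairs, as suggested above, avoids the issue altogether.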
OK, understood. Thanks for your answer~
Hi, thanks for your kind reply. I have two questions:
1. The number of interactions in your paper is 74528, while the data in the code has 37264. I guess the "74528" actually counts both "drugA-drugB" and "drugB-drugA"? Can we take "drugA-drugB" and "drugB-drugA" to be the same event with the same label?
2. The substructure dimension when I run the code is 583 instead of the 881 in your paper. Is something wrong?
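The reply in this thread attributes the 583-vs-881 gap to a second encoding pass over the 881-dimension fingerprint. The sketch below shows one way such a pass could shrink the dimension. It is an assumption, not the repository's code: it supposes that each drug's fingerprint is stored as a list of active bit positions (up to 881) and that only positions used by at least one drug get a column. The name `reencode` and the toy data are hypothetical.

```python
def reencode(active_bits_per_drug):
    """Map sparse bit positions onto a compact set of columns.

    Returns (vectors, used), where `used` lists the original bit positions
    that occur at least once, and each vector has len(used) entries.
    """
    used = sorted({bit for bits in active_bits_per_drug for bit in bits})
    column_of = {bit: col for col, bit in enumerate(used)}
    vectors = []
    for bits in active_bits_per_drug:
        vec = [0] * len(used)
        for bit in bits:
            vec[column_of[bit]] = 1
        vectors.append(vec)
    return vectors, used


# Hypothetical toy data: three drugs, bit positions in the range 1..881.
toy = [[1, 5, 881], [5, 12], [1, 12, 300]]
vectors, used = reencode(toy)
print(len(used), max(used))  # 5 881
```

In this toy case the largest position is still 881 while the encoded vectors have only 5 columns, which mirrors the situation described here: a maximum value of 881 in the data but a smaller feature dimension at run time.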