About the number of interactions and the substructure dimension

YifanDengWHU / DDIMDL


About the number of interactions and the substructure dimension #2

Closed: SakuraRiven closed this issue 4 years ago.

SakuraRiven commented 4 years ago

Hi, thanks for your kind reply. I have two questions:

1. The number of interactions in your paper is 74528, while the data in the code has 37264. I guess the 74528 counts both "drugA-drugB" and "drugB-drugA"? Can we treat "drugA-drugB" and "drugB-drugA" as the same event with the same label?
2. The substructure dimension when I run the code is 583 instead of the 881 reported in your paper. Is something wrong?
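Assuming the 74528 figure counts every unordered drug pair once in each direction, collapsing the two orders exactly halves the count (74528 / 2 = 37264). A minimal sketch of that deduplication follows; the drug IDs and function names are hypothetical, and this is not the repository's own code.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def canonical(pair: Pair) -> Pair:
    """Order the two drug IDs so (a, b) and (b, a) map to the same key."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def deduplicate_symmetric(pairs: Iterable[Pair]) -> List[Pair]:
    """Keep one copy of each unordered drug pair, in first-seen order."""
    seen = set()
    unique: List[Pair] = []
    for pair in pairs:
        key = canonical(pair)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    # Hypothetical toy data: every pair appears in both orders.
    both_orders = [("DB01", "DB02"), ("DB02", "DB01"),
                   ("DB01", "DB03"), ("DB03", "DB01")]
    unique = deduplicate_symmetric(both_orders)
    print(len(both_orders), len(unique))  # 4 2
```

This only makes sense when the two orders share one label, which is the case for this dataset but not for a directed knowledge-graph setting.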

YifanDengWHU commented 4 years ago

Hi. For question 1: Yes, that's right. In this project drugA-drugB is the same as drugB-drugA, so we deleted half of the interactions to remove the duplicates. However, if you use a knowledge graph, the order makes a difference: (drugA, relation, drugB) is different from (drugB, relation, drugA), so the order has to be determined by the dependency relationship.

For question 2: The original fingerprint has 881 dimensions. We encode the 881-dimensional results again, which gives 583 dimensions. You can look at the "smile" column of the "drug" table; the maximum number there is 881.
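One plausible reading of the second answer is that the features keep only the substructure indices that actually occur in the dataset, so the width becomes the number of distinct indices (583) rather than the full 881-bit space. The sketch below illustrates that idea under stated assumptions: that each "smile" entry lists the indices of the substructures present in a drug, and that they are joined by "|". Neither the format nor the separator is confirmed in this thread, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def encode_observed_substructures(
    smile_column: List[str], sep: str = "|"
) -> Tuple[List[str], List[List[int]]]:
    """Build a binary matrix whose columns are only the substructure
    indices that occur in at least one drug.

    Assumption: each entry lists substructure indices (1..881) joined by `sep`.
    """
    vocabulary: List[str] = []      # distinct substructure IDs, first-seen order
    column_of: Dict[str, int] = {}  # substructure ID -> column index
    for entry in smile_column:
        for sub_id in entry.split(sep):
            if sub_id and sub_id not in column_of:
                column_of[sub_id] = len(vocabulary)
                vocabulary.append(sub_id)

    # One row per drug, 1 where that drug contains the substructure.
    matrix: List[List[int]] = []
    for entry in smile_column:
        row = [0] * len(vocabulary)
        for sub_id in entry.split(sep):
            if sub_id:
                row[column_of[sub_id]] = 1
        matrix.append(row)
    return vocabulary, matrix


if __name__ == "__main__":
    # Hypothetical toy column: three drugs drawn from the 881-bit space.
    toy = ["1|12|881", "12|300", "1|300"]
    vocab, m = encode_observed_substructures(toy)
    print(len(vocab))  # 4: only 4 of the 881 possible bits occur
    print(m[0])        # [1, 1, 1, 0]
```

With the real table, `len(vocab)` would correspond to the 583 reported above only if the assumptions about the column format hold.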

SakuraRiven commented 4 years ago

If we want to reproduce the results in your paper or develop our own method, should we keep the 37264 interactions in this repo or augment them to 74528? The latter may lead to a different split and test set...

YifanDengWHU commented 4 years ago

I think you should keep 37264, because if (drugA, drugB) is already in the training set, it may be easier for the model to predict (drugB, drugA). This would lead to falsely high accuracy.
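If someone does augment to both orders anyway, the leakage described above can be avoided by splitting on unordered pairs rather than on rows, so (drugA, drugB) and (drugB, drugA) always land in the same fold. A minimal sketch, assuming a plain list of (drug, drug) tuples; the function names and the round-robin fold assignment are illustrative choices, not the paper's evaluation protocol.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def canonical(pair: Pair) -> Pair:
    """Map (a, b) and (b, a) to the same key."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def group_kfold_pairs(pairs: List[Pair], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign row indices to k folds so both orders of a pair share a fold."""
    groups: Dict[Pair, List[int]] = {}
    for idx, pair in enumerate(pairs):
        groups.setdefault(canonical(pair), []).append(idx)

    keys = sorted(groups)              # deterministic order before shuffling
    random.Random(seed).shuffle(keys)  # seeded shuffle of whole groups

    folds: List[List[int]] = [[] for _ in range(k)]
    for i, key in enumerate(keys):
        folds[i % k].extend(groups[key])  # round-robin over groups, not rows
    return folds


if __name__ == "__main__":
    # Hypothetical augmented data: each pair appears in both orders.
    pairs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("b", "c"), ("c", "b")]
    for fold in group_kfold_pairs(pairs, k=3, seed=1):
        print(sorted({canonical(pairs[i]) for i in fold}))
```

Because whole groups are assigned together, no unordered pair can appear in both a training fold and its test fold, which removes the shortcut the reply warns about. Keeping the 37264 deduplicated rows achieves the same effect more simply.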

SakuraRiven commented 4 years ago

OK, understood. Thanks for your answer~