Want to know the detailed preparation process of dataset

GFNOrg / gflownet

Generative Flow Networks

MIT License

608 stars 77 forks

Want to know the detailed preparation process of dataset #18

Open G1NO3 opened 8 months ago

G1NO3 commented 8 months ago

Hi, I want to use GFlowNet for a different protein pocket. I have a dataset of SMILES and docking scores, but I'm not sure about the rest of the dataset preparation. For example, if you curate the output of the BRICS algorithm, how do you handle blocks that do not appear in the block dictionary? And do you have a script for generating "jbonds" and "stem_idx"? I'd appreciate it if you could share one. Thanks!
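For readers with the same question, here is a minimal sketch of one possible way to handle fragments that are missing from the block dictionary: set aside any molecule that has at least one such fragment. This is not confirmed as the maintainers' procedure. The fragment lists, molecule names, and the `split_by_coverage` helper are all hypothetical. In practice the fragments would come from a BRICS decomposition (for example RDKit's `BRICS.BRICSDecompose`), and the SMILES would need to be canonicalized the same way on both sides before comparing strings.

```python
# Hypothetical data: fragment SMILES per molecule, as a BRICS step might
# produce after stripping attachment-point labels. Not taken from the repo.
fragments_by_mol = {
    "mol_a": ["c1ccncc1", "C1CCCCC1"],
    "mol_b": ["c1ccncc1", "C1CCOCC1"],
    "mol_c": ["C1CCCCC1"],
}

# Hypothetical block dictionary; the real one was curated by hand.
block_dictionary = {"c1ccncc1", "C1CCCCC1"}


def split_by_coverage(fragments_by_mol, block_dictionary):
    """Separate molecules fully covered by the dictionary from the rest.

    Returns (covered, uncovered), where uncovered maps each molecule name
    to the list of its fragments that are missing from the dictionary.
    """
    covered, uncovered = {}, {}
    for name, frags in fragments_by_mol.items():
        missing = [f for f in frags if f not in block_dictionary]
        if missing:
            uncovered[name] = missing
        else:
            covered[name] = frags
    return covered, uncovered


covered, uncovered = split_by_coverage(fragments_by_mol, block_dictionary)
print("usable molecules:", sorted(covered))
print("missing fragments:", uncovered)
```

Keeping the `uncovered` map around, rather than silently dropping those molecules, makes it easy to see which fragments are the most common candidates for adding to the dictionary.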

bengioe commented 8 months ago

As per @MKorablyov's answer in #9, this involved some manual intervention from a chemist. I'm afraid the details are lost to time, but I'll try digging...

G1NO3 commented 8 months ago

Thanks! I followed the steps in #9 and built a block dictionary myself. My next question is how to determine block_r. It seems to be more than just the "reaction sites" of a block. For example, mol 9 and mol 10 are both pyridine, but block_r[9] is [0,1,2,4,5] and block_r[10] is [1,0,2,4,5]. Does the order matter?
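To make the observation concrete, the short check below compares the two block_r values quoted in this comment. It only uses the numbers given here. Whether the code treats the first entry specially (for example as the atom that bonds to the existing molecule) is an assumption that this thread does not confirm.

```python
# block_r entries exactly as quoted in the comment (blocks 9 and 10).
block_r = {
    9: [0, 1, 2, 4, 5],
    10: [1, 0, 2, 4, 5],
}

# Same atom indices in both entries?
same_atoms = set(block_r[9]) == set(block_r[10])

# Same ordering of those indices?
same_order = block_r[9] == block_r[10]

# The first entry is the only position that distinguishes the two lists
# once positions 0 and 1 are swapped.
first_9, first_10 = block_r[9][0], block_r[10][0]

print(f"same atoms: {same_atoms}, same order: {same_order}")
print(f"first entry of block 9: {first_9}, of block 10: {first_10}")
```

The two entries list the same five atoms and differ only in which of atoms 0 and 1 comes first. If order carried no meaning, the duplicate pyridine entry would be redundant, which suggests (but does not prove) that the leading index matters.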