HUBioDataLab / SELFormer

SELFormer: Molecular Representation Learning via SELFIES Language Models

Availability of the dataset and corresponding targets used in drug discovery experiments (Figure 4). #11

Closed Park-ing-lot closed 5 months ago

Park-ing-lot commented 6 months ago

Hi!

Following the experimental method shown in the SELFormer paper (Figure 4), we would like to analyze the drug discovery capabilities of various models.

However, the ChEMBL dataset and its targets (i.e., transferases, proteases, oxidoreductases, membrane receptors, and ion channels) used in the SELFormer paper do not seem to be publicly available, so I'm asking whether you could share them with us.

Thanks, Park.

tuncadogan commented 5 months ago

Hi,

Sorry for the late response. We have another repo for constructing and splitting those protein family-specific datasets and their small molecule ligands. Please take a look at https://github.com/HUBioDataLab/ProtBENCH

The direct download link for the dataset is https://drive.google.com/file/d/1zVOyFIEOo33yeF3vFE8paz5pS5H5Z99N/ and more information is provided in the repo's README file. Please let us know if you have further questions.

Park-ing-lot commented 5 months ago

Thank you!