Hi, I read the paper, but I see the repo contains no code or models yet.
In particular, I'd like to check whether your textual pretraining data filtered out sequences that appear in (or are similar to, e.g. by BLAST) any of the TAPE evaluation sets, as we did in ProteinBERT (https://github.com/nadavbra/protein_bert).
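For concreteness, here is a minimal sketch of the kind of filtering I have in mind, assuming NCBI BLAST+ is installed and the FASTA file names are only illustrative:

```python
# Sketch: drop pretraining sequences with a significant BLAST hit against the eval set.
# Assumes NCBI BLAST+ (makeblastdb, blastp) is on PATH; file names are hypothetical.
import subprocess

# Build a protein BLAST database from the evaluation sequences
subprocess.run(["makeblastdb", "-in", "tape_eval.fasta", "-dbtype", "prot"], check=True)

# Search every pretraining sequence against the eval database (tabular output, format 6)
subprocess.run([
    "blastp", "-query", "pretrain.fasta", "-db", "tape_eval.fasta",
    "-evalue", "1e-3", "-outfmt", "6", "-out", "hits.tsv",
], check=True)

# Collect IDs of pretraining sequences with any significant hit (qseqid is column 1)
contaminated = set()
with open("hits.tsv") as f:
    for line in f:
        contaminated.add(line.split("\t")[0])

# Write a filtered FASTA keeping only sequences with no eval-set hit
keep = True
with open("pretrain.fasta") as src, open("pretrain_filtered.fasta", "w") as dst:
    for line in src:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            keep = seq_id not in contaminated
        if keep:
            dst.write(line)
```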
Hi @Amelie-Schreiber, our manuscript is currently under submission; we will release the code once it is accepted. In the meantime, you can check the latest version here.