ai4protein / ProSST

Code for ProSST: A Pre-trained Protein Sequence and Structure Transformer with Disentangled Attention.
GNU General Public License v3.0

Structural tokenizer (`PdbQuantizer`) is too slow at processing long proteins #3

Open KatarinaYuan opened 4 weeks ago

KatarinaYuan commented 4 weeks ago

Hi team, thanks for the great work. How long does your pre-trained `PdbQuantizer` take to process a protein of length 400? On my machine it is very slow, and I'm trying to figure out why.

Thanks for the help!

ginnm commented 3 weeks ago

Hi, thanks for your interest in our work. We will release an accelerated `PdbQuantizer` with multi-threaded parallel processing next month.
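
For readers wondering what per-residue parallelism might look like in general, here is a minimal sketch using Python's standard `concurrent.futures`. It is not the ProSST implementation: `encode_local_structure` and `encode_protein` are hypothetical names, and the placeholder body only stands in for whatever work maps one local structure to a token.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def encode_local_structure(residue_index: int) -> int:
    """Hypothetical stand-in for encoding one local structure into a token."""
    # Placeholder computation; a real encoder would run a model here.
    return residue_index % 20


def encode_protein(num_residues: int, max_workers: int = 4) -> List[int]:
    """Encode every local structure of a protein using a pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so tokens stay aligned
        # with residue positions even though the work runs concurrently.
        return list(pool.map(encode_local_structure, range(num_residues)))


tokens = encode_protein(403)
```

Threads only speed things up when the per-item work releases the GIL (for example native NumPy or PyTorch kernels, or I/O). For pure-Python work, `ProcessPoolExecutor` from the same module is the usual substitute.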

ginnm commented 3 weeks ago

Accelerated version speed:

| Protein Name (Uniprot_ID) | Length (Local structures) | Splitting to local structure | Encoding |
| --- | --- | --- | --- |
| CCDB_ECOLI_Adkar_2012 | 101 | 0.29s | 4.43s |
| ESTA_BACSU_Nutschel_2020 | 212 | 0.67s | 4.27s |
| PTEN_HUMAN_Matreyek_2021 | 403 | 1.06s | 4.45s |
| ENV_HV1B9_DuenasDecamp_2016 | 853 | 3.24s | 5.63s |
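
The reported timings can be combined into a total per protein and a cost per local structure. The short sketch below does only that arithmetic on the four rows above. The names `timings` and `summarize` are illustrative, and nothing here calls ProSST itself.

```python
# (name, length, split_seconds, encode_seconds), copied from the reported timings
timings = [
    ("CCDB_ECOLI_Adkar_2012", 101, 0.29, 4.43),
    ("ESTA_BACSU_Nutschel_2020", 212, 0.67, 4.27),
    ("PTEN_HUMAN_Matreyek_2021", 403, 1.06, 4.45),
    ("ENV_HV1B9_DuenasDecamp_2016", 853, 3.24, 5.63),
]


def summarize(rows):
    """Return (name, total seconds, milliseconds per local structure) per row."""
    summary = []
    for name, length, split_s, encode_s in rows:
        total_s = split_s + encode_s
        ms_per_structure = 1000 * total_s / length
        summary.append((name, round(total_s, 2), round(ms_per_structure, 1)))
    return summary


summary = summarize(timings)
for name, total_s, ms_per_structure in summary:
    print(f"{name}: total {total_s:.2f}s, {ms_per_structure:.1f} ms per local structure")
```

In these four measurements the encoding step stays between 4.27s and 5.63s while the splitting step grows with length, so the per-structure cost falls as proteins get longer.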

KatarinaYuan commented 1 week ago

Thanks! Looking forward to the release!