Structural tokenizer (`PdbQuantizer`) is too slow at processing long proteins

ai4protein / ProSST

Code for ProSST: A Pre-trained Protein Sequence and Structure Transformer with Disentangled Attention.

GNU General Public License v3.0


Issue #3 (Open), opened by KatarinaYuan 4 weeks ago

KatarinaYuan commented 4 weeks ago

Hi team, thanks for the great work. How long does your pre-trained PdbQuantizer take to process a protein of length 400? On my machine it is very slow, and I am trying to figure out why.

Thanks for the help!

ginnm commented 3 weeks ago

Hi, thanks for your interest in our work. We will release an accelerated PdbQuantizer with multi-threaded parallel processing next month.
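The accelerated PdbQuantizer code is not shown in this thread. As a rough illustration only, the sketch below parallelizes per-residue work with a thread pool. It assumes a "local structure" is a residue plus its `k` nearest neighbours by C-alpha distance; that definition, the default `k=40`, and the function names `local_structure` / `split_into_local_structures` are hypothetical and not taken from ProSST.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]  # C-alpha coordinates of one residue


def local_structure(ca: Sequence[Coord], center: int, k: int) -> List[int]:
    """Indices of the k residues nearest to `center` by C-alpha distance
    (the center itself included), returned in sequence order.
    Hypothetical definition of a 'local structure', not ProSST's."""
    by_distance = sorted((math.dist(ca[center], p), i) for i, p in enumerate(ca))
    return sorted(i for _, i in by_distance[:k])


def split_into_local_structures(
    ca: Sequence[Coord], k: int = 40, workers: int = 8
) -> List[List[int]]:
    """Extract one local structure per residue, fanning residues out over a
    thread pool. pool.map returns results in residue order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: local_structure(ca, c, k), range(len(ca))))


if __name__ == "__main__":
    # Toy chain: residues spaced 3.8 Å apart along the x axis.
    chain = [(3.8 * i, 0.0, 0.0) for i in range(400)]
    pieces = split_into_local_structures(chain, k=40)
    print(len(pieces), pieces[0][:5])
```

Because the distance math here is pure Python, the GIL limits how much a thread pool can help; for CPU-bound work, swapping in `ProcessPoolExecutor` or a vectorized neighbour search is the usual alternative.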

ginnm commented 3 weeks ago

Speed of the accelerated version:

| Protein name (UniProt_ID) | Length (local structures) | Splitting into local structures | Encoding |
|---|---|---|---|
| CCDB_ECOLI_Adkar_2012 | 101 | 0.29 s | 4.43 s |
| ESTA_BACSU_Nutschel_2020 | 212 | 0.67 s | 4.27 s |
| PTEN_HUMAN_Matreyek_2021 | 403 | 1.06 s | 4.45 s |
| ENV_HV1B9_DuenasDecamp_2016 | 853 | 3.24 s | 5.63 s |
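As a reading aid, the reported timings can be normalized per local structure. The snippet below only does arithmetic on the four rows of the table above (no new measurements): splitting works out to roughly 2.6 to 3.8 ms per local structure, while encoding accounts for most of the total time in every row. The helper name `summarize` is illustrative.

```python
from typing import Dict, List, Tuple

# (protein, length = number of local structures, splitting time in s, encoding time in s),
# copied from the accelerated-version timings in this thread.
TIMINGS: List[Tuple[str, int, float, float]] = [
    ("CCDB_ECOLI_Adkar_2012", 101, 0.29, 4.43),
    ("ESTA_BACSU_Nutschel_2020", 212, 0.67, 4.27),
    ("PTEN_HUMAN_Matreyek_2021", 403, 1.06, 4.45),
    ("ENV_HV1B9_DuenasDecamp_2016", 853, 3.24, 5.63),
]


def summarize(rows: List[Tuple[str, int, float, float]]) -> Dict[str, Dict[str, float]]:
    """Per protein: splitting cost per local structure (ms), total time (s),
    and the fraction of total time spent in encoding."""
    out: Dict[str, Dict[str, float]] = {}
    for name, length, split_s, encode_s in rows:
        total = split_s + encode_s
        out[name] = {
            "split_ms_per_residue": 1000.0 * split_s / length,
            "total_s": total,
            "encode_fraction": encode_s / total,
        }
    return out


if __name__ == "__main__":
    for name, stats in summarize(TIMINGS).items():
        print(f"{name}: {stats['split_ms_per_residue']:.2f} ms/residue, "
              f"total {stats['total_s']:.2f} s, encoding {stats['encode_fraction']:.0%}")
```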

KatarinaYuan commented 1 week ago

Thanks! Looking forward to the release!