facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

High-quality ESM Atlas contains some PDB files without ATOM records #405

Open khb7840 opened 1 year ago

khb7840 commented 1 year ago

Following up on issue #404, I found that some PDB files in the high-quality ESM Atlas have no ATOM records at all. (I haven't checked the full database.) Examples from highquality_clust30_00.tar:
./403/MGYP000179026403.pdb
./103/MGYP000034877103.pdb
./134/MGYP000553527134.pdb

For example, ./134/MGYP000553527134.pdb:

HEADER                                            18-OCT-22                     
TITLE     ESMFOLD V0 PREDICTION FOR MGYP000553527134
REMARK   1                                                                      
REMARK   1 REFERENCE 1                                                          
REMARK   1  AUTH   ZEMING LIN, HALIL AKIN, ROSHAN RAO, BRIAN HIE, ZHONGKAI ZHU, 
REMARK   1  AUTH 2 WENTING LU, NIKITA SMETANIN, ALLAN DOS SANTOS COSTA, 
REMARK   1  AUTH 3 MARYAM FAZEL-ZARANDI, TOM SERCU, SALVATORE CANDIDO,
REMARK   1  AUTH 4 ALEXANDER RIVES                
REMARK   1  TITL   LANGUAGE MODELS OF PROTEIN SEQUENCES AT THE SCALE OF         
REMARK   1  TITL 2 EVOLUTION ENABLE ACCURATE STRUCTURE PREDICTION               
REMARK   1  REF                                                                 
REMARK   1  REFN                                                                
REMARK   1  PMID                                                                
REMARK   1  DOI    10.1101/2022.07.20.500902                                    
REMARK   1                                                                      
REMARK   1 LICENSE AND DISCLAIMERS                
REMARK   1 ESM METAGENOMIC STRUCTURE ATLAS DATA IS AVAILABLE UNDER
REMARK   1 A CC-BY-4.0 LICENSE FOR ACADEMIC AND COMMERCIAL USE.
REMARK   1 COPYRIGHT (C) META PLATFORMS, INC. ALL RIGHTS RESERVED.
REMARK   1 USE OF THE ESM METAGENOMIC STRUCTURE ATLAS DATA IS SUBJECT
REMARK   1 TO THE META OPEN SOURCE TERMS OF USE AND PRIVACY POLICY.
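A minimal Python sketch for checking a downloaded shard for this problem. It assumes the tar members are plain-text .pdb files, as the paths above suggest, and it treats any line starting with "ATOM" as a coordinate record. The helpers has_atom_records and find_pdbs_without_atoms are hypothetical names for illustration, not part of the esm package.

```python
import tarfile


def has_atom_records(pdb_text: str) -> bool:
    """Return True if any line is an ATOM coordinate record."""
    # PDB record names occupy columns 1-6, so ATOM lines start with "ATOM".
    return any(line.startswith("ATOM") for line in pdb_text.splitlines())


def find_pdbs_without_atoms(tar_path: str) -> tuple[list[str], int]:
    """Scan every .pdb member of a tar archive.

    Returns (names of .pdb members with no ATOM records, total .pdb members).
    """
    missing: list[str] = []
    total = 0
    # "r:*" lets tarfile detect any compression transparently.
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            # Skip directories and anything that is not a .pdb file.
            if not member.isfile() or not member.name.endswith(".pdb"):
                continue
            total += 1
            handle = tar.extractfile(member)
            if handle is None:
                continue
            text = handle.read().decode("utf-8", errors="replace")
            if not has_atom_records(text):
                missing.append(member.name)
    return missing, total
```

Usage: missing, total = find_pdbs_without_atoms("highquality_clust30_00.tar"). Dividing len(missing) by total gives the affected fraction for that shard; running it over every shard would show whether the problem is isolated or widespread.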
tomsercu commented 1 year ago

Thanks for flagging; we'll look into it. Our initial estimate is that about 0.02% of records are still missing.