metadata dataframe content, add sequence crc64

facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins

MIT License

Discussed in https://github.com/facebookresearch/esm/discussions/340

Originally posted by **igortru** November 4, 2022

Please add a protein sequence CRC64 column to the metadata file (or any other hash, MD5 for example). It would make it easy to map sequences across databases: EBI-EMBL, AlphaFold, GenBank, MGnify, etc.

As a template, you can take the AlphaFold metadata table in GCP: https://github.com/deepmind/alphafold/blob/main/afdb/README.md

A mapping file between GenBank and AlphaFold is available at https://ftp.ncbi.nlm.nih.gov/genomes/Viruses/AlphaFold2NR.map.gz

Metadata dataframe content, with the requested column added:
- id is the MGnify ID
- ptm is the predicted TM score
- plddt is the predicted average lddt
- num_conf is the number of residues with plddt > 0.7
- len is the total residues in the protein
- crc64 (new) is the sequence checksum, e.g. computed with `from crc64iso.crc64iso import crc64`
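The request boils down to hashing each sequence once and joining tables on the hash. Below is a minimal, dependency-free Python sketch, assuming the CRC-64-ISO variant used by UniProt (reflected polynomial 0xD800000000000000, initial value 0, no final XOR), which is what `crc64iso` is meant to produce; it is worth checking against `crc64iso.crc64iso.crc64` on a few sequences before relying on it. The row layout (`id` and `sequence` keys), the helper names, and the example IDs are hypothetical and not part of the ESM metadata format.

```python
from __future__ import annotations

import hashlib

# Reflected form of the CRC-64-ISO polynomial x^64 + x^4 + x^3 + x + 1.
POLY_REFLECTED = 0xD800000000000000


def _build_table() -> list[int]:
    """Precompute the 256-entry lookup table for the reflected CRC."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc64_iso(sequence: str) -> str:
    """CRC-64-ISO of an ASCII sequence (init 0, no final XOR), as 16 hex digits."""
    crc = 0
    for byte in sequence.encode("ascii"):
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return f"{crc:016X}"


def md5_hex(sequence: str) -> str:
    """MD5 of the sequence, the alternative hash mentioned in the request."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def add_hash_columns(rows: list[dict]) -> list[dict]:
    """Copy metadata rows, adding 'crc64' and 'md5' computed from 'sequence'."""
    return [
        {**row, "crc64": crc64_iso(row["sequence"]), "md5": md5_hex(row["sequence"])}
        for row in rows
    ]


def map_by_crc64(left: list[dict], right: list[dict]) -> list[tuple[str, str]]:
    """Pair 'id' values of rows whose CRC64 matches and whose sequences agree.

    A 64-bit checksum can collide, so each candidate is confirmed by
    comparing the full sequences.
    """
    index: dict[str, list[dict]] = {}
    for row in right:
        index.setdefault(row["crc64"], []).append(row)
    return [
        (row["id"], match["id"])
        for row in left
        for match in index.get(row["crc64"], [])
        if match["sequence"] == row["sequence"]
    ]


if __name__ == "__main__":
    # Hypothetical IDs and toy sequences, for illustration only.
    esm = add_hash_columns([{"id": "esm_a", "sequence": "MKV"}])
    other = add_hash_columns([
        {"id": "other_x", "sequence": "MKV"},
        {"id": "other_y", "sequence": "MKA"},
    ])
    print(map_by_crc64(esm, other))  # [('esm_a', 'other_x')]
```

The hash index makes the join linear in the size of both tables. MD5 is included only because the request names it as an alternative; its 128-bit digest makes accidental collisions far less likely than with CRC64, at the cost of a longer column.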

metadata dataframe content, add sequence crc64 #350