Bitbol-Lab / Iterative_masking

Iterative masking algorithm on MSA Transformer to generate synthetic sequences
https://elifesciences.org/articles/79854
Apache License 2.0

How to download the aligned FASTA files from the Pfam database? #1

Closed psp3dcg closed 6 months ago

psp3dcg commented 7 months ago

Thanks for the really nice package! However, when I tried to test the package on more Pfam data, the FASTA file I downloaded from InterPro (https://www.ebi.ac.uk/interpro/entry/pfam/#table) was not aligned. For example, for PF00020:

>A0A060W133|unreviewed|Tumor necrosis factor receptor superfamily member 6|taxID:8022
MNKYTFLYILCILCTVRLTTPFNAERSSQDILITSKLRTKRQSCQDGTYQHEGMACCLCAAGQHLESHCSVSPEDGTCVY
CEENRTYNSDPNSLDSCEPCTSCDSKANLEVEDRCTIFKDSVCRCQQGHYCNKGKEHCRACYPCTICSEEGIKVACSATN
NTICHAFKEQGRNLAVVFVLTTVLLVLLVIIYLWRSNKYCFGPNGGLTELPNRSSEEMQPLRGVNLWPHLPDIAKTLGWR
DMKQVAECSGMTHTAIESHQLNFPNDSQEQCSSLLRAWVEKEGMTTASVTLVQTLLRMKKKVKAEDIMAIISNKEDGVTG
QNSGSGQV
>A0A060W225|unreviewed|TNFR-Cys domain-containing protein|taxID:8022
MFDKSMSNIGLHYMVVLLIWALNPMVAAQSGLKLTRTGGSVRNLTQRDISCQENLEYPHDNICCLNCLAGTYVKEYCTRA
LERGTCEACEFDTYTEHGNGLRQCLKCTTCHSDQVTTKACTITQDRECRCKPGSFCAPDQACEVCKKCLRCEENEVRLKN
CTATSNTVCKTRLPAPSTIPGTRPGTADIPLLHALLTPVYYYGLGFYCVLTQ

How can I get an aligned FASTA file like the example in your package? Thank you!

damiano-sg commented 6 months ago

Hi, sorry for the delay. InterPro also lets you download the full alignment if you need it. Otherwise, you can align the sequences yourself with any HMM of your choice, and it should work.
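For readers landing here with the same question: a minimal sketch of turning a downloaded alignment into an aligned FASTA file. It assumes the alignment is in Stockholm format (which is what I would expect for a Pfam alignment from InterPro, but check your file) and that it has already been decompressed. The function name `stockholm_to_fasta` and the toy alignment are made up for illustration and are not part of this package. The `.` gap characters are rewritten as `-`, and the function raises an error if the rows do not all have the same length.

```python
def stockholm_to_fasta(stockholm_text):
    """Convert Stockholm-format alignment text to aligned FASTA text.

    Hypothetical helper, not part of Iterative_masking.
    Handles interleaved blocks by concatenating chunks per sequence name.
    """
    seqs = {}  # dicts keep insertion order, so the row order is preserved
    for line in stockholm_text.splitlines():
        line = line.strip()
        # Skip blank lines, markup/comment lines, and the end-of-alignment marker.
        if not line or line.startswith("#") or line == "//":
            continue
        parts = line.split()
        if len(parts) != 2:
            continue  # not a "name sequence" row
        name, chunk = parts
        # Use '-' as the only gap character.
        seqs[name] = seqs.get(name, "") + chunk.replace(".", "-")

    # An alignment must have rows of identical length.
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")

    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


# Toy interleaved alignment (invented data, not real Pfam content).
toy = """# STOCKHOLM 1.0
seqA  MK.TC
seqB  MR-SC

seqA  QDG
seqB  Q.G
//
"""

print(stockholm_to_fasta(toy))
```

The length check is also a quick way to tell whether a FASTA file is aligned at all: the unaligned sequences in the question have different lengths, so they would fail it.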