Yijia-Xiao closed this issue 2 years ago.
The protein lengths can be read directly from the UniRef50 database, whose sequences were used as the seed sequences for constructing the MSAs. We used this release. It is quite outdated now, but I would guess the length statistics haven't changed much: https://ftp.uniprot.org/pub/databases/uniprot/previous_releases/release-2018_03/uniref/
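For anyone who wants to compute these numbers themselves, here is a minimal sketch (not the authors' code) that streams a UniRef50 FASTA file and summarizes sequence lengths. It assumes you have downloaded and extracted the release to a plain or gzipped FASTA file; the name `uniref50.fasta.gz` below is hypothetical, so check the actual file names in the release directory. It also summarizes every sequence in the file, which may differ from the exact subset that ended up as MSA seeds.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the residue count of each record in a FASTA stream."""
    length = None  # stays None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may be wrapped over many lines
    if length is not None:
        yield length  # last record in the file


def length_stats(lengths: Iterable[int], percentiles=(5, 50, 95)) -> dict:
    """Summarize lengths via a histogram, so memory scales with distinct lengths."""
    hist = Counter(lengths)
    total = sum(hist.values())
    if total == 0:
        raise ValueError("no sequences found")
    stats = {
        "count": total,
        "min": min(hist),
        "max": max(hist),
        "mean": sum(k * v for k, v in hist.items()) / total,
    }
    # Nearest-rank percentile: smallest length whose cumulative count
    # reaches ceil(p/100 * total) sequences.
    targets = {p: max(1, -(-p * total // 100)) for p in percentiles}
    pending = sorted(percentiles)
    cumulative = 0
    for length in sorted(hist):
        cumulative += hist[length]
        while pending and cumulative >= targets[pending[0]]:
            stats[f"p{pending[0]}"] = length
            pending.pop(0)
    return stats


def summarize_fasta(path: str) -> dict:
    """Open a plain or gzipped FASTA file and return its length statistics."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return length_stats(fasta_lengths(handle))


if __name__ == "__main__":
    # Hypothetical file name; replace with the file from the release you downloaded.
    print(summarize_fasta("uniref50.fasta.gz"))
```

Counting lengths in a histogram rather than keeping a list of ~tens of millions of integers keeps memory small, since the number of distinct lengths is bounded by the longest sequence.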
Got it, thank you :-) @tomsercu
Hi! Thanks for the great work. I have a question regarding MSA Transformer. The paper reports the distribution of MSA depths in the training set, but not the distribution of protein lengths.
Would it be possible to share some statistics about the length distribution?
Thanks!
Best, Yijia Xiao