The error message is telling you that your fasta file contains at least two sequences with the name "AABF01000026.1/2165-2294". Names need to be unique in order for esl-sfetch to be able to index/search the file.
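To find every offending name before indexing, a small script can count how often each name occurs. This is a minimal sketch, assuming the sequence name is the first whitespace-delimited word after `>` (which is how Easel tools such as esl-sfetch name FASTA records) and an uncompressed input file; the script name `find_dups.py` is hypothetical.

```python
import sys
from collections import Counter


def duplicate_names(lines):
    """Return the FASTA sequence names that occur more than once, sorted.

    Assumption: the name is the first whitespace-delimited word after '>'.
    """
    counts = Counter(
        line[1:].split(maxsplit=1)[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    )
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python find_dups.py Rfam.fa
    with open(sys.argv[1]) as fh:
        for name in duplicate_names(fh):
            print(name)
```

An empty result means every name is unique, so `esl-sfetch --index` should not fail for this reason.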
But I am using the Rfam.fa file from the current Rfam release. Does the Rfam database contain duplicate sequence names? If so, there may be more duplicates than just this one.
Ah, I'll give another quick response, but maybe the Eddy lab folks will have a different take:
I can't be sure what's in the file you're trying to search (Rfam.fa).
In any case, it looks like some RNA sequences appear in multiple Rfam families. For example, "AF311056.1/10510-10592" is found in both RF03536 and RF03547. That seems undesirable to me ... but maybe there's some reason that makes sense to the Rfam developers? This means some sequences will appear more than once in the .seed file, or in a file made by concatenating all the Rfam .fasta files. As a result, esl-sfetch can't handle the file, because not every sequence in it has a unique name.
I don't know your use case, so I'm not sure exactly what steps you should take ... but I do know that you'll need to remove the duplicates somehow if you're going to use indexing/search tools on the sequence set.
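One way to remove the duplicates is to keep only the first record for each name. This is a minimal sketch, assuming the name is the first whitespace-delimited word after `>` and that records sharing a name carry the same sequence (as with a sequence listed under two families), so dropping later copies loses nothing. The script name `dedup_fasta.py` and the output filename are hypothetical.

```python
import sys


def dedup_fasta(lines):
    """Yield FASTA lines, skipping any record whose name was already seen.

    Assumptions: the name is the first whitespace-delimited word after '>',
    and records with the same name have identical sequences.
    """
    seen = set()
    keep = True  # lines before the first header (if any) are passed through
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split(maxsplit=1)
            name = fields[0] if fields else ""
            keep = name not in seen
            seen.add(name)
        if keep:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python dedup_fasta.py Rfam.fa Rfam.dedup.fa
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(dedup_fasta(src))
```

The deduplicated output can then be indexed with `esl-sfetch --index Rfam.dedup.fa`. If same-named records could differ in sequence, silently keeping the first one would hide that, so checking the duplicates first is the safer route.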
Looks like those families should either be merged, or in a clan (probably merged based on the names). In other news, is the Rfam website super slow for anyone else, or is that just because I'm on the other side of the planet?
Tagging @blakesweeney and @emmaco from @RfamDB as they are best placed to comment about duplicates in Rfam.fa.
Hi, thank you for reaching out about this. We are currently regenerating the Rfam.fa file without any duplicates and will let you know once it is ready.
@AntonPetrov @blakesweeney Thank you. I will be waiting.
Hi there! The updated file is available now http://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/fasta_files/Rfam.fa.gz Please let me know if you have any further issues.
Yeah, it works. Thank you.
Just a quick update, we noticed the Rfam.fa file was incorrect after deduplicating but have since fixed it. Please use the latest version.
Hi, I'm trying to index the Rfam.fa file using the following command:
esl-sfetch --index Rfam.fa
but I got this error:
Can you please help me to resolve this issue?