Closed Valentin-Bio closed 1 year ago
There aren't yet any known proteins that long, to my knowledge. My guess is that you're translating your genome into six very long strings, with * or some such for stop codons, instead of into individual ORFs. HMMER expects each target sequence to be an individual protein, not a concatenated whole genome, and not to contain non-amino-acid characters. The solution is to translate into individual ORFs. One way to do this is with the esl-translate tool included with HMMER.
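To illustrate the difference between one long translated frame and individual ORFs, here is a minimal Python sketch. It assumes the translated FASTA marks stop codons with `*` (as guessed above) and simply cuts each frame into stop-to-stop pieces. The function names, the `_orfN` naming scheme, and the `min_len` cutoff are hypothetical choices for illustration only. It is not a replacement for esl-translate, which works directly from the nucleotide sequence.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_into_orfs(records, min_len=20):
    """Cut each translated frame at '*' and keep pieces of at least min_len residues.

    min_len is an arbitrary illustrative cutoff (an assumption, not a HMMER setting).
    """
    for header, seq in records:
        fields = header.split()
        name = fields[0] if fields else "seq"
        for i, piece in enumerate(seq.split("*"), start=1):
            if len(piece) >= min_len:
                yield f"{name}_orf{i}", piece


if __name__ == "__main__":
    # Toy input standing in for a six-frame translation with '*' stop markers.
    demo = [">contig1_1 frame 1", "MKT*AAAAGG", "LLV*", ">contig1_2", "*PPQ"]
    for name, orf in split_into_orfs(read_fasta(demo), min_len=3):
        print(f">{name}\n{orf}")
```

Each resulting record is a separate protein-like sequence with no `*` characters, which is the shape of input described above.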
Yes, my bad, I forgot to retrieve the ORFs from the genome, thanks for the clarification.
Hello, I'm trying to search for hydrocarbon-degrading genes in my metagenome-assembled genomes (MAGs) using an HMM profile.
I'm testing the profile of the genes (my_genes.hmm) against one of my assembled genomes (genome1.fasta).
For this purpose, I first converted the nucleotide sequences to amino acid sequences with the transeq program:
transeq -sequence genome1.fasta -outseq genome1.faa
After that, I ran hmmsearch as follows:
hmmsearch --cpu 9 --tblout hydrocarbon_table.txt my_genes.hmm genome1.faa > hydrocarbon_results.txt
But I'm getting the error message mentioned in the title.
Given that the input file contains amino acid sequences longer than 100K, how can I deal with this problem?
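To see which records in a translated FASTA file are the overlong ones, a quick check like the following can help. This is a minimal sketch: it assumes "100K" means 100,000 residues, and the function name `long_sequences` is made up for illustration.

```python
def long_sequences(lines, limit=100_000):
    """Return (name, length) for FASTA records longer than limit residues.

    limit=100_000 is an assumption about what "100K" means in the error.
    Records that share a name are counted as restarting at zero.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return [(n, length) for n, length in lengths.items() if length > limit]


if __name__ == "__main__":
    # Toy data with a tiny limit so the behaviour is visible.
    demo = [">frame_1", "MKTAYIAK", "QRQISFVK", ">frame_2", "MK"]
    print(long_sequences(demo, limit=10))
```

On a real file, the same function can be fed an open file handle, for example `long_sequences(open("genome1.faa"))`.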
A possible solution to this problem (I don't know if it is the optimal one) is to use the hmmemit program to retrieve sequences from the profile and then run hmmscan of the retrieved amino acid sequences against the amino acid sequences of my MAG.