Closed Galib009 closed 1 year ago
Hello,
With the option --gbk, you can extract the features of the integrons annotated, which you may convert with any gbk parser.
Alternatively, if you're familiar with bash, you can do the following :
# get position for the integron integron_01 in replicon : myreplicon
positions=$(awk 'BEGIN{min=9999999; max=0}$1=="integron_01" && $2=="myreplicon"{if ($4<min) {min=$4}; if ($5>max) {max=$5}}END{print min".."max}' Results_Integron_Finder_myreplicon/*integrons)
# extract sequence with esl-sfetch (gets installed with hmmer, so you should have it)
# first create index for fasta file (need to be done once)
esl-sfetch --index myreplicon.fst
# extract sequence
esl-sfetch -c $positions myreplicon.fst myreplicon > integron_01_myreplicon.fst
Hi Jean, Thank you for your reply. However, I am facing this error after the last command "Didn't find sequence myreplicon in the index" -
here, I have attached the screenshot of the myreplicon.integrons file -
use $2=="contig_4"
in your case instead of $2=="myreplicon"
(it means "if column 2 is equal to contig_4")
You can also do it by hand and do esl-sfetch -c 1681061..1684879 myreplicon.fst contig_4 > integron_01_contig_4.fst
if you do echo $positions
it should return 1681061..1684879 (at least it's supposed to)
I want to get the full sequence of the integron including the gene cassette, is there any option or way to get the sequence from fasta/multi-fasta file?
Many Thanks for this tool