The first one is the query IDBA_scaffold_7517 that have a hit with a protein sequence YP_009316100 in the (-) strand because the start begins by 150 and ends with 4.
The seconde one is the query IDBA_scaffold_29149 that have a hit with a protein sequence AIG51500 in the (+) strand because the start begins 2794 and ends by 2893.
so with that I actually created a bed file in order to extract the fasta sequences from my genome ( I assume that since there is a hit with a protein in the Genomes the coordinates of this hit should be in a good ORF (without stop codon).
Here is the bed_file I made :
IDBA_scaffold_7517 4 150 Species 0 -
IDBA_scaffold_29149 2794 2893 Species 0 +
Then in order to extract the sequences from the coordinates in the Genome I used:
bedtools getfasta -fi Genome.fa -bed bed_file -s -fo test
with -s to force strandedness. If the feature occupies the antisense, strand, the sequence will be reverse complemented.
But when I do that the sequences extracted from the query are not in the good ORF.
I guess it is because I should take into account the fact that the coordinates are from nuc translated to portein but how could I do that? Should I change the coordinates manually from the bed file according to the fact that it is a negative of positif strand ?
Hello I made a
blastX research
(A query genome in nucleotide format translated into protein in 6 reading frames against a protein dabatase)And here is a head of the result :
I have 2 hits:
The first one is the query
IDBA_scaffold_7517
that have a hit with a protein sequenceYP_009316100
in the(-)
strand because the start begins by150
and ends with4
.The seconde one is the query
IDBA_scaffold_29149
that have a hit with a protein sequenceAIG51500
in the(+)
strand because the startbegins 2794
andends by 2893
.so with that I actually created a bed file in order to extract the fasta sequences from my genome ( I assume that since there is a hit with a protein in the Genomes the coordinates of this hit should be in a good ORF (without stop codon).
Here is the
bed_file
I made :Then in order to extract the sequences from the coordinates in the Genome I used:
with
-s
to force strandedness. If the feature occupies the antisense, strand, the sequence will be reverse complemented.But when I do that the sequences extracted from the query are not in the good ORF. I guess it is because I should take into account the fact that the coordinates are from nuc translated to portein but how could I do that? Should I change the coordinates manually from the bed file according to the fact that it is a negative of positif strand ?
Thank you for your help.