egonozer / in_silico_pcr

Perl script for simulating PCR reactions. Extract sequences from a query based on primer sequences.
GNU General Public License v3.0

strand information #7

Open xguo-nveloptx opened 8 months ago

xguo-nveloptx commented 8 months ago

Hi,

First, thanks for making this available. This is more of a question than an issue. Are the coordinates given in the output on the + strand or the - strand? And is the amplicon sequence also reported on the + strand?

Best

Xiaoyun

egonozer commented 8 months ago

The PositionInSequence value is the left-most 1-based coordinate of the amplicon on the input sequence, in the orientation in which that sequence appears in your input file.

For example, suppose the sequence in your input fasta file is as follows, where the primer target sites are ACGTT and TCAGT (the reverse complement of ACTGA):

>testseq
AAAAACGTTCCCCCCCCTCAGTAAAAA

and you run perl in_silico_PCR.pl -s testseq.fasta -a ACGTT -b ACTGA, then the PositionInSequence will be 5 and the "amplicon" sequence will be ACGTTCCCCCCCCTCAGT.
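To make the coordinate convention concrete, here is a minimal Python sketch of the lookup. It is not the script's actual Perl implementation; it assumes exact primer matches only, and the function names `revcomp` and `find_amplicon` are hypothetical, chosen just for this illustration.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_amplicon(template, fwd, rev):
    """Locate an amplicon on the template as written in the input file.

    Assumption for this sketch: exact matches only. `fwd` is searched
    as-is, `rev` is searched as its reverse complement.
    Returns (1-based left-most position, amplicon), or None if either
    primer site is missing.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = revcomp(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    # str.find is 0-based, so add 1 to report a 1-based coordinate.
    return start + 1, template[start:end + len(rev_site)]


template = "AAAAACGTTCCCCCCCCTCAGTAAAAA"
position, amplicon = find_amplicon(template, "ACGTT", "ACTGA")
print(position, amplicon)
```

Run on the test sequence, this should report position 5 and the same amplicon as quoted above, read directly off the input sequence.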

If you reverse the primer sequences, i.e. perl in_silico_PCR.pl -s testseq.fasta -a ACTGA -b ACGTT, the result will be the same as above. In this case, however, if you use the -r option to output the amplicon sequence in the same orientation as the primers, then the PositionInSequence will still be 5, but the output amplicon sequence will now be reverse complemented, i.e. ACTGAGGGGGGGGAACGT.
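The effect of -r with swapped primers comes down to a reverse complement of the amplicon, while the coordinate stays tied to the input sequence. A minimal Python sketch of that relationship follows; the helper `revcomp` and the variable names are hypothetical and do not come from the script itself.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Amplicon as it appears on the input sequence, and its left-most
# 1-based coordinate. Neither depends on the order of the primers.
input_oriented = "ACGTTCCCCCCCCTCAGT"
position = 5

# With -a ACTGA -b ACGTT, the first primer anneals to the opposite strand,
# so reporting the amplicon in primer orientation means reverse
# complementing it. The coordinate is left unchanged.
primer_oriented = revcomp(input_oriented)
print(position, primer_oriented)
```

The reverse-complemented amplicon begins with the first primer (ACTGA), which is what "same orientation as the primers" means here.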

So the coordinate value will be fixed based on the orientation of the sequences in your input file, but the output amplicon sequences can differ depending on your primer order and whether you use the -r flag or not.