chasewnelson / SNPGenie

Program for estimating πN/πS, dN/dS, and other diversity measures from next-generation sequencing data
GNU General Public License v3.0

gtf2revcom.pl script issue #69

Closed georgysemenov closed 1 year ago

georgysemenov commented 1 year ago

Hello! I have an issue with gtf2revcom.pl. The script runs and writes an output file, but the output does not look right, and running snpgenie.pl on it causes an error.

My input is:

Chr10_1 maker CDS 12202837 12203348 . - 0 gene_id "ABHD17C";
Chr10_1 maker CDS 12180001 12180180 . - 1 gene_id "ABHD17C";
Chr10_1 maker CDS 12176667 12176886 . - 1 gene_id "ABHD17C";

The command:

perl gtf2revcom.pl ABHD17C.CDC.gtf 25951 # the number is the total length of CDS

The output:

Chr10_1 maker CDS -12177396 -12176885 . + 0 gene_id "ABHD17C";
Chr10_1 maker CDS -12154228 -12154049 . + 1 gene_id "ABHD17C";
Chr10_1 maker CDS -12150934 -12150715 . + 1 gene_id "ABHD17C";

Thank you in advance and all the best, Georgy

singing-scientist commented 1 year ago

Greetings @georgysemenov! Thanks very much for raising this issue. The second argument to gtf2revcom must actually be the length of the full chromosome (full reference sequence; FASTA), not the length of any one gene. For example, if the reference sequence is 13 Mbp, you'd use 13000000. Let me know if that makes sense!

Chase
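To illustrate why the sequence length matters, here is a minimal Python sketch of the coordinate flip. The formula (new start = length - end + 1, new end = length - start + 1) is an assumption inferred from the output reported above, not taken from the gtf2revcom.pl source, and `revcom_coords` is a hypothetical helper name. The 13 Mbp length is only the example figure from the reply, not the actual length of Chr10_1.

```python
def revcom_coords(start, end, seq_length):
    """Map a 1-based inclusive interval onto the reverse-complement strand.

    Assumed formula, inferred from the output shown in this thread.
    """
    return seq_length - end + 1, seq_length - start + 1


# CDS intervals from the input GTF in this issue
cds_intervals = [
    (12202837, 12203348),
    (12180001, 12180180),
    (12176667, 12176886),
]

# Passing the total CDS length (25951) gives negative coordinates,
# because the features lie far beyond position 25951.
for start, end in cds_intervals:
    print("CDS length:", revcom_coords(start, end, 25951))

# Passing a full-chromosome length (13 Mbp here, an example value only)
# keeps every coordinate positive.
for start, end in cds_intervals:
    print("chromosome length:", revcom_coords(start, end, 13_000_000))
```

With 25951 as the length, the first interval maps to (-12177396, -12176885), which matches the first line of the reported output. Any length at least as large as the greatest end coordinate keeps the flipped coordinates positive.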

georgysemenov commented 1 year ago

Hi Chase! Thank you so much for your help. That was indeed the issue; it is now working as it is supposed to.