chasewnelson / SNPGenie

Program for estimating πN/πS, dN/dS, and other diversity measures from next-generation sequencing data
GNU General Public License v3.0

Results _ File/ Just one CDS #16

Closed StephanieRodrigues closed 5 years ago

StephanieRodrigues commented 5 years ago

Hi Chase, it's me again! I thought my problems were solved. In fact, I'm not having problems running the program, but when I started looking at my results and the output files, something seemed odd. It seems that the program recognized only my first CDS and printed results just for that one. I'm attaching the results folder and the GTF, VCF, and FASTA files. The command line I used was: ./snpgenie.pl --vcfformat=3 --snpreport=CL9800.vcf --fastafile=Ef_Aus0004.fasta --gtffile=Aus004.gtf

Regards,

singing-scientist commented 5 years ago

It is likely an issue with the GTF file (e.g., non-UNIX line endings). If you can provide it, I'll take a look.

StephanieRodrigues commented 5 years ago

Aus004_gtf.zip Here! I don't know what's going on. I solved the last issue by changing the CDS as you told me, and then I removed the double quotes. But it's still wrong!

singing-scientist commented 5 years ago

I see; as discussed in the Troubleshooting section of the SNPGenie documentation, the line endings in all your files must be Unix. However, the line endings here are Mac. You'll have to convert them to Unix in a program like TextWrangler (Mac) or Notepad++ (Windows). Let me know.
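For anyone who prefers a script to a text editor, here is a minimal sketch of the same conversion in Python. It is not part of SNPGenie; the function names (`detect_line_endings`, `to_unix`, `convert_file`) are made up for illustration. It assumes that "Mac" line endings means bare carriage returns (CR) and also handles Windows CRLF.

```python
from pathlib import Path


def detect_line_endings(data: bytes) -> dict:
    """Count each kind of line ending in raw file bytes."""
    crlf = data.count(b"\r\n")        # Windows: CR followed by LF
    cr = data.count(b"\r") - crlf     # bare CR (assumed "Mac" style)
    lf = data.count(b"\n") - crlf     # bare LF (Unix)
    return {"unix_lf": lf, "windows_crlf": crlf, "classic_mac_cr": cr}


def to_unix(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    # Replace CRLF first so each pair becomes a single LF, not two.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path: str) -> dict:
    """Rewrite the file in place with Unix endings; return the counts found."""
    p = Path(path)
    raw = p.read_bytes()
    counts = detect_line_endings(raw)
    if counts["windows_crlf"] or counts["classic_mac_cr"]:
        p.write_bytes(to_unix(raw))
    return counts
```

Calling `convert_file("Aus004.gtf")` (and likewise for the VCF and FASTA files) would report what was found and rewrite the file only if non-Unix endings are present. Keep a backup copy first, since the rewrite happens in place.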

StephanieRodrigues commented 5 years ago

Hi Chase, sorry for the delay! I restarted my process and ran gffcompare again to convert my GFF file to a GTF file, and now the GTF file seems to be OK. In my last issue, the program was not recognizing + strands because the word "CDS" was missing. I solved that by opening my file in Excel and replacing the names, but the result was the line-ending problem (your last reply in this issue). Now, after converting my GFF file, neither of these problems happens, but I get this message:

The CDS coordinates for gene gene45 in the gtf file do not yield a set of complete codons,

or are absent from the file. The number of nucleotides must be a multiple of 3.

SNPGenie terminated.

I tried deleting all the CDS lines with this problem, but I am still getting the error. I saw another issue here where someone had the same problem, but in that case the GTF file had multiple transcripts for the same gene. I checked my GTF file, and that does not seem to be the problem. I'm attaching the GTF file. I'm really sorry for all these issues; I really need to run your program. GTF_Aus0004.zip

Regards, Stephanie

singing-scientist commented 5 years ago

Hi @StephanieRodrigues! No worries. I checked gene45 and indeed, there are (50502-49545+1)/3 = 319.3333 codons, so this gene's length is not a multiple of 3. There may be multiple "CDS" lines for the same gene (e.g., protein-coding exons), but the lengths of all "CDS" records for each unique gene name must sum to a multiple of 3.

First, make sure CDS records for the same gene have the same gene name. If there are still problems with some genes, then the problematic (non-multiple of 3) genes should be removed. Let me know.
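To find every problematic gene at once rather than one error at a time, the check can be scripted. This is a rough sketch, not SNPGenie's own code: it assumes a tab-delimited GTF with the feature type in column 3, 1-based inclusive start/end in columns 4 and 5, and a `gene_id` attribute (quoted or unquoted) in column 9. The helper names `cds_lengths` and `incomplete_genes` are hypothetical.

```python
import re
from collections import defaultdict

# Matches gene_id "gene45"; as well as the unquoted form gene_id gene45;
GENE_ID = re.compile(r'gene_id\s+"?([^";]+)"?')


def cds_lengths(gtf_lines):
    """Sum the lengths of all CDS records for each gene_id."""
    totals = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # only CDS records count toward codons
        match = GENE_ID.search(fields[8])
        if match is None:
            continue  # no gene_id attribute on this record
        start, end = int(fields[3]), int(fields[4])
        # Coordinates are inclusive, hence the +1 (as in 50502-49545+1).
        totals[match.group(1).strip()] += end - start + 1
    return dict(totals)


def incomplete_genes(gtf_lines):
    """Return {gene_id: total CDS length} for genes not divisible by 3."""
    return {g: n for g, n in cds_lengths(gtf_lines).items() if n % 3 != 0}


if __name__ == "__main__":
    example = [
        'chr\tsrc\tCDS\t49545\t50502\t.\t-\t0\tgene_id "gene45";',
        'chr\tsrc\tCDS\t1\t10\t.\t+\t0\tgene_id "geneA";',
        'chr\tsrc\tCDS\t21\t25\t.\t+\t0\tgene_id "geneA";',
    ]
    print(incomplete_genes(example))  # {'gene45': 958}
```

In the example, geneA has two CDS records of 10 and 5 nucleotides that sum to 15, so it passes, while gene45 totals 958 nucleotides and is flagged. On a real file, passing `open("your.gtf")` as `gtf_lines` would list the genes to fix or remove.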

singing-scientist commented 5 years ago

As this issue has been silent for 26 days, I am closing it now. Please feel free to re-open if you have further issues.