genomeannotation / transvestigator

Validates transcriptome and prepares it for submission to the NCBI
MIT License

Partial CDSs need room to stretch out #15

Closed bruab closed 9 years ago

bruab commented 9 years ago

ATM given input like this:

c21845_g2_i1    transdecoder    exon    1       1021    .       +       .       ID=c21845_g2_i1|m.56605.exon1;Parent=c21845_g2_i1|m.56605
c21845_g2_i1    transdecoder    CDS     468     1021    .       +       .       ID=cds.c21845_g2_i1|m.56605;Parent=c21845_g2_i1|m.56605

transvestigator writes output like this:

c21845_g2_i1    transdecoder    exon    1       1021    .       +       0       ID=c21845_g2_i1|m.56605.exon1;Parent=c21845_g2_i1|m.56605

c21845_g2_i1    transdecoder    CDS     468     1019    .       +       0       ID=cds.c21845_g2_i1|m.56605;Parent=c21845_g2_i1|m.56605

Note the change in the end index of the CDS (1021 becomes 1019). I can't recall why we do this, but as of now the change causes PartialProblem warnings in tbl2asn. This time around the NCBI seems to care about warnings (maybe we snuck by last time?).

Anyway, I've confirmed that extending the CDS indices to the exon boundaries removes this warning. So we should do that.
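
A minimal sketch of that extension, using the two features from the example above. The function names `parse_gff_line` and `extend_cds_end` are hypothetical and are not taken from transvestigator's code; the sketch assumes tab-delimited GFF3 input and only handles the 3' end of a `+` strand CDS that stops fewer than three bases short of its exon.

```python
def parse_gff_line(line):
    """Split one tab-delimited GFF3 line into the fields this sketch needs."""
    cols = line.rstrip("\n").split("\t")
    return {
        "seqid": cols[0],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
    }


def extend_cds_end(cds, exon):
    """Return the CDS end, stretched to the exon end when 1 or 2 bases are left over."""
    if cds["strand"] != "+":
        # Assumption: the minus strand would need the mirror-image fix on 'start'.
        raise NotImplementedError("sketch covers the + strand only")
    gap = exon["end"] - cds["end"]
    return exon["end"] if 0 < gap < 3 else cds["end"]


exon = parse_gff_line(
    "c21845_g2_i1\ttransdecoder\texon\t1\t1021\t.\t+\t.\t"
    "ID=c21845_g2_i1|m.56605.exon1;Parent=c21845_g2_i1|m.56605"
)
trimmed_cds = parse_gff_line(
    "c21845_g2_i1\ttransdecoder\tCDS\t468\t1019\t.\t+\t0\t"
    "ID=cds.c21845_g2_i1|m.56605;Parent=c21845_g2_i1|m.56605"
)
new_end = extend_cds_end(trimmed_cds, exon)
```

Here `new_end` comes back as the exon end, which restores the coordinates that were in the input.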

smg283 commented 9 years ago

I swear they were getting pissed that we had a partial codon, so we changed it. Makes sense to read

c21845_g2_i1 transdecoder CDS 468 1021 . + . ID=cds.c21845_g2_i1|m.56605;Parent=c21845_g2_i1|m.56605

as a partial. Another thing to consider is what is printed as the peptide sequence for the partial codon (do you skip it, or try to infer it if possible?)
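
The partial codon can be checked with a little arithmetic. This is only an illustration, assuming the CDS starts in frame at position 468 (phase 0); `trailing_bases` is a hypothetical helper name.

```python
def trailing_bases(start, end):
    """Number of bases left over after the last complete codon (1-based, inclusive coordinates)."""
    return (end - start + 1) % 3


start, end = 468, 1021
extra = trailing_bases(start, end)   # CDS length is 554, and 554 % 3 leaves 2 bases
trimmed_end = end - extra            # end of the last complete codon
```

With these coordinates `trimmed_end` is 1019, the value transvestigator was writing, so the two dropped bases are exactly the incomplete final codon.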


bruab commented 9 years ago

I'll check on this. You mean "what tbl2asn prints as the peptide sequence"?

smg283 commented 9 years ago

Or what we print (we now print a peptide FASTA file; what does it do with a partial codon?). Don't worry about NCBI.


bruab commented 9 years ago

We don't do that just yet -- it's #14. What should we write for a partial? Just leave off the last amino acid?

smg283 commented 9 years ago

I would think so. Simpler.
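
A rough sketch of the "leave off the last amino acid" option, assuming the standard genetic code and a CDS sequence that starts in frame. `translate_partial` is a hypothetical name, not the function that ended up in transvestigator; unknown codons (for example ones containing N) are written as X here, which is also an assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_partial(cds_seq):
    """Translate complete codons only; 1 or 2 trailing bases are silently dropped."""
    seq = cds_seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


peptide = translate_partial("ATGGCCAA")  # the trailing "AA" is a partial codon
```

The trailing partial codon contributes nothing to `peptide`, so the peptide FASTA stays consistent with a CDS whose coordinates run to the exon boundary.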


bruab commented 9 years ago

Closed with 11ab5596e7205403ac4ab40183ba9fba34924dfb -- but I forgot to do the slick "close issue from commit message" trick.