CGATOxford / cgat

Do not use - please refer to our newest code: https://github.com/cgat-developers/cgat-apps
BSD 3-Clause "New" or "Revised" License

gtf2gtf - not selecting longest transcript? #293

Open ejduncan opened 8 years ago

ejduncan commented 8 years ago

I am wanting to extract the longest transcript for each gene from a gtf file (or gff3 file). I have installed cgat gtf2gtf and have tried using various parameters to do this using Drosophila melanogaster r6.12.gtf. It pulls out a single transcript for each gene, but not necessarily the longest transcript (e.g. ocm-RB is selected, yet it is shorter than ocm-RA and GlyS-RA is selected when it is shorter than GlyS-RB).

I was just wondering if anyone else has had problems like this and could give me some advice on how to solve it?

Thanks in advance! Liz

Acribbs commented 7 years ago

Sorry for not replying, it seems as though your issue was missed over a year ago! Did you manage to solve your issue?

ejduncan commented 7 years ago

I didn't manage to solve this unfortunately and I am just about to do another (quite large) set of analyses. Any help or advice would be greatly appreciated! Thanks.

Acribbs commented 7 years ago

Are you able to provide a few lines of example input, the command you used and the output, so I can recreate and understand your issue? Thanks

AndreasHeger commented 7 years ago

Hi @ejduncan and @Acribbs. I think there were two issues. The length calculation took into account not only exons, but also any other annotations. This was a bug and is now fixed; 'length' is now counted only from the --exon feature.

Also, "longest-transcript" is unfortunately a bit ambiguous. Longest transcript here is the one with the longest "transcript-length", which might not be the one with the longest genomic span. I have added more options to make this clearer, --filter-method can now be longest-transcript-genomic-span, longest-transcript-transcript-length, and longest-transcript-exon-count.
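To illustrate why the three criteria can pick different transcripts, here is a minimal sketch in plain Python. It is not the gtf2gtf implementation: the gene and transcript names, the coordinates, and the helper functions `summarise` and `longest_per_gene` are all hypothetical, and it assumes that exons within a transcript do not overlap. Only records whose feature is `exon` contribute, mirroring the fix described above.

```python
from collections import defaultdict

# Hypothetical toy records, not taken from any real annotation:
# (gene_id, transcript_id, feature, start, end), 1-based inclusive as in GTF.
RECORDS = [
    ("geneX", "tA", "exon", 100, 499),
    ("geneX", "tA", "exon", 600, 999),
    ("geneX", "tA", "CDS", 150, 499),    # non-exon feature, must be ignored
    ("geneX", "tB", "exon", 100, 199),
    ("geneX", "tB", "exon", 1900, 1999),
    ("geneX", "tB", "exon", 2900, 2999),
]


def summarise(records, exon_feature="exon"):
    """Collect per-transcript statistics from exon records only."""
    exons = defaultdict(list)
    for gene, transcript, feature, start, end in records:
        if feature == exon_feature:
            exons[(gene, transcript)].append((start, end))

    stats = {}
    for key, intervals in exons.items():
        first = min(start for start, _ in intervals)
        last = max(end for _, end in intervals)
        stats[key] = {
            # Sum of exon lengths (assumes exons do not overlap).
            "transcript_length": sum(end - start + 1 for start, end in intervals),
            # Distance from the first exon start to the last exon end.
            "genomic_span": last - first + 1,
            "exon_count": len(intervals),
        }
    return stats


def longest_per_gene(stats, criterion):
    """Return {gene_id: transcript_id} maximising the given criterion."""
    best = {}
    # Sorting makes tie-breaking deterministic (first transcript id wins).
    for (gene, transcript), values in sorted(stats.items()):
        current = best.get(gene)
        if current is None or values[criterion] > stats[(gene, current)][criterion]:
            best[gene] = transcript
    return best


stats = summarise(RECORDS)
for criterion in ("transcript_length", "genomic_span", "exon_count"):
    print(criterion, longest_per_gene(stats, criterion))
```

In this toy data, tA has two long exons close together, while tB has three short exons spread over a wide region, so the choice of criterion changes which transcript counts as "longest".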

I now get:

transcript-length: ocm-RA, GlyS-RB
genomic-span: ocm-RB, GlyS-RC

Hope this is better.

Acribbs commented 7 years ago

@AndreasHeger Thanks for the explanation.

Acribbs commented 7 years ago

@AndreasHeger @ejduncan can this issue be closed?

AndreasHeger commented 7 years ago

Ok for me, but would be good to know if it now behaves as expected for @ejduncan

ejduncan commented 7 years ago

Hi, sorry I haven’t had a chance to try it yet – but will do ASAP.
