TobiTekath / DTUrtle

Perform differential transcript usage (DTU) analysis of bulk or single-cell RNA-seq data. See documentation at:
https://tobitekath.github.io/DTUrtle
GNU General Public License v3.0

import_gtf doesn't work #9

selmanurk closed this issue 1 year ago

selmanurk commented 1 year ago

Describe the bug

Hi, no matter how I try, I can't seem to load my GFF3 file into the object gtf.

To Reproduce

gtf <- import_gtf('./oy.gff3')

The object gtf does get created, but the table contains no data.

Thanks!

TobiTekath commented 1 year ago

Hi @selmanurk, I am very sorry that you are experiencing difficulties.

The described behaviour looks to be specific to the GFF file you are trying to use.

Without having the file you are trying to load at hand, my possibilities to investigate are somewhat limited, but you may try this workaround:

Test whether gtf <- import_gtf('./oy.gff3', feature_type = NULL) returns a non-empty data frame for you. You may need to subset the resulting data to only the transcript-isoform coordinates yourself afterwards, though.

Looking forward to hearing whether the workaround did the trick.
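The subsetting step could look roughly like the sketch below. It uses a small hand-made data frame in place of the real import result, and it assumes that the feature keyword ends up in a column named type, as is usual for GFF/GTF data converted to a data frame. Neither the column names nor the toy values come from the actual file, so check names(gtf) on your own object first.

```r
# Toy stand-in for the data frame obtained with feature_type = NULL.
# ASSUMPTION: the column names (seqnames, type, start, end) follow the usual
# GFF/GTF layout; they are not confirmed output of import_gtf().
gtf_all <- data.frame(
  seqnames = c("chr1", "chr1", "chr1", "chr1"),
  type     = c("gene", "transcript", "exon", "exon"),
  start    = c(1L, 1L, 1L, 4076L),
  end      = c(4309L, 4309L, 46L, 4309L),
  stringsAsFactors = FALSE
)

# Keep only the rows that describe transcript isoforms.
gtf_tx <- gtf_all[gtf_all$type == "transcript", , drop = FALSE]

# An empty result here means the file defines no features with that keyword.
print(nrow(gtf_tx))
```

If the filtered table is empty, the file most likely uses a different keyword for its isoforms, or does not define any.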

Best, Tobias

selmanurk commented 1 year ago

Hi Tobias, thank you so much for your reply. I thought I had worked around this by using it as a data frame; however, I keep having problems. Here is the format of the GFF3 file I am using. If it's not too much trouble, could you tell me whether it's impossible for it to work? It's not from GENCODE.

Thank you, Selma

`medtr.R108.gnmHiC1.chr1-_MtrunR108HiC004971 BioFileConverter gene 1 4309 . + . "Name=MtrunR108HiC004971;Dbxref=InterPro:IPR000315%2CPANTHER:PTHR31717%2CPANTHER..."
medtr.R108.gnmHiC1.chr1-_MtrunR108HiC004971 BioFileConverter exon 1 46 . + . Name=Exon1
medtr.R108.gnmHiC1.chr1-_MtrunR108HiC004971 BioFileConverter intron 47 1181 . + . Name=Intron1

medtr.R108.gnmHiC1.chr1-_MtrunR108HiC004971 BioFileConverter intron 3415 4075 . + . Name=Intron5
medtr.R108.gnmHiC1.chr1-_MtrunR108HiC004971 BioFileConverter exon 4076 4309 . + . Name=Exon6`

TobiTekath commented 1 year ago

Hi Selma,

please excuse the long response time.

Thank you for the GFF snippet. One thing I am wondering: are there actual transcript isoforms defined in your GFF? I can only spot gene, exon and intron definitions. We need the actual definition of transcripts for the DTU analysis.

Here is a snippet from the Ensembl GTF/GFF definition site, showing what that could look like:

1   transcribed_unprocessed_pseudogene  gene    11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; 
1   processed_transcript    transcript  11869   14409   .   +   .   gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana";

If in your GFF the keyword for the transcript isoforms is not "transcript", just set the appropriate name in the import_gtf() function:

gtf <- import_gtf('./oy.gff3', feature_type = "different_keyword_here")
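To find out which keyword a file actually uses, one option is to tabulate its third column (the feature type) with base R before calling import_gtf(). The sketch below is only an illustration: it writes the records quoted earlier in this thread (attributes abbreviated) to a temporary file so that it runs on its own, and the helper gff_record() is a hypothetical convenience function, not part of DTUrtle. For a real check, point read.delim() at './oy.gff3' instead.

```r
# Hypothetical helper that builds one tab-separated GFF3 record.
seqid <- "medtr.R108.gnmHiC1.chr1-_MtrunR108HiC004971"
gff_record <- function(type, start, end, attr) {
  paste(seqid, "BioFileConverter", type, start, end, ".", "+", ".", attr,
        sep = "\t")
}

# Records from the snippet in this thread (gene attributes abbreviated).
gff_lines <- c(
  gff_record("gene",   1,    4309, "Name=MtrunR108HiC004971"),
  gff_record("exon",   1,    46,   "Name=Exon1"),
  gff_record("intron", 47,   1181, "Name=Intron1"),
  gff_record("intron", 3415, 4075, "Name=Intron5"),
  gff_record("exon",   4076, 4309, "Name=Exon6")
)

# Write to a temporary file so the example is self-contained.
gff_file <- tempfile(fileext = ".gff3")
writeLines(gff_lines, gff_file)

# GFF is tab-separated without a header row; lines starting with '#' are comments.
gff <- read.delim(gff_file, header = FALSE, comment.char = "#", quote = "",
                  stringsAsFactors = FALSE)

# Column 3 holds the feature type; count how often each one occurs.
feature_types <- table(gff[[3]])
print(feature_types)

# Is the default keyword expected by import_gtf() present at all?
has_transcripts <- "transcript" %in% names(feature_types)
print(has_transcripts)
```

For these records only gene, exon and intron show up, so there is no keyword that could be passed as feature_type. If a complete file lists another isoform-level type (GFF3 files often use mRNA, for example), that name would be the candidate to try.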

I hope this resolves your problems.

Best, Tobias