getzlab / rnaseqc

Fast, efficient RNA-Seq metrics for quality control and process optimization

collapse_annotation.py cannot process the gtf file generated by gffread #71

Open biozzq opened 2 years ago

biozzq commented 2 years ago

Dear all,

To prepare the GTF file used by rnaseqc, I first converted the GFF file to GTF with the following command:

gffread-0.12.7.Linux_x86_64/gffread -T -o out.gtf input.gff

However, running collapse_annotation.py out.gtf collapse.gtf gives the following error:

Traceback (most recent call last):
  File "collapse_annotation.py", line 294, in <module>
    annotation = Annotation(args.transcript_gtf)
  File "collapse_annotation.py", line 89, in __init__
    attributes.pop('transcript_type'), g, start_pos, end_pos)
KeyError: 'transcript_type'

Based on the above error message, I added gene_biotype and transcript_biotype attributes to the end of each line: `perl -e 'while(<>){chomp; print $_," gene_biotype \"protein_coding\"; transcript_biotype \"protein_coding\";\n"}' out.gtf >processed.gtf`

Finally, when running collapse_annotation.py processed.gtf collapse.gtf, another error occurred:

Traceback (most recent call last):
  File "collapse_annotation.py", line 294, in <module>
    annotation = Annotation(args.transcript_gtf)
  File "collapse_annotation.py", line 89, in __init__
    attributes.pop('transcript_type'), g, start_pos, end_pos)
UnboundLocalError: local variable 'g' referenced before assignment

I attached the processed.gtf here. How should this be handled? processed.zip

Thank you in advance. Best wishes, Zheng zhuqing

francois-a commented 2 years ago

RNA-SeQC requires a GTF in the format specified at https://www.gencodegenes.org/pages/data_format.html, with a gene > transcript > exon hierarchy in the feature type column (additional features such as CDS are also supported). Your GTF is missing gene features; it only has transcript and exon features.
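One quick way to see whether a GTF has all three levels of that hierarchy is to count the values in its feature type column. The following is a minimal sketch, not part of RNA-SeQC; the function names `count_feature_types` and `missing_levels` and the two example rows are hypothetical and made up for illustration.

```python
from collections import Counter

def count_feature_types(gtf_lines):
    """Count the values in the feature type column (column 3) of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts

def missing_levels(counts, required=("gene", "transcript", "exon")):
    """Return the required feature types that never occur."""
    return [name for name in required if counts[name] == 0]

# Made-up rows with transcript and exon features but no gene feature.
example = [
    'chr1\tsrc\ttranscript\t100\t500\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";\n',
    'chr1\tsrc\texon\t100\t500\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";\n',
]
print(missing_levels(count_feature_types(example)))
```

For a real file, the same functions can be applied to an open file handle, for example `count_feature_types(open("out.gtf"))`.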

KristinaGagalova commented 10 months ago

Is there a tool to convert a GTF to the required format? I am also having issues with that. Thank you in advance.
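No dedicated conversion tool is named in this thread. One possible approach is to synthesize the missing gene rows from the existing transcript and exon rows. The sketch below assumes that every row carries a quoted `gene_id` attribute, that a gene spans the smallest start to the largest end of its rows, and that a placeholder biotype such as "protein_coding" is acceptable (as in the original post). The function name `add_gene_features` is hypothetical, and whether the output satisfies collapse_annotation.py has not been tested here.

```python
import re

# Matches quoted attributes such as: gene_id "g1";
# Assumption: unquoted attributes are not needed and are dropped.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

def format_attributes(attrs):
    return " ".join(f'{key} "{value}";' for key, value in attrs.items())

def add_gene_features(gtf_lines, default_type="protein_coding"):
    """Return GTF lines with one synthesized gene row per gene_id."""
    by_gene = {}  # gene_id -> list of (first 8 fields, attributes), in input order
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs["gene_id"]
        # Fill in GENCODE-style attributes only when they are absent.
        attrs.setdefault("gene_name", gene_id)
        attrs.setdefault("gene_type", default_type)
        if "transcript_id" in attrs:
            attrs.setdefault("transcript_name", attrs["transcript_id"])
            attrs.setdefault("transcript_type", default_type)
        by_gene.setdefault(gene_id, []).append((fields[:8], attrs))

    out = []
    for gene_id, rows in by_gene.items():
        if not any(fields[2] == "gene" for fields, _ in rows):
            first, first_attrs = rows[0]
            start = min(int(fields[3]) for fields, _ in rows)
            end = max(int(fields[4]) for fields, _ in rows)
            gene_attrs = {k: first_attrs[k] for k in ("gene_id", "gene_name", "gene_type")}
            gene_fields = [first[0], first[1], "gene", str(start), str(end), ".", first[6], "."]
            out.append("\t".join(gene_fields + [format_attributes(gene_attrs)]))
        # Emit the gene's original rows after the gene row.
        for fields, attrs in rows:
            out.append("\t".join(fields + [format_attributes(attrs)]))
    return out

# Made-up example: one gene with two transcripts and no gene row.
example = [
    'chr1\tsrc\ttranscript\t100\t500\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";',
    'chr1\tsrc\texon\t100\t500\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";',
    'chr1\tsrc\ttranscript\t300\t900\t.\t+\t.\ttranscript_id "t2"; gene_id "g1";',
    'chr1\tsrc\texon\t300\t900\t.\t+\t.\ttranscript_id "t2"; gene_id "g1";',
]
result = add_gene_features(example)
print(result[0])
```

Grouping rows by `gene_id` keeps each gene row directly ahead of its transcripts and exons, which matches the gene > transcript > exon ordering. Real biotypes should replace the placeholder wherever they are known.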