cole-trapnell-lab / cufflinks

Boost Software License 1.0

cufflinks gffread converts gff to gtf bug: missing specific transcript and gene name (gene0) #109

Open yuwenjiesail opened 5 years ago

yuwenjiesail commented 5 years ago

In a terminal on Ubuntu 18.04, I converted RefSeq.Sscrofa.gff to Sscrofa.gtf:

sudo apt-get install cufflinks
gffread RefSeq.Sscrofa.gff -T -o Sscrofa.gtf
cat Sscrofa.gtf | less

NC_010443.5 Gnomon exon 5669 5760 . - . transcript_id "rna4"; gene_id "gene1"; gene_name "TBP";

As you can see above, the transcript_id and gene_id in the output are generic placeholders ("rna4", "gene1") rather than specific identifiers. Note that the original GFF (GFF3) file does have specific transcript and gene identifiers:

NC_010443.5 Gnomon exon 5669 5760 . - . ID=id18;Parent=rna3;Dbxref=GeneID:110259740,Genbank:XM_021085483.1;Note=The sequence of the model RefSeq transcript was modified relative to this genomic sequence to represent the inferred CDS: added 361 bases not found in genome assembly;exception=annotated by transcript or proteomic data;gbkey=mRNA;gene=TBP;inference=similar to RNA sequence (same species):INSD:GFLN01045121.1;partial=true;product=TATA-box binding protein%2C transcript variant X1;start_range=.,5669;transcript_id=XM_021085483.1
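To make the expected result concrete, here is a minimal Python sketch of how the ninth column of the GFF3 exon line could be turned into GTF attributes that carry the specific identifiers. It is not what gffread does internally. The function names are hypothetical, and using the GeneID from Dbxref as gene_id is an assumption about what a useful output would look like, not something the tools prescribe. The example string is a shortened copy of the attribute column quoted above.

```python
from urllib.parse import unquote

# Shortened attribute column from the GFF3 exon line in this report.
EXAMPLE = (
    "ID=id18;Parent=rna3;Dbxref=GeneID:110259740,Genbank:XM_021085483.1;"
    "gbkey=mRNA;gene=TBP;"
    "product=TATA-box binding protein%2C transcript variant X1;"
    "transcript_id=XM_021085483.1"
)


def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> raw value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key] = value  # kept percent-encoded; decoded only on output
    return attrs


def gene_id_from_dbxref(attrs):
    """Return the GeneID cross-reference, or None if there is none."""
    for ref in attrs.get("Dbxref", "").split(","):
        if ref.startswith("GeneID:"):
            return ref.split(":", 1)[1]
    return None


def to_gtf_attributes(column9):
    """Build a GTF attribute string (hypothetical helper, see assumptions above)."""
    attrs = parse_gff3_attributes(column9)
    # Prefer the explicit transcript_id; fall back to the Parent ID.
    transcript_id = attrs.get("transcript_id", attrs.get("Parent", ""))
    # Assumption: use the GeneID as gene_id, else the gene symbol.
    gene_id = gene_id_from_dbxref(attrs) or attrs.get("gene", "")
    gene_name = attrs.get("gene", "")
    pairs = [
        ("transcript_id", transcript_id),
        ("gene_id", gene_id),
        ("gene_name", gene_name),
    ]
    return " ".join(f'{key} "{unquote(value)}";' for key, value in pairs if value)


print(to_gtf_attributes(EXAMPLE))
# transcript_id "XM_021085483.1"; gene_id "110259740"; gene_name "TBP";
```

For this exon, that is the kind of attribute string I would expect in the GTF instead of the generic "rna4" and "gene1" values.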

I am building a Sus scrofa reference for cellranger (10x 3' RNA-seq) analysis. Since the Ensembl annotation is not very good (the 3' UTR is missing for some genes), I am trying to use the RefSeq Sscrofa reference instead (.fa and .gff files). Because cellranger does not accept the RefSeq GFF file, I have to convert it to GTF. Thank you for your help.