In a terminal on Ubuntu 18.04, I converted RefSeq.Sscrofa.gff to Sscrofa.gtf:
sudo apt-get install cufflinks
gffread RefSeq.Sscrofa.gff -T -o Sscrofa.gtf
cat Sscrofa.gtf | less
NC_010443.5 Gnomon exon 5669 5760 . - . transcript_id "rna4"; gene_id "gene1"; gene_name "TBP";
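As you can see above, transcript_id and gene_id contain only generic placeholders ("rna4", "gene1") rather than the RefSeq identifiers.
Note that the original GFF (GFF3) file does contain a transcript_id and a gene ID (the GeneID entry in Dbxref):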
NC_010443.5 Gnomon exon 5669 5760 . - . ID=id18;Parent=rna3;Dbxref=GeneID:110259740,Genbank:XM_021085483.1;Note=The sequence of the model RefSeq transcript was modified relative to this genomic sequence to represent the inferred CDS: added 361 bases not found in genome assembly;exception=annotated by transcript or proteomic data;gbkey=mRNA;gene=TBP;inference=similar to RNA sequence (same species):INSD:GFLN01045121.1;partial=true;product=TATA-box binding protein%2C transcript variant X1;start_range=.,5669;transcript_id=XM_021085483.1
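To make clear which values I would like to end up in the GTF, here is a rough sketch that pulls them out of column 9 of a GFF3 line. It is only an illustration, not a converter: gff_ids is a hypothetical helper name, the columns are assumed to be tab-separated as in a real GFF3 file, the gene identifier is assumed to be the GeneID entry of Dbxref, and the sample line is a shortened copy of the record above.

```shell
# Hypothetical helper: print "transcript_id<TAB>GeneID" for each GFF3 record on stdin.
# Assumes tab-separated columns and semicolon-separated key=value attributes in column 9.
gff_ids() {
  awk -F'\t' '
    /^#/ { next }                        # skip header and comment lines
    {
      tid = ""; gid = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^transcript_id=/) {
          tid = substr(attrs[i], 15)     # text after "transcript_id="
        }
        if (attrs[i] ~ /^Dbxref=/) {
          m = split(substr(attrs[i], 8), refs, ",")
          for (j = 1; j <= m; j++) {
            if (refs[j] ~ /^GeneID:/) gid = substr(refs[j], 8)
          }
        }
      }
      print tid "\t" gid
    }'
}

# Shortened version of the exon record shown above (attributes trimmed for readability).
printf 'NC_010443.5\tGnomon\texon\t5669\t5760\t.\t-\t.\tID=id18;Parent=rna3;Dbxref=GeneID:110259740,Genbank:XM_021085483.1;gbkey=mRNA;gene=TBP;transcript_id=XM_021085483.1\n' | gff_ids
```

For this record the sketch prints XM_021085483.1 and 110259740, which is what I would expect to see as transcript_id and gene_id instead of "rna4" and "gene1".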
I am building a Sus scrofa reference for Cell Ranger (10x 3' RNA-seq) analysis. Because the Ensembl annotation is not very good (the 3' UTR is missing for some genes), I am trying to use the RefSeq Sus scrofa reference instead (.fa and .gff files). Since Cell Ranger does not process the RefSeq GFF file, I have to convert it to GTF.
Thank you for your help.