10XGenomics / cellranger

10x Genomics Single Cell Analysis
https://www.10xgenomics.com/support/software/cell-ranger

cellranger-arc GTF file requirements #180

thugib closed this issue 8 months ago

thugib commented 2 years ago

I am trying to get an Arabidopsis GTF file to work with cellranger-arc mkref. I am running cellranger-arc mkgtf on a GTF file that I made using AGAT (agat_convert_sp_gff2gtf.pl). This Perl script from NBISweden converts a GTF/GFF file into a proper GTF file (a "GTF level 2" file).

When I run cellranger-arc mkgtf on this AGAT-converted GTF, I get the following error message:

Writing new genes GTF file (may take 10 minutes for a 1GB input GTF file)...
Traceback (most recent call last):
  File "/Array/bin/cellranger-arc-2.0.1/bin/rna/mkgtf", line 74, in <module>
    main()
  File "/Array/bin/cellranger-arc-2.0.1/bin/rna/mkgtf", line 70, in main
    gtf_builder.build_gtf()
  File "/Array/bin/cellranger-arc-2.0.1/lib/python/cellranger/reference.py", line 465, in build_gtf
    for row, is_comment, properties in self.gtf_reader_iter(self.in_gtf_fn):
  File "/Array/bin/cellranger-arc-2.0.1/lib/python/cellranger/reference.py", line 192, in gtf_reader_iter
    self.validate_transcript_id(filename, properties, i, row)
  File "/Array/bin/cellranger-arc-2.0.1/lib/python/cellranger/reference.py", line 230, in validate_transcript_id
    % (i + 1, "\t".join(row)),
cellranger.reference.GtfParseError: Error while parsing GTF file /Array/bin/GTFagat/Araport11_GTF_genes_transposons.May2022agatm.gtf
Property 'transcript_id' has invalid character ';' in GTF line 90: ChrM Araport11 transcript 124603 124676 . - . gene_id "ATMG01375:exon:1"; transcript_id "ATMG01375.1;Name=ATMG01375.1;id2=exon-id-trnH(GTG)-1;parent2=id-trnH(GTG)"; ID "ATMG01375.1

Please fix your GTF and start again.
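Since mkgtf raises on the first bad record, re-running it is a slow way to find every problem line. Below is a minimal Python sketch that scans the whole file in one pass. It assumes, based only on the error text above, that the rule being enforced is "no ';' inside a quoted attribute value"; it is not cellranger's own validator, and the function name find_semicolon_values is hypothetical.

```python
import re
import sys

# One GTF attribute: a key followed by a double-quoted value, e.g. gene_id "ATMG00160"
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def find_semicolon_values(path):
    """Yield (line_number, key, value) for every quoted attribute value containing ';'.

    Assumption: this approximates the check behind cellranger's
    "Property 'transcript_id' has invalid character ';'" message; it is not 10x's code.
    Line numbers are 1-based and count every line in the file, including headers.
    """
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.startswith("#"):
                continue  # header/comment lines such as "##gtf-version 3"
            columns = line.rstrip("\n").split("\t", 8)
            if len(columns) < 9:
                continue  # not a 9-column GTF record
            for key, value in ATTR_RE.findall(columns[8]):
                if ";" in value:
                    yield line_number, key, value


if __name__ == "__main__" and len(sys.argv) > 1:
    for line_number, key, value in find_semicolon_values(sys.argv[1]):
        print(f"line {line_number}: {key} = {value!r}")
```

Saved as a script (for example a hypothetical scan_gtf.py), it takes the GTF path as its only argument and prints one line per offending attribute.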

So it gets to line 90 of the GTF file and stops because of a ';', even though many of the preceding lines also contain semicolons.
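In GTF attribute columns, semicolons normally act as separators between `key "value"` pairs, which is why they appear on every line; on line 90 some of them sit inside the quoted transcript_id value. A small Python sketch of that distinction, assuming values are wrapped in plain double quotes with no escaped quotes (the function name count_semicolons is hypothetical):

```python
def count_semicolons(attributes):
    """Return (separators, embedded): semicolons outside vs. inside double quotes.

    Assumes values are wrapped in plain double quotes with no escaped quotes.
    """
    separators = embedded = 0
    inside_quotes = False
    for char in attributes:
        if char == '"':
            inside_quotes = not inside_quotes  # entering or leaving a quoted value
        elif char == ";":
            if inside_quotes:
                embedded += 1
            else:
                separators += 1
    return separators, embedded


# Attribute column of the ATMG00160.1 transcript record from the sample lines
earlier = 'gene_id "ATMG00160"; transcript_id "ATMG00160.1"; ID "ATMG00160.1"; Parent "ATMG00160"; original_biotype "mrna";'
# gene_id and transcript_id of line 90, as quoted in the error message
line_90 = 'gene_id "ATMG01375:exon:1"; transcript_id "ATMG01375.1;Name=ATMG01375.1;id2=exon-id-trnH(GTG)-1;parent2=id-trnH(GTG)";'

print(count_semicolons(earlier))  # → (5, 0)
print(count_semicolons(line_90))  # → (2, 3)
```

Only the embedded semicolons look like a problem here; presumably line 90 is simply the first record whose transcript_id contains one.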

The first few lines of my AGAT-converted GTF file look like this:

gtf-version 3

ChrM Araport11 gene 6264 8389 . + . gene_id "ATMG00160"; transcript_id "ATMG00160"; ID "ATMG00160";
ChrM Araport11 transcript 6264 8389 . + . gene_id "ATMG00160"; transcript_id "ATMG00160.1"; ID "ATMG00160.1"; Parent "ATMG00160"; original_biotype "mrna";
ChrM Araport11 exon 6264 6963 . + . gene_id "ATMG00160"; transcript_id "ATMG00160.1"; ID "exon-321264"; Parent "ATMG00160.1";
ChrM Araport11 exon 8307 8389 . + . gene_id "ATMG00160"; transcript_id "ATMG00160.1"; ID "exon-321265"; Parent "ATMG00160.1";
ChrM Araport11 CDS 6264 6963 . + 0 gene_id "ATMG00160"; transcript_id "ATMG00160.1"; ID "cds-286221"; Parent "ATMG00160.1";

Any suggestions as to what is going wrong?