Closed by pavsol 3 years ago
Hi Pavel,
It looks like the error message is not being printed because of a Python 2 vs. Python 3 issue, which we'll need to fix. To view the actual error message, would you mind running the program directly and posting its output here? The command to do so is:
cellranger-6.0.1/lib/bin/gtf_to_gene_index home/pavsol/scRNAseq_pilot/arabidopsis/cellranger/arabidopsis_genes_lnc test.json
Thank you for the quick answer. Here it is:
$ ~/tools/cellranger-6.0.1/lib/bin/gtf_to_gene_index /home/pavsol/scRNAseq_pilot/arabidopsis/cellranger/arabidopsis_genes_lncRNA test.json
error: Duplicate Gene ID found in GTF: ATMG01275
So the issue may be a duplicated gene ID in my GTF. I will remove it and try again.
$ >>> Reference successfully created! <<<
Simply removing those duplicated IDs solved the issue. Thank you for your help :)
Great! Glad to hear it was resolved. The next version of Cell Ranger will print the error message directly, which should avoid this happening again. Thank you for reporting this.
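For anyone who lands here with the same "Duplicate Gene ID found in GTF" error, the check discussed above can be approximated before running mkref. This is only a minimal sketch, not Cell Ranger's own validation: it assumes that "duplicate" means the same gene_id appears on more than one row whose feature column is `gene`, and the function name `find_duplicate_gene_ids` is made up for illustration.

```python
import re
import sys
from collections import defaultdict

# Matches the gene_id attribute in column 9 of a GTF row, e.g. gene_id "ATMG01275";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def find_duplicate_gene_ids(lines):
    """Return {gene_id: [line numbers]} for gene_ids seen on more than one 'gene' row.

    Assumption: a duplicate is a gene_id repeated across rows whose feature
    column (column 3) is 'gene'. Cell Ranger's real check may differ.
    """
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF row has 9 tab-separated columns; ignore anything else.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            seen[match.group(1)].append(lineno)
    return {gene_id: nums for gene_id, nums in seen.items() if len(nums) > 1}


# Only runs when the script is given exactly one argument: the GTF path.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        duplicates = find_duplicate_gene_ids(handle)
    for gene_id, nums in sorted(duplicates.items()):
        print(f"{gene_id}\tlines {', '.join(map(str, nums))}")
```

Saved as, say, find_dup_genes.py (a hypothetical file name), it could be run as `python3 find_dup_genes.py my_annotation.gtf`. Each reported gene_id is printed with the line numbers of its `gene` rows, so the repeated entries can be inspected and removed by hand.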
Apologies for commenting on a year-old closed thread, but this is still the second Google hit for "Error detected in GTF file:" so others may find this as well.
Even with Python 3, I am finding that the error message is blank, so it is impossible to figure out what is wrong. I am using an Ensembl reference, so the GTF should supposedly already be well-formatted.
# https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_mr
# https://useast.ensembl.org/Macaca_fascicularis/Info/Index
wget http://ftp.ensembl.org/pub/release-107/fasta/macaca_fascicularis/dna/Macaca_fascicularis.Macaca_fascicularis_6.0.dna.toplevel.fa.gz
gunzip Macaca_fascicularis.Macaca_fascicularis_6.0.dna.toplevel.fa.gz
wget http://ftp.ensembl.org/pub/release-107/gtf/macaca_fascicularis/Macaca_fascicularis.Macaca_fascicularis_6.0.107.gtf.gz
gunzip Macaca_fascicularis.Macaca_fascicularis_6.0.107.gtf.gz
# use -la cellranger
use .cellranger-7.0.0
cellranger mkgtf \
Macaca_fascicularis.Macaca_fascicularis_6.0.107.gtf Macaca_fascicularis.Macaca_fascicularis_6.0.107.filtered.gtf \
--attribute=gene_biotype:protein_coding \
--attribute=gene_biotype:lncRNA
use .python-3.9.2
cellranger mkref \
--genome=Macaca_fascicularis_6.0 \
--fasta=Macaca_fascicularis.Macaca_fascicularis_6.0.dna.toplevel.fa \
--genes=Macaca_fascicularis.Macaca_fascicularis_6.0.107.filtered.gtf \
--ref-version=1.0.0
Output:
Creating new reference folder at /broad/prions/ono/cyno/Macaca_fascicularis_6.0
...done
Writing genome FASTA file into reference folder...
...done
Indexing genome FASTA file...
...done
Writing genes GTF file into reference folder...
...done
mkref has failed: error building reference package
Error detected in GTF file:
Any ideas what to try next? I tried using an unfiltered version of the reference and it failed just the same. Thanks in advance.
Hi,
I am having an issue preparing a reference with mkref. See the command and error message:
I am using an annotation for Arabidopsis thaliana downloaded from arabidopsis.org (Araport11_GTF_genes_transposons.Mar202021.gtf.gz), which was further restricted to keep only CDS, exon, 5UTR, 3UTR, gene, lncRNA and mRNA:
First few lines of my GTF:
Cellranger version 6.0.1
I am obviously not the only one having this issue: https://stackoverflow.com/questions/67706086/cellranger-how-to-convert-a-gtf-file-to-string?newreg=1ed75ab3d056488eae21facdbc36035f
Any idea what is going wrong?
Thank you! Pavel