GWW / scsnv

scSNV Mapping tool for 10X Single Cell Data
MIT License

error in scsnv index #8

Closed hjistb closed 2 years ago

hjistb commented 2 years ago

Hi,

Thank you for this wonderful tool.

I encountered an error when I tried to build the index. Here is the command I used:

scsnv index -g ~/refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf \
    -r ~/refdata-cellranger-GRCh38-3.0.0/fasta/genome.fa scsnv_index

And it threw the following error message:

[00:00:00] Loading the GTF file
Error malformed GTF file: ~/refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf at line: 5 number of toks = 9

The reference gtf and fa files were downloaded from 10x genomics, and the first few lines of the genes.gtf file look like this:

#!genome-build GRCh38.p12
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.27
#!genebuild-last-updated 2018-01

1 havana gene 29554 31109 . + . gene_id "ENSG00000243485"; gene_version "5"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"
1 havana transcript 29554 31097 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; tag "basic"; transcript_support_level "5"
1 havana exon 29554 30039 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; exon_number "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001947070"; exon_version "1"; tag "basic"; transcript_support_level "5"
1 havana exon 30564 30667 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; exon_number "2"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001922571"; exon_version "1"; tag "basic"; transcript_support_level "5"
1 havana exon 30976 31097 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000473358"; transcript_version "1"; exon_number "3"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-202"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001827679"; exon_version "1"; tag "basic"; transcript_support_level "5"
1 havana transcript 30267 31109 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000469289"; transcript_version "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-201"; transcript_source "havana"; transcript_biotype "lincRNA"; tag "basic"; transcript_support_level "5"
1 havana exon 30267 30667 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000469289"; transcript_version "1"; exon_number "1"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-201"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001841699"; exon_version "1"; tag "basic"; transcript_support_level "5"
1 havana exon 30976 31109 . + . gene_id "ENSG00000243485"; gene_version "5"; transcript_id "ENST00000469289"; transcript_version "1"; exon_number "2"; gene_name "MIR1302-2HG"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "MIR1302-2HG-201"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001890064"; exon_version "1"; tag "basic"; transcript_support_level "5"
1 havana gene 34554 36081 . - . gene_id "ENSG00000237613"; gene_version "2"; gene_name "FAM138A"; gene_source "havana"; gene_biotype "lincRNA"
1 havana transcript 34554 36081 . - . gene_id "ENSG00000237613"; gene_version "2"; transcript_id "ENST00000417324"; transcript_version "1"; gene_name "FAM138A"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "FAM138A-201"; transcript_source "havana"; transcript_biotype "lincRNA"; tag "basic"; transcript_support_level "1"
1 havana exon 35721 36081 . - . gene_id "ENSG00000237613"; gene_version "2"; transcript_id "ENST00000417324"; transcript_version "1"; exon_number "1"; gene_name "FAM138A"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "FAM138A-201"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001656588"; exon_version "1"; tag "basic"; transcript_support_level "1"
1 havana exon 35277 35481 . - . gene_id "ENSG00000237613"; gene_version "2"; transcript_id "ENST00000417324"; transcript_version "1"; exon_number "2"; gene_name "FAM138A"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "FAM138A-201"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001669267"; exon_version "1"; tag "basic"; transcript_support_level "1"
1 havana exon 34554 35174 . - . gene_id "ENSG00000237613"; gene_version "2"; transcript_id "ENST00000417324"; transcript_version "1"; exon_number "3"; gene_name "FAM138A"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "FAM138A-201"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001727627"; exon_version "1"; tag "basic"; transcript_support_level "1"
1 havana transcript 35245 36073 . - . gene_id "ENSG00000237613"; gene_version "2"; transcript_id "ENST00000461467"; transcript_version "1"; gene_name "FAM138A"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "FAM138A-202"; transcript_source "havana"; transcript_biotype "lincRNA"; transcript_support_level "3"
1 havana exon 35721 36073 . - . gene_id "ENSG00000237613"; gene_version "2"; transcript_id "ENST00000461467"; transcript_version "1"; exon_number "1"; gene_name "FAM138A"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "FAM138A-202"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001618781"; exon_version "2"; transcript_support_level "3"
1 havana exon 35245 35481 . - . gene_id "ENSG00000237613"; gene_version "2"; transcript_id "ENST00000461467"; transcript_version "1"; exon_number "2"; gene_name "FAM138A"; gene_source "havana"; gene_biotype "lincRNA"; transcript_name "FAM138A-202"; transcript_source "havana"; transcript_biotype "lincRNA"; exon_id "ENSE00001874421"; exon_version "1"; transcript_support_level "3"
1 ensembl_havana gene 65419 71585 . + . gene_id "ENSG00000186092"; gene_version "6"; gene_name "OR4F5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"
1 havana transcript 65419 71585 . + . gene_id "ENSG00000186092"; gene_version "6"; transcript_id "ENST00000641515"; transcript_version "2"; gene_name "OR4F5"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "OR4F5-202"; transcript_source "havana"; transcript_biotype "protein_coding"; tag "basic"

Am I using the wrong reference files? If so, which reference files should I use?

Thank you, Jing
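
For anyone who hits a similar message: a GTF record has nine tab-separated columns, so "number of toks = 9" is the count a well-formed record should have. One way to rule out the file itself is to count columns independently of scSNV. The following is a minimal Python sketch, not scSNV's parser; the function name check_gtf_columns and the two sample records are made up for illustration, and it assumes header/comment lines start with "#".

```python
import io


def check_gtf_columns(handle, expected=9):
    """Return (line_number, column_count) for every record whose
    tab-separated column count differs from `expected`.

    Blank lines and lines starting with '#' (headers/comments) are skipped.
    This is an illustrative check only, not how scSNV parses GTF files.
    """
    problems = []
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        column_count = len(line.split("\t"))
        if column_count != expected:
            problems.append((line_number, column_count))
    return problems


# Hypothetical sample: two header lines, one complete record,
# and one record that is deliberately truncated to 7 columns.
sample = io.StringIO(
    "#!genome-build GRCh38.p12\n"
    "#!genome-version GRCh38\n"
    '1\thavana\tgene\t29554\t31109\t.\t+\t.\tgene_id "ENSG00000243485"; gene_version "5"\n'
    "1\thavana\tgene\t34554\t36081\t.\t-\n"
)
problems = check_gtf_columns(sample)
print(problems)  # [(4, 7)]
```

To run it on a real file, pass an open file handle instead of the sample, for example `with open(path) as fh: check_gtf_columns(fh)`, where `path` points to your genes.gtf. An empty result means every record has nine columns, which points away from the reference file as the cause.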

GWW commented 2 years ago

Hi Jing,

Apologies, there was a small bug in my GTF parsing code at the end of the file. I have fixed the issue and tested the fix with the exact same GTF you mentioned above. If you clone the repository again and recompile scSNV, the problem should be fixed.

Gavin

hjistb commented 2 years ago

Hi Gavin,

Thank you for your quick reply. It works now.

Thank you, Jing