churchill-lab / g2gtools

Personal diploid genome creation and coordinate conversion
http://churchill-lab.github.io/g2gtools

Convert fails due to no mappings #15

Closed: zkalender closed this issue 5 years ago

zkalender commented 6 years ago

Hello, I have been using g2gtools to create a diploid reference genome sequence for my sample, and I then use this new reference genome to map RNA-seq data generated from that sample. vcf2vci, patch and transform work without a problem (see the output below), and I can map my RNA-seq data to the custom reference genome. However, when I try to convert the coordinates of the RNA-seq BAM file from the custom reference to hg38, I get an empty BAM file. Here are the command and log:

g2gtools convert \
-i MM057_RNAseq_customref.bam \
-c MM057.vci.gz \
-f bam \
--reverse \
-o MM057_RNAseq_hg38.bam

[g2gtools] All reads processed
[g2gtools] 25910636 TOTAL ENTRIES
[g2gtools] 1126455 TOTAL UNMAPPED
[g2gtools] 0 TOTAL FAIL QC
[g2gtools]
[g2gtools] Mapping Summary Single End
[g2gtools] 24784181 TOTAL ENTRIES
[g2gtools]
[g2gtools] 0 TOTAL SUCCESS
[g2gtools] 0 Simple
[g2gtools] 0 Complex
[g2gtools]
[g2gtools] 24784181 TOTAL FAILURES
[g2gtools] 24784181 Cannot Map
[g2gtools] BAM File Converted

Here is the output when I run convert with -d:

[g2gtools debug] ~~~~~~~~~~~~~~~~
[g2gtools debug] Converting NB501171:224:HTLG5BGX5:1:11104:17500:18852 chr13_L 36030866 25M
[g2gtools debug] SINGLE END ALIGNMENT
[g2gtools debug] Chromosome chr13_L not found in mapping tree
[g2gtools debug] Available chromsomes are: []
[g2gtools debug] Fail due to no mappings
[g2gtools debug] ~~~~~~~~~~~~~~~~
[g2gtools debug] Converting NB501171:224:HTLG5BGX5:1:11104:22636:18843 chr5_R 113698848 25M
[g2gtools debug] SINGLE END ALIGNMENT
[g2gtools debug] Chromosome chr5_R not found in mapping tree
[g2gtools debug] Available chromsomes are: []
[g2gtools debug] Fail due to no mappings
[g2gtools debug] ~~~~~~~~~~~~~~~~
[g2gtools debug] Converting NB501171:224:HTLG5BGX5:1:11104:12546:18853 chr10_L 122508651 25M
[g2gtools debug] SINGLE END ALIGNMENT
[g2gtools debug] Chromosome chr10_L not found in mapping tree
[g2gtools debug] Available chromsomes are: []
[g2gtools debug] Fail due to no mappings

And here are the rest of my commands:

g2gtools vcf2vci \
-o MM057.vci \
-s MM057 \
--diploid \
-i MM057_phased_variants.vcf.gz \
-f /refdata-GRCh38-2.1.0/fasta/genome.fa

...
[g2gtools] Parsed 3,273,296 total lines
[g2gtools] VCI creation complete: 00:00:15.08

g2gtools patch \
-i /refdata-GRCh38-2.1.0/fasta/genome.fa \
-c MM057.vci.gz \
-o MM057_patched.fa 

...
[g2gtools] Patched 4,005,169 SNPs total
[g2gtools] Patch complete: 00:00:42.83

g2gtools transform \
-i MM057_patched.fa \
-c MM057.vci.gz \
-o MM057_ref.fa

...
[g2gtools] Processed 0 SNPs total
[g2gtools] Processed 930,511 insertions total
[g2gtools] Processed 1,076,735 deletions total
[g2gtools] Transform complete: 00:00:52.30

Here is part of the vci file:

##CONTIG=chrUn_KI270756v1:79590
##CONTIG=chrUn_KI270757v1:71251
##CONTIG=chrUn_GL000214v1:137718
##CONTIG=chrUn_KI270742v1:186739
##CONTIG=chrUn_GL000216v2:176608
##CONTIG=chrUn_GL000218v1:161147
##CONTIG=chrEBV:171823
##CONTIG=hs38d1:10560522
#CHROM  POS     ANCHOR  INS     DEL     FRAG
chr1_L  14907   .       A       G       .
chr1_L  14930   .       A       G       .
chr1_L  15211   .       T       G       .
chr1_L  39230   .       G       A       .
chr1_L  42665   .       C       T       .
chr1_L  52238   .       T       G       .
chr1_L  69897   .       T       C       .
chr1_L  109526  .       G       A       .

Let me know if I should provide additional info.

Greets, Zeynep

rantingswede373 commented 5 years ago

Was there a resolution to this issue? I'm experiencing the same problem when trying to convert BAM files. I can post additional info or open a separate issue if needed, but my situation is very similar to the one described above.

krovi137 commented 1 year ago

Hi! Checking back to see whether there was any resolution to this. I'm not able to get any mappings when I try to convert my BAM files. Please let me know, thanks!

amkilpatrick commented 1 year ago

I'm similarly unable to get g2gtools convert to produce output when generating a diploid reference genome, following the workflow (step 4) and example script. The input in this case is a GTF file from Ensembl, but the error is very similar:

[g2gtools] Chromosome 1_L not found in mapping tree
[g2gtools] Available chromsomes are: dict_keys([])
[g2gtools]      Fail due to no mappings
[g2gtools] Chromosome 1_R not found in mapping tree
[g2gtools] Available chromsomes are: dict_keys([])
[g2gtools]      Fail due to no mappings

I can post this as a separate issue if needed, but the problem seems to be the same. Thanks in advance.
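
For anyone debugging the same symptom: in both reports the input uses suffixed names (chr13_L, 1_R) while convert lists no available chromosomes at all. One cheap check, independent of g2gtools itself, is to list the contig names the VCI file actually contains and compare them with the names used in the input. The sketch below is only a rough diagnostic. It assumes the VCI text layout matches the excerpt in the original report (`##CONTIG=<name>:<length>` header lines, then whitespace-separated data rows with the chromosome in the first column). The function names are hypothetical, and it does not reproduce how g2gtools builds its mapping tree.

```python
import gzip
from typing import Iterable, List, Set, Tuple


def vci_contig_names(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (header_contigs, record_contigs) from VCI-style text lines.

    Assumption: layout as in the excerpt posted in this thread.
    """
    header: Set[str] = set()
    records: Set[str] = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##CONTIG="):
            # Header form seen in the excerpt: ##CONTIG=<name>:<length>
            header.add(line[len("##CONTIG="):].rsplit(":", 1)[0])
        elif not line.startswith("#"):
            # Data row: the first whitespace-separated field is the chromosome.
            fields = line.split()
            if fields:
                records.add(fields[0])
    return header, records


def missing_names(query: Iterable[str], known: Set[str]) -> List[str]:
    """Names present in the input (BAM, GTF, ...) but absent from `known`."""
    return sorted(set(query) - known)


def read_vci(path: str) -> Tuple[Set[str], Set[str]]:
    """Read a plain or gzip-compressed VCI file from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return vci_contig_names(handle)
```

To use it, call `read_vci("MM057.vci.gz")` and pass the names from the input file (for a BAM, the `@SQ` lines printed by `samtools view -H`) to `missing_names`. If names such as chr13_L are missing from both sets, a naming mismatch is one possible cause worth ruling out. If they are present, the problem lies elsewhere.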