churchill-lab / g2gtools

Personal diploid genome creation and coordinate conversion
http://churchill-lab.github.io/g2gtools

"Patched 0 SNPs total" #19

Open hanbinlu opened 5 years ago

hanbinlu commented 5 years ago

Hi,

I used the strain variant data from the Mouse Genomes Project and followed the documentation to patch and transform the mm10 reference genome into a CAST_EiJ genome. The log did not report any errors, but the output contains no SNPs or indels. Here are my commands:

g2gtools vcf2vci  -s CAST_EiJ -o mgp.v6.vci  -p 35 -i strain_variants/mgp.v6.merged.norm.snp.indels.sfiltered.vcf.gz -f mm10.fa
g2gtools patch -c mgp.v6.vci.gz  -p 35  -i  mm10.fa -o mm10_patched.CAST.fa

Log:

[g2gtools] Processing chr4...
[g2gtools] Processing chr7...
[g2gtools] Processing chr14...
[g2gtools] Processing chr10...
[g2gtools] Processing chr1...
[g2gtools] Processing chrX...
[g2gtools] Patched 0 SNPs total
[g2gtools] Patch complete: 00:01:26.50

The VCF file should not be the problem, since I have used it to patch individual regions with bcftools before.

I really appreciate your help.

kbchoi-jax commented 5 years ago

It could just be an error in the log. Could you open the VCI file in a text editor and take a look at it, please?

hanbinlu commented 5 years ago

Hi,

The full log for vcf2vci

[g2gtools] VCF file: /Extension_HDD1/strain_variants/mgp.v6.merged.norm.snp.indels.sfiltered.vcf.gz
[g2gtools] Checking for index file, creating if needed...
[g2gtools] Fasta File: /Extension_HDD1/mgp.v6.vci
[g2gtools] Strain: CAST_EiJ
[g2gtools] Pass filter on: False
[g2gtools] Quality filter on: False
[g2gtools] Diploid: False
[g2gtools] Number of processes: 35
[g2gtools] Output VCI File: /Extension_HDD1/mgp.v6.vci
[g2gtools] Parsing VCF files...
[g2gtools] Processing Chromosome 1...
[g2gtools] Processing Chromosome 2...
[g2gtools] Processing Chromosome 4...
[g2gtools] Processing Chromosome 5...
[g2gtools] Processing Chromosome 3...
[g2gtools] Processing Chromosome 7...
[g2gtools] Processing Chromosome 6...
[g2gtools] Processing Chromosome 8...
[g2gtools] Processing Chromosome 11...
[g2gtools] Processing Chromosome 10...
[g2gtools] Processing Chromosome 9...
[g2gtools] Processing Chromosome 12...
[g2gtools] Processing Chromosome 13...
[g2gtools] Processing Chromosome 15...
[g2gtools] Processing Chromosome 16...
[g2gtools] Processing Chromosome 14...
[g2gtools] Processing Chromosome 17...
[g2gtools] Processing Chromosome MT...
[g2gtools] Processing Chromosome X...
[g2gtools] Processing Chromosome 19...
[g2gtools] Processing Chromosome 18...
[g2gtools] Processing Chromosome Y...
[g2gtools] Finalizing VCI file...
[g2gtools] Parsed 90,310,977 total lines
[g2gtools] VCI creation complete: 00:08:06.62

The vci seems fine to me and has 426398 (fewer than expected?)

CREATION_TIME=08/13/2019 14:37:43
INPUT_VCF=/Extension_HDD1/strain_variants/mgp.v6.merged.norm.snp.indels.sfiltered.vcf.gz
FASTA_FILE=MM10.fa
STRAIN=CAST_EiJ
VCF_KEEP=False
FILTER_PASSED=False
FILTER_QUALITY=False
DIPLOID=False
PROCESSES=35
CONTIG=chr1:195471971
CONTIG=chr10:130694993
CONTIG=chr11:122082543
CONTIG=chr12:120129022
CONTIG=chr13:120421639
CONTIG=chr14:124902244
CONTIG=chr15:104043685
CONTIG=chr16:98207768
CONTIG=chr17:94987271
CONTIG=chr18:90702639
CONTIG=chr19:61431566
CONTIG=chr2:182113224
CONTIG=chr3:160039680
CONTIG=chr4:156508116
CONTIG=chr5:151834684
CONTIG=chr6:149736546
CONTIG=chr7:145441459
CONTIG=chr8:129401213
CONTIG=chr9:124595110
CONTIG=chrM:16299
CONTIG=chrX:171031299
CONTIG=chrY:91744698

CHROM POS ANCHOR INS DEL FRAG
1 3000019 G . A 3000019
1 3000020 . T A .
1 3000023 . C A .
1 3000097 CTT TTTTTTTTTT . 80

juhomon commented 4 years ago

You seem to have different chromosome names in your contig lines and in your coordinate lines (chr1 vs. 1):

CONTIG=chr1:195471971

1 3000019 G . A 3000019

Have you checked whether the chr names of the contigs in your VCF are in a different format from those in the coordinate lines of the VCF? Checking this helped me.
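One way to run this check without reading the file by eye is a small script that compares the names on the CONTIG header lines with the names in the first column of the data lines. This is only a rough sketch and is not part of g2gtools. The function name `contig_name_mismatch` is made up for illustration, and the parsing assumes the layout pasted above (`CONTIG=name:length` header lines, optionally prefixed with `#`, followed by whitespace-separated data rows).

```python
def contig_name_mismatch(vci_lines):
    """Compare CONTIG header names with the CHROM column of the data rows.

    Hypothetical helper, assuming the VCI layout shown in this thread.
    Returns (names used in data rows but absent from the header,
             names declared in the header but never used in data rows).
    """
    header_names = set()
    data_names = set()
    for raw in vci_lines:
        # Tolerate header lines with or without a leading '#' prefix.
        line = raw.strip().lstrip("#").strip()
        if not line:
            continue
        first = line.split()[0]
        if first.startswith("CONTIG="):
            # CONTIG=chr1:195471971 -> chr1
            header_names.add(first[len("CONTIG="):].rsplit(":", 1)[0])
        elif "=" in first or first == "CHROM":
            # Other KEY=VALUE header lines and the column-name line.
            continue
        else:
            data_names.add(first)
    return sorted(data_names - header_names), sorted(header_names - data_names)


# Lines copied from the VCI excerpt in this thread.
sample = [
    "CONTIG=chr1:195471971",
    "CHROM POS ANCHOR INS DEL FRAG",
    "1 3000019 G . A 3000019",
    "1 3000020 . T A .",
]
missing, unused = contig_name_mismatch(sample)
print("in data rows but not in header:", missing)
print("in header but not in data rows:", unused)
```

For a compressed VCI file, the lines could be read with `gzip.open(path, "rt")` from the standard library and passed to the same function. If the first list is non-empty, the header and data rows disagree on naming, which is the chr1 vs. 1 situation described above.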