Closed lunalvan closed 1 year ago
Thank you for the reporting the issue. A bug fix is done for the VCF export. It should export calls for all the chromosomes.
[W::bcf_update_info] INFO/END=0 is smaller than POS at chrUn_GL000218v1:1
It's a warning message and not causing any problems for the VCF export. I compared it with the print calls
. We are checking the cause of the warning message.
Please let me know whether the export working for you properly or not.
-Arijit
Hi! It is not exporting the VCF correctly, but the other formats (TSV, XSLX) are fine. This is what I did:
Went through all of the steps to call CNVs from a BAM file, then entered view mode (the output of 'print calls' written here is only a part of the full one I get, as it is way too long; anyways, it shows CNVs from all chromosomes)
cnvpytor -root input.pytor -rd file.bam
cnvpytor -root input.pytor -his 500
cnvpytor -root input.pytor -partition 500
cnvpytor -root input.pytor -call 500
cnvpytor -root input.pytor -view 500
cnvpytor> print calls
deletion chr15:17416501-17429500 13000 0.6160 6.907563e-03 5.306594e+08 9.268198e-03 1.582461e+07 0.8849 0.0000 322500
deletion chr15:17430501-17432500 2000 0.3927 1.304437e+03 8.317632e-05 1.000000e+00 1.000000e+00 1.0000 0.0000 319500
deletion chr15:17434001-17435500 1500 0.3999 6.452230e+03 8.242455e-04 1.000000e+00 1.000000e+00 1.0000 0.0000 316500
deletion chr15:17447501-17458000 10500 0.5473 1.153577e-03 3.370347e+08 1.694639e-01 5.068574e+08 0.8953 0.0000 294000
deletion chr15:17460501-17463500 3000 0.3499 8.689704e+01 1.415738e-11 1.114358e+05 4.886709e+02 0.8685 0.0000 288500
cnvpytor> set print_filename calls.tsv
* print_filename: calls.tsv
cnvpytor> print calls
cnvpytor> quit
When opening the TSV file, all chromosomes are present:
FG014001 rd_mean_shift deletion chr1 1.0 10000.0 10000.0 0.0 0.0 1.035158820320922e-283 0.0 3.1886894735586106e-225 -1.0 1.0 0.0
FG014001 rd_mean_shift deletion chr1 77001.0 93000.0 16000.0 0.5861244203691571 5.17960674351059e-10 2.831344821281673 2.030861193499537e-08 37.82228977146194 0.26245505906522854 0.0 67100.0
FG014001 rd_mean_shift deletion chr1 139501.0 155000.0 15500.0 0.5841276440529031 2.4409700067680674e-08 454913.0033028214 5.816907918306432e-06 1406903.9842667007 0.933991683991684 0.0 52800.0
As previously said, when I try to do the same thing for a VCF file, the following message appears:
cnvpytor> set print_filename calls.vcf
* print_filename: calls.vcf
cnvpytor> print calls
[W::bcf_update_info] INFO/END=0 is smaller than POS at chrUn_GL000218v1:1
cnvpytor> quit
When looking into the generated VCF, there are a lot of calls, but all of them belong to the 'chrUn_GL000218v1:1' chromosome that was mentioned in the warning message.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT FG014001
chrUn_GL000218v1 2 CNVpytor_del0 . <DEL> . PASS END=10000;IMPRECISE;SVLEN=10000;SVTYPE=DEL;pytorRD=0;pytorP1=0;pytorP2=0;pytorP3=0;pytorP4=0;pytorQ0=-1;pytorPN=1;pytorDG=0;pytorCL=rd_mean_shift GT:CN 1/1:0
chrUn_GL000218v1 77001 CNVpytor_del1 . <DEL> . PASS END=93000;IMPRECISE;SVLEN=16000;SVTYPE=DEL;pytorRD=0.586124;pytorP1=5.17961e-10;pytorP2=2.83134;pytorP3=2.03086e-08;pytorP4=37.8223;pytorQ0=0.262455;pytorPN=0;pytorDG=67100;pytorCL=rd_mean_shift GT:CN 0/1:0
chrUn_GL000218v1 139501 CNVpytor_del2 . <DEL> . PASS END=155000;IMPRECISE;SVLEN=15500;SVTYPE=DEL;pytorRD=0.584128;pytorP1=2.44097e-08;pytorP2=454913;pytorP3=5.81691e-06;pytorP4=1.4069e+06;pytorQ0=0.933992;pytorPN=0;pytorDG=52800;pytorCL=rd_mean_shift GT:CN 0/1:0
chrUn_GL000218v1 160001 CNVpytor_del3 . <DEL> . PASS END=177000;IMPRECISE;SVLEN=17000;SVTYPE=DEL;pytorRD=0.653613;pytorP1=5.67299e-06;pytorP2=1.69651e+08;pytorP3=0.00033837;pytorP4=2.36637e+08;pytorQ0=0.963744;pytorPN=0;pytorDG=30800;pytorCL=rd_mean_shift GT:CN 0/1:0
chrUn_GL000218v1 207501 CNVpytor_del4 . <DEL> . PASS END=257500;IMPRECISE;SVLEN=50000;SVTYPE=DEL;pytorRD=0.000890327;pytorP1=0;pytorP2=0;pytorP3=0;pytorP4=0;pytorQ0=0.777778;pytorPN=0;pytorDG=0;pytorCL=rd_mean_shift GT:CN 1/1:0
It may be worth noting that, when running 'print calls' with the view mode, the last three calls are:
duplication chrUn_GL000218v1:1-80000 80000 2.1118 0.000000e+00 2.081059e-14 0.000000e+00 7.884573e-14 0.0631 0.0000 100
duplication chrUn_GL000218v1:81501-98500 17000 1.8095 3.543695e-09 7.375459e-34 3.803691e-09 9.862915e-42 0.1515 0.0000 62800
duplication chrUn_GL000218v1:99501-161500 62000 1.9478 0.000000e+00 2.358886e+09 0.000000e+00 2.822972e-184 0.1595 0.0057 100
Which are the only calls that belong to the 'chrUn_GL000218v1:1' chromosome in the original calls.
I also tried to run the full thing only with chromosomes 1-22, X, Y and MT. The same error appeared, but this time regarding chromosome 22. The VCF file only showed calls from chr22.
I hope this information was helpful! Thanks!
Currently the update is available to latest development version only. Did you install the latest development version? if not, command to install
pip install git+https://github.com/abyzovlab/CNVpytor.git
It will be added to our upcoming release.
[W::bcf_update_info] INFO/END=0 is smaller than POS at chrUn_GL000218v1:1
The above warning message will be addressed by the upcoming pysam release and not causing any issue. Ref pysam-developers/pysam/issues/1175
Hi, I have installed the latest development version, but it still does not work.
I thought about looking at the VCF and TSV files that are generated from the same run and I spotted something.
These are the first lines in the VCF file that is generated
chrUn_GL000218v1 77001 CNVpytor_del0 . <DEL> . PASS END=93000;IMPRECISE;SVLEN=16000;SVTYPE=DEL;pytorRD=0.586124;pytorP1=5.17961e-10;pytorP2=2.83134;pytorP3=2.03086e-08;pytorP4=37.8223;pytorQ0=0.262455;pytorPN=0;pytorDG=67100;pytorCL=rd_mean_shift GT:CN 0/1:0
chrUn_GL000218v1 276001 CNVpytor_dup1 . <DUP> . PASS END=289000;IMPRECISE;SVLEN=13000;SVTYPE=DUP;pytorRD=1.77527;pytorP1=1.6296e-06;pytorP2=1.74525e-25;pytorP3=0.000132433;pytorP4=3.20541e-20;pytorQ0=0.335371;pytorPN=0;pytorDG=9100;pytorCL=rd_mean_shift GT:CN 1/1:2
chrUn_GL000218v1 297501 CNVpytor_del2 . <DEL> . PASS END=348000;IMPRECISE;SVLEN=50500;SVTYPE=DEL;pytorRD=0.00887088;pytorP1=0;pytorP2=0;pytorP3=0;pytorP4=0;pytorQ0=0.131868;pytorPN=0;pytorDG=0;pytorCL=rd_mean_shift GT:CN 1/1:0
chrUn_GL000218v1 13224501 CNVpytor_dup3 . <DUP> . PASS END=13247500;IMPRECISE;SVLEN=23000;SVTYPE=DUP;pytorRD=1.97146;pytorP1=0;pytorP2=5.68356e-06;pytorP3=0;pytorP4=0.000107939;pytorQ0=0.00876018;pytorPN=0;pytorDG=220300;pytorCL=rd_mean_shift GT:CN 1/1:2
And these are the ones from the TSV file
FG014001 rd_mean_shift deletion chr1 77001.0 93000.0 16000.0 0.5861244203691571 5.17960674351059e-10 2.831344821281673 2.030861193499537e-08 37.82228977146194 0.26245505906522854 0.0 67100.0
FG014001 rd_mean_shift duplication chr1 276001.0 289000.0 13000.0 1.775269007501444 1.629596227190269e-06 1.7452471084512207e-25 0.00013243259089357196 3.205413376007881e-20 0.33537099494097805 0.0 9100.0
FG014001 rd_mean_shift deletion chr1 297501.0 348000.0 50500.0 0.008870882819780327 0.0 3.08855205995407e-60 0.0 0.0 0.13186813186813187 0.9900990099009901 0.0
FG014001 rd_mean_shift duplication chr1 13224501.0 13247500.0 23000.0 1.9714585906709374 0.0 5.683558121712347e-06 0.0 0.00010793889006778407 0.008760177264763476 0.0 220300.0
They are the same calls, but for some reason, when the file is exported to VCF format, it changes the names of the chromosomes so that all of the CNVs belong to the mentioned chromosome in the warning message. I have no idea why this happens, is there any way I could fix this?
Thanks,
Luna
I am not having the issue and created a colab link for you to show. Colab link: https://colab.research.google.com/drive/1jTbnk0tKcALZKj2_dRcyoVaL0IQrPGlE?usp=sharing Here is the example calls from multiple chromosome exported in VCF format.
chr5 46400001 CNVpytor_del16 . <DEL> . PASS END=49400000;IMPRECISE;SVLEN=3000000;SVTYPE=DEL;pytorRD=0.00323983;pytorP1=0;pytorP2=0;pytorP3=0;pytorP4=0;pytorQ0=0.265431;pytorPN=0;pytorDG=0;pytorCL=rd_mean_shift GT:CN 1/1:0
chr5 58700001 CNVpytor_del17 . <DEL> . PASS END=59400000;IMPRECISE;SVLEN=700000;SVTYPE=DEL;pytorRD=0.375306;pytorP1=0.386142;pytorP2=0;pytorP3=0.386142;pytorP4=0;pytorQ0=0.00611391;pytorPN=0;pytorDG=9294500;pytorCL=rd_mean_shift GT:CN 0/1:1
chr6 2 CNVpytor_del18 . <DEL> . PASS END=200000;IMPRECISE;SVLEN=200000;SVTYPE=DEL;pytorRD=0.561811;pytorP1=1205.83;pytorP2=0.00313723;pytorP3=1205.83;pytorP4=0.00313723;pytorQ0=0.827092;pytorPN=0;pytorDG=0;pytorCL=rd_mean_shift GT:CN 0/1:1
chr6 200001 CNVpytor_dup19 . <DUP> . PASS END=28500000;IMPRECISE;SVLEN=28300000;SVTYPE=DUP;pytorRD=1.44527;pytorP1=0;pytorP2=0;pytorP3=0;pytorP4=0;pytorQ0=0.00767461;pytorPN=0;pytorDG=140100;pytorCL=rd_mean_shift GT:CN 0/1:3
chr6 28500001 CNVpytor_del20 . <DEL> . PASS END=33400000;IMPRECISE;SVLEN=4900000;SVTYPE=DEL;pytorRD=0.269621;pytorP1=0;pytorP2=0;pytorP3=0;pytorP4=0;pytorQ0=0.840541;pytorPN=0;pytorDG=24687800;pytorCL=rd_mean_shift GT:CN 0/1:1
Hi, I am using CNVpytor to call CNVs from WGS data, and an issue has emerged when exporting the calls in .vcf format.
After the CNV calling step, the .tsv generated file looks good, with calls from all the different chromosomes. Afterwards, I enter the interactive mode by running
After that, I want to generate both an .xlsx and a .vcf files. There is no trouble generating the excel one, it also contains all of the information. However, when doing the same thing for a .vcf file, this error comes up:
When looking into the resulting vcf file, only calls from chrUN_GL000218v1 are present, no other calls from any other chromosomes. Again, the .xlsx and .tsv files are okay. Do you maybe know what could be causing this?