bjtrost / TCAG-WGS-CNV-workflow

Scripts involved in our workflow for detecting CNVs from WGS data using read depth-based methods
MIT License

Merging ERDS and CNVnator files error #11

Open ghausiabegum opened 4 years ago

ghausiabegum commented 4 years ago

Hi,

I followed the script process_cnvs.erds+.sh and got the following error. Could you please help me with it?

/scratch/SOFTWARE/TCAG-WGS-CNV-workflow/format_cnvnator_results.py
/scratch/SOFTWARE/TCAG-WGS-CNV-workflow/format_erds_results.py
/scratch/SOFTWARE/TCAG-WGS-CNV-workflow/merge_cnvnator_results.py
/scratch/SOFTWARE/TCAG-WGS-CNV-workflow/merge_erds_results.py
/scratch/SOFTWARE/TCAG-WGS-CNV-workflow/add_features.py
/scratch/SOFTWARE/TCAG-WGS-CNV-workflow/hg19_gap.bed
Set-up..
Found erds/original, creating erds/formatted
Found cnvn/original, creating cnvn/formatted_filtered
Formatting, filtering and merging cnvn output
Warning: Header not found, using default..
Traceback (most recent call last):
  File "/scratch/SOFTWARE/TCAG-WGS-CNV-workflow/format_cnvnator_results.py", line 69, in <module>
    type = format[words[header_index["CNV_type"]]]
KeyError: '1'
Traceback (most recent call last):
  File "/scratch/SOFTWARE/TCAG-WGS-CNV-workflow/merge_cnvnator_results.py", line 181, in <module>
    gaps[chrm]=[[int(words[1]),int(words[2]),words[3]]]
IndexError: list index out of range
Formatting and merging erds output
Traceback (most recent call last):
  File "/scratch/SOFTWARE/TCAG-WGS-CNV-workflow/merge_erds_results.py", line 180, in <module>
    gaps[chrm]=[[int(words[1]),int(words[2]),words[3]]]
IndexError: list index out of range
genearting sample ids..
merging erds and cnvn calls
Merged erds output not found ... check log files.
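Both merge scripts fail on the same statement, which reads a fourth field (words[3]) from each line of the gap file, presumably the hg19_gap.bed listed at the top of the log. An IndexError there means at least one line produced fewer than four fields after splitting. The sketch below is not part of the workflow: the helper name find_short_lines is hypothetical, and it assumes the scripts split each line on tabs, which we have not confirmed (pass sep=None to mimic plain whitespace splitting instead). It reports every line, including blank ones, that would break a four-field lookup.

```python
import sys


def find_short_lines(path, min_fields=4, sep="\t"):
    """List (line_number, field_count, text) for lines with too few fields.

    Assumption: the merge scripts split gap-file lines on tabs.
    Use sep=None to mimic str.split() on arbitrary whitespace.
    """
    short = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            text = line.rstrip("\n")
            fields = text.split(sep)
            # A three-column BED, a space-delimited line or a blank line
            # would all fail on words[3].
            if len(fields) < min_fields:
                short.append((line_number, len(fields), text))
    return short


if __name__ == "__main__" and len(sys.argv) > 1:
    for line_number, count, text in find_short_lines(sys.argv[1]):
        print(f"line {line_number}: {count} field(s): {text!r}")
```

An empty result for the gap file would point the search elsewhere, for example at whichever file the merge scripts actually open at that line.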

Thanks!

bjtrost commented 4 years ago

Hi,

Unfortunately I cannot tell offhand from the error messages. Are you able to share your CNVnator and ERDS files that you are using so I can test it myself?

Cheers,

Brett

ghausiabegum commented 4 years ago

Hi,

Thanks for your reply!

I have attached snippets of the files I am working with to this email. Please let me know if you need anything else.

Best,

-- Regards Ghausia Begum

##fileformat=VCFv4.0
##fileDate=20190320
##reference=1000GenomesPhase3_decoy-GRCh37
##source=CNVnator
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##ALT=
##ALT=
#CHROM POS ID REF ALT QUAL FILTER INFO
1 1 CNVnator_del_1 N <DEL> . PASS END=10000;SVTYPE=DEL;SVLEN=-10000;IMPRECISE;natorRD=0;natorP1=1.59373e-11;natorP2=1.56125e-280;natorP3=1.99216e-11;natorP4=1.11272e-222;natorQ0=-1
1 150001 CNVnator_del_2 T <DEL> . PASS END=155500;SVTYPE=DEL;SVLEN=-5500;IMPRECISE;natorRD=0.578662;natorP1=0.451236;natorP2=1.82038e-07;natorP3=279.558;natorP4=0.14135;natorQ0=0.883817
1 176001 CNVnator_del_3 G <DEL> . PASS END=227500;SVTYPE=DEL;SVLEN=-51500;IMPRECISE;natorRD=0.016173;natorP1=3.09461e-12;natorP2=5.19618e-152;natorP3=3.21965e-12;natorP4=0;natorQ0=0.967427
1 231001 CNVnator_dup_4 C <DUP> . PASS END=242000;SVTYPE=DUP;SVLEN=11000;IMPRECISE;natorRD=1.91092;natorP1=0.00214848;natorP2=0.0216277;natorP3=0.0357276;natorP4=2.27708;natorQ0=0.284183
1 267501 CNVnator_del_5 A <DEL> . PASS END=318000;SVTYPE=DEL;SVLEN=-50500;IMPRECISE;natorRD=0.00471782;natorP1=3.15589e-12;natorP2=0;natorP3=3.28603e-12;natorP4=0;natorQ0=0.248649
1 326501 CNVnator_del_6 G <DEL> . PASS END=327500;SVTYPE=DEL;SVLEN=-1000;IMPRECISE;natorRD=0.374265;natorP1=49058.3;natorP2=0.0327534;natorP3=1;natorP4=1;natorQ0=1
1 386501 CNVnator_del_7 C <DEL> . PASS END=521500;SVTYPE=DEL;SVLEN=-135000;IMPRECISE;natorRD=0.449698;natorP1=1.18054e-12;natorP2=2.871e+09;natorP3=1.19829e-12;natorP4=2.871e+09;natorQ0=0.984279
1 1011501 CNVnator_del_8 T <DEL> . PASS END=1013500;SVTYPE=DEL;SVLEN=-2000;IMPRECISE;natorRD=0.456255;natorP1=1301.48;natorP2=0.010141;natorP3=1;natorP4=1;natorQ0=0.386617
1 1285501 CNVnator_del_9 G <DEL> . PASS END=1287000;SVTYPE=DEL;SVLEN=-1500;IMPRECISE;natorRD=0.214173;natorP1=13978.1;natorP2=2.05364e-06;natorP3=1;natorP4=1;natorQ0=0.182432
1 2053001 CNVnator_del_10 G <DEL> . PASS END=2055500;SVTYPE=DEL;SVLEN=-2500;IMPRECISE;natorRD=0.219492;natorP1=377.504;natorP2=1.37905e-08;natorP3=1;natorP4=1;natorQ0=0.158371
1 2583001 CNVnator_dup_11 A <DUP> . PASS END=2616000;SVTYPE=DUP;SVLEN=33000;IMPRECISE;natorRD=4.74277;natorP1=0;natorP2=0.00145508;natorP3=0;natorP4=0.00809184;natorQ0=0.780204
1 2634001 CNVnator_del_12 G <DEL> . PASS END=2684500;SVTYPE=DEL;SVLEN=-50500;IMPRECISE;natorRD=0.00391504;natorP1=3.15589e-12;natorP2=0;natorP3=3.28603e-12;natorP4=0;natorQ0=0.934783
1 3845501 CNVnator_del_13 N <DEL> . PASS END=3995500;SVTYPE=DEL;SVLEN=-150000;IMPRECISE;natorRD=0.00149144;natorP1=1.06248e-12;natorP2=0;natorP3=1.07684e-12;natorP4=0;natorQ0=0
1 12930501 CNVnator_dup_14 T <DUP> . PASS END=12940500;SVTYPE=DUP;SVLEN=10000;IMPRECISE;natorRD=1.7555;natorP1=0.00202992;natorP2=0.06397;natorP3=0.04097;natorP4=8.63749;natorQ0=0.104094
1 13027001 CNVnator_del_15 T <DEL> . PASS END=13038500;SVTYPE=DEL;SVLEN=-11500;IMPRECISE;natorRD=0.666445;natorP1=0.00467794;natorP2=3400.01;natorP3=0.0715599;natorP4=36490.9;natorQ0=0.839578
1 13053001 CNVnator_del_16 N <DEL> . PASS END=13116000;SVTYPE=DEL;SVLEN=-63000;IMPRECISE;natorRD=0.110352;natorP1=2.52972e-12;natorP2=1.19574e-54;natorP3=2.61266e-12;natorP4=4.20872e-163;natorQ0=0.854849
1 13117001 CNVnator_del_17 C <DEL> . PASS END=13132500;SVTYPE=DEL;SVLEN=-15500;IMPRECISE;natorRD=0.262738;natorP1=1.02821e-11;natorP2=3.97224e-33;natorP3=5.90269e-11;natorP4=1.00045e-27;natorQ0=0.918182
1 13155501 CNVnator_del_18 A <DEL> . PASS END=13166000;SVTYPE=DEL;SVLEN=-10500;IMPRECISE;natorRD=0.556388;natorP1=0.00728893;natorP2=7.79484e+07;natorP3=0.24383;natorP4=1.5493e+08;natorQ0=0.720264
1 13220001 CNVnator_del_19 N <DEL> . PASS END=13443000;SVTYPE=DEL;SVLEN=-223000;IMPRECISE;natorRD=0.4124;natorP1=7.14675e-13;natorP2=2.8694e+09;natorP3=7.21143e-13;natorP4=2.86941e+09;natorQ0=0.771207
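The header of this snippet (fileformat=VCFv4.0, source=CNVnator) shows that it is CNVnator's VCF output rather than its native tab-delimited calls. Regenerating the native file with CNVnator's calling step is the straightforward fix. If only the VCF is at hand, a rough conversion could look like the sketch below. It assumes the native columns are CNV_type, coordinates, CNV_size, normalized_RD, e-val1 to e-val4 and q0, and that natorRD, natorP1 to natorP4 and natorQ0 map onto them in that order; the function name vcf_record_to_native is hypothetical, and its output has not been tested against format_cnvnator_results.py.

```python
import sys

# Assumed mapping from VCF SVTYPE to the native CNV_type column.
CNV_TYPE = {"DEL": "deletion", "DUP": "duplication"}


def vcf_record_to_native(vcf_line):
    """Convert one CNVnator VCF record into a tab-delimited native-style row.

    Assumed mapping: natorRD -> normalized_RD, natorP1..natorP4 -> e-val1..e-val4,
    natorQ0 -> q0. INFO is read from the last column, so a record whose ALT
    column was lost (as in a copy-pasted snippet) still parses.
    """
    fields = vcf_line.split()
    chrom, start = fields[0], int(fields[1])
    info = {}
    for item in fields[-1].split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flags such as IMPRECISE have no value
    end = int(info["END"])
    size = abs(int(info["SVLEN"]))  # deletions carry a negative SVLEN
    # Numeric values are kept as the original strings to avoid reformatting them.
    columns = [CNV_TYPE[info["SVTYPE"]], f"{chrom}:{start}-{end}", str(size),
               info["natorRD"], info["natorP1"], info["natorP2"],
               info["natorP3"], info["natorP4"], info["natorQ0"]]
    return "\t".join(columns)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                print(vcf_record_to_native(line))
```

For the second record above, this would yield a deletion row with coordinates 1:150001-155500 and size 5500.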

bjtrost commented 4 years ago

Ah - I suspect it is because you are using the VCF version of the CNVnator calls rather than the tab-delimited version. The script expects the tab-delimited version. (However, the ERDS file is expected to be the VCF). I will edit the docs to make this clearer. Please let me know if this works. Sorry for the confusion!
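One plausible reading of KeyError: '1', offered only as a hypothesis: in the native CNVnator output the first column is the CNV type (deletion or duplication), whereas in a VCF record the first column is the chromosome, here 1. If format_cnvnator_results.py falls back to a default column layout ("Header not found, using default") and looks that column up in a type dictionary, a VCF input would fail exactly this way. A quick check of which format a file is in might look like the following; sniff_cnvnator_format is a hypothetical helper that relies on the VCF specification's requirement that the first line be a ##fileformat line.

```python
import sys

# First-column values expected in native CNVnator calls (assumption).
CNV_TYPES = ("deletion", "duplication")


def sniff_cnvnator_format(path):
    """Return "vcf", "native" or "unknown" based on the first non-blank line."""
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped:
                continue  # ignore leading blank lines
            if stripped.startswith("##fileformat=VCF"):
                return "vcf"  # VCF files must start with a ##fileformat line
            if stripped.split()[0] in CNV_TYPES:
                return "native"  # native calls start with the CNV type
            return "unknown"
    return "unknown"  # empty file


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        print(path, sniff_cnvnator_format(path))
```

Running it on the file passed as the CNVnator input, whatever its extension, would show whether it is really the tab-delimited version.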

Cheers,

Brett

ghausiabegum commented 4 years ago

I am using the VCF version for the ERDS variants and the TXT version for the CNVnator variants. The files I sent are supposed to be 006.erds.vcf and 006.calls.txt (I changed the file names when sending them to you; sorry for the confusion).

I used the correct format for each file, yet I still get the error.

bjtrost commented 4 years ago

Can you send me the actual files you used? You could include just the headers plus the first three non-header lines in each file. I think this should be enough to diagnose the problem.
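To produce the kind of excerpt requested here (the header plus the first three non-header lines), a small helper along these lines should work for both the ERDS VCF and the CNVnator calls. It treats lines beginning with # as header lines, which holds for VCF; the native CNVnator file is assumed to have no header, so only its first three lines would be kept. The function name excerpt is hypothetical.

```python
import sys


def excerpt(path, n_records=3):
    """Return the leading '#' header lines plus the first n_records data lines."""
    header, records = [], []
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):
                if not records:
                    header.append(line)  # keep only the header block at the top
                continue
            if not line.strip():
                continue  # blank lines are not counted as records
            records.append(line)
            if len(records) >= n_records:
                break
    return "".join(header + records)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.write(excerpt(sys.argv[1]))
```

Saved as a file such as excerpt.py (a hypothetical name), it could be run as `python excerpt.py 006.erds.vcf > 006.erds.excerpt.vcf` and attached as a plain file.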

Cheers,

Brett

ghausiabegum commented 4 years ago

The files I sent are the actual files (the header plus a few lines); only the file names are different. Nevertheless, I have attached a copy of them to this email. 006.calls.txt contains the variants called by CNVnator, and 006.erds.vcf contains the variants called by ERDS.

I hope this helps.

bjtrost commented 4 years ago

For some reason, I cannot see the attachments. Can you send them as a regular e-mail to brett.trost@sickkids.ca? Thanks!