NabaviLab / CNV-Sim

Copy Number Variations (CNV) Simulator

a few errors in execution of CNV-SIM #1

Closed by jayaramanp 5 years ago

jayaramanp commented 8 years ago

Hello, I found your repository CNV-Sim a couple of days ago while working with Wessim, and it helps me generate paired-end FASTQ reads for the regions I want to simulate CNVs in.

So far I get one error: KeyError: "sequence '2' not present". It looks like it is being thrown by Wessim, but I have run Wessim by itself without any issues.

What does that error mean?

[jayaramanp@dgdrhr-01 CNV-Sim]$ python cnv-sim.py -o ~/Wessim_work/CNV-Sim/cnvSimOP/ --cnv_list ~/bamSurgeonData/cnvTospike1.bed exome /nfs/Public/reference/humanref/v37/human_g1k_v37.fasta /nfs/DGD/Clinical/kbase/sureselectv5p/V5P_probes.bed
[CNV SIM 2016-08-11 07:38:45] simulation type: whole exome
[CNV SIM 2016-08-11 07:39:19] loading genome file ..
[CNV SIM 2016-08-11 07:39:56] successfully loaded a genome of length 3101804739
[CNV SIM 2016-08-11 07:39:56] loading target file ..
[CNV SIM 2016-08-11 07:39:56] sorting and merging targets ..
[CNV SIM 2016-08-11 07:40:07] successfully loaded 229118 targets ..
[CNV SIM 2016-08-11 07:40:07] loading CNV list ..
[CNV SIM 2016-08-11 07:40:08] successfully loaded CNV list that contains 11 regions ..
[CNV SIM 2016-08-11 07:40:08] generating reads for the target exons ..
[CNV SIM 2016-08-11 07:40:08] delegating job to Wessim ...
Generating fasta file for given regions...
[fai_load] build FASTA index.


Reference: /home/jayaramanp/Wessim_work/CNV-Sim/cnvSimOP/tmp/reference.fa
Region file: /home/jayaramanp/Wessim_work/CNV-Sim/cnvSimOP/tmp/target.bed.sorted.merged
Fragment: 200 +- 50 > 120
Paired-end mode? True
Sequencing model: models/ill100v5_p.gzip
Read length: 100
Read number: 10000
Output File: /home/jayaramanp/Wessim_work/CNV-Sim/cnvSimOP/tmp/base
Gzip compress? False
Quality base: 33
Thread number: 8

Job started at: 2016-08-11 07:40:27

exiting subprocess 7
exiting subprocess 2
exiting subprocess 1
exiting subprocess 5
exiting subprocess 3
exiting subprocess 4
exiting subprocess 6
exiting subprocess 8
Done generating 10000 reads in 26.315505 secs
Merging subresults...
[CNV SIM 2016-08-11 07:40:35] simulating copy number variations (amplifications/deletions)
[CNV SIM 2016-08-11 07:40:35] saving to the control genome file ..
[CNV SIM 2016-08-11 07:40:35] saving to the control target file ..
[CNV SIM 2016-08-11 07:40:35] delegating job to Wessim ...
Generating fasta file for given regions...
[fai_load] build FASTA index.
Traceback (most recent call last):
  File "Wessim1.py", line 197, in <module>
    main(sys.argv[1:])
  File "Wessim1.py", line 57, in main
    getRegionVector(reffile, regionfile, slack)
  File "Wessim1.py", line 186, in getRegionVector
    x = ref.fetch(chrom, start, end)
  File "pysam/cfaidx.pyx", line 275, in pysam.cfaidx.FastaFile.fetch (pysam/cfaidx.c:4704)
KeyError: "sequence '2' not present"
[CNV SIM 2016-08-11 07:40:35] saving to the CNV genome file ..
[CNV SIM 2016-08-11 07:40:35] saving to the CNV target file ..
[CNV SIM 2016-08-11 07:40:35] delegating job to Wessim ...
Generating fasta file for given regions...
[fai_load] build FASTA index.
Traceback (most recent call last):
  File "Wessim1.py", line 197, in <module>
    main(sys.argv[1:])
  File "Wessim1.py", line 57, in main
    getRegionVector(reffile, regionfile, slack)
  File "Wessim1.py", line 186, in getRegionVector
    x = ref.fetch(chrom, start, end)
  File "pysam/cfaidx.pyx", line 275, in pysam.cfaidx.FastaFile.fetch (pysam/cfaidx.c:4704)
KeyError: "sequence '2' not present"
[CNV SIM 2016-08-11 07:40:35] merging results ..
cat: /home/jayaramanp/Wessim_work/CNV-Sim/cnvSimOP/tmp/control_1.fastq: No such file or directory
cat: /home/jayaramanp/Wessim_work/CNV-Sim/cnvSimOP/tmp/control_2.fastq: No such file or directory
cat: /home/jayaramanp/Wessim_work/CNV-Sim/cnvSimOP/tmp/cnv_1.fastq: No such file or directory
cat: /home/jayaramanp/Wessim_work/CNV-Sim/cnvSimOP/tmp/cnv_2.fastq: No such file or directory
[CNV SIM 2016-08-11 07:40:36] cleaning temporary files ..
[CNV SIM 2016-08-11 07:40:37] simulation completed. find results in /home/jayaramanp/Wessim_work/CNV-Sim/cnvSimOP/
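The KeyError in the log is what pysam's FastaFile.fetch raises when the requested sequence name is not in the FASTA index it loaded. Whatever the underlying cause, one quick check is to list which chromosome names in the region file have no matching header in the reference that was actually used. The following is a minimal sketch in plain Python 3, not part of CNV-Sim or Wessim; the helper names `fasta_names`, `bed_chroms`, and `missing_chroms` are hypothetical.

```python
def fasta_names(fasta_path):
    """Return the set of sequence names (first word of each '>' header)."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def bed_chroms(bed_path):
    """Return the set of chromosome names found in column 1 of a BED file."""
    chroms = set()
    with open(bed_path) as handle:
        for line in handle:
            # Skip blank lines and the optional BED header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chroms.add(line.split()[0])
    return chroms


def missing_chroms(fasta_path, bed_path):
    """Chromosomes referenced by the BED file but absent from the FASTA."""
    return sorted(bed_chroms(bed_path) - fasta_names(fasta_path))
```

Pointing `missing_chroms` at the reference and region file that Wessim was given would show whether '2' is among the names the reference lacks.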

abdelrahmanhosny commented 8 years ago

Thanks @jayaramanp for opening the issue. The genome you are loading has a length of 3101804739, so introducing amplifications makes the reference file too big for its indices to be handled by pysam (which Wessim uses).

Fix: use a reference containing one chromosome at a time. Make sure to update the target file as well, so that it includes only targets on that chromosome.

On our side, we will work on throwing a friendlier error.
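The suggested fix (one chromosome per reference, with a matching target file) can be prepared with a short script. This is a minimal sketch in plain Python 3, assuming an uncompressed FASTA and a tab- or space-delimited BED file; `split_fasta_by_chrom` and `filter_bed_by_chrom` are hypothetical helper names, not CNV-Sim functions.

```python
import os


def split_fasta_by_chrom(fasta_path, out_dir):
    """Write each sequence to out_dir/<name>.fa and return {name: path}."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    out = None
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # A new header starts a new per-chromosome file.
                    if out is not None:
                        out.close()
                    name = line[1:].split()[0]
                    paths[name] = os.path.join(out_dir, name + ".fa")
                    out = open(paths[name], "w")
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths


def filter_bed_by_chrom(bed_path, chrom, out_path):
    """Copy only the BED lines whose first column equals chrom; return count."""
    kept = 0
    with open(bed_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if fields and fields[0] == chrom:
                dst.write(line)
                kept += 1
    return kept
```

Each per-chromosome FASTA and its filtered target file can then be passed to cnv-sim.py in place of the whole-genome reference and the full target file. The same filter can be applied to the CNV list if it is also in BED format.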

jayaramanp commented 8 years ago

From what I understand, I need a separate CNV list and a separate reference FASTA file for each chromosome that I am introducing CNVs in. That would create CNV FASTQ files for each chromosome, which I would then merge at the end with my sample FASTQ?

And cnv_1.fastq and cnv_2.fastq will be the FASTQ files with the amplified reads, i.e. the ones containing the simulated CNVs, correct?

For future reference, is there a way to handle the entire reference file instead of splitting it up?

abdelrahmanhosny commented 8 years ago

That's exactly correct.

Yes, we are working on an improvement that handles the entire reference file by dividing the task internally, so that you do not have to split it up yourself. It will be available in the next version.
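Until then, the per-chromosome outputs have to be combined by hand, as discussed above. A minimal sketch of that merge step in Python 3 follows. It assumes each per-chromosome run was written to its own output directory containing cnv_1.fastq and cnv_2.fastq (an assumption about the layout, not something CNV-Sim guarantees); `concat_files` and `merge_paired_fastq` are hypothetical helper names.

```python
import os
import shutil


def concat_files(parts, out_path):
    """Concatenate the files in parts, in order, into out_path."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)


def merge_paired_fastq(run_dirs, out_prefix, stem="cnv"):
    """Merge <stem>_1.fastq and <stem>_2.fastq across run_dirs.

    Both mates are concatenated in the same directory order, so read N of
    the merged _1 file still pairs with read N of the merged _2 file.
    """
    merged = []
    for mate in ("1", "2"):
        parts = [os.path.join(d, "%s_%s.fastq" % (stem, mate)) for d in run_dirs]
        out_path = "%s_%s.fastq" % (out_prefix, mate)
        concat_files(parts, out_path)
        merged.append(out_path)
    return tuple(merged)
```

The sketch does not rename reads. If read IDs repeat across per-chromosome runs, they would need a per-run prefix before downstream tools see the merged files.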