aquaskyline / LRSIM

10x Genomics Reads Simulator
MIT License

LRSIM crashes and reports "not defined chr1_182578874_182579@chr1" #36

Open morispi opened 3 years ago

morispi commented 3 years ago

Hi,

I am attempting to run LRSIM on a human chr1, but I'm encountering the aforementioned error.

Here is the command I'm using: perl ../simulateLinkedReads.pl -r ./Chr1.fasta -p SapiensChr1 -c fragmentSizesList -x 30 -f 50 -t 500 -m 10 -0 0 -o

And here is LRSIM output:

Tue Mar 16 16:27:09 2021: SapiensChr1.status
Tue Mar 16 16:27:09 2021: Variant simulation mode enabled
Tue Mar 16 16:27:09 2021: SURVIVOR start
Tue Mar 16 16:27:09 2021: Running: /home/morispi/StructuralVariants/LRSIM/SURVIVOR 0 ./Chr1.fasta SapiensChr1.hap.parameter 0 SapiensChr1.hap 1000
Tue Mar 16 16:27:22 2021: SURVIVOR end
Tue Mar 16 16:27:22 2021: Build genome index start
Tue Mar 16 16:27:22 2021: /home/morispi/StructuralVariants/LRSIM/faFilter.pl SapiensChr1.hap.0.fasta 0 > SapiensChr1.hap.0.clean.fasta
Tue Mar 16 16:27:26 2021: /home/morispi/StructuralVariants/LRSIM/faFilter.pl SapiensChr1.hap.1.fasta 0 > SapiensChr1.hap.1.clean.fasta
Tue Mar 16 16:27:30 2021: /home/morispi/StructuralVariants/LRSIM/samtools faidx SapiensChr1.hap.0.clean.fasta
Tue Mar 16 16:27:32 2021: /home/morispi/StructuralVariants/LRSIM/samtools faidx SapiensChr1.hap.1.clean.fasta
Tue Mar 16 16:27:36 2021: Build genome index end
Tue Mar 16 16:27:36 2021: DWGSIM round 0 thread 0 start
Tue Mar 16 16:27:36 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.0
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
0Tue Mar 16 16:27:38 2021: DWGSIM round 0 thread 1 start
Tue Mar 16 16:27:38 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.1
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
0Tue Mar 16 16:27:40 2021: DWGSIM round 0 thread 2 start
Tue Mar 16 16:27:40 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.2
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 20000Tue Mar 16 16:27:43 2021: DWGSIM round 0 thread 3 start
Tue Mar 16 16:27:43 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.0.clean.fasta SapiensChr1.dwgsim.0.3
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 280000Tue Mar 16 16:27:46 2021: DWGSIM round 1 thread 0 start
Tue Mar 16 16:27:46 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.0
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 280000Tue Mar 16 16:27:50 2021: DWGSIM round 1 thread 1 start
Tue Mar 16 16:27:50 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.1
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 180000Tue Mar 16 16:27:53 2021: DWGSIM round 1 thread 2 start
Tue Mar 16 16:27:53 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.2
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 450000Tue Mar 16 16:27:56 2021: DWGSIM round 1 thread 3 start
Tue Mar 16 16:27:56 2021: /home/morispi/StructuralVariants/LRSIM/dwgsim -N 1875000 -e 0.0001,0.0016 -E 0.0001,0.0016 -d 350 -s 35 -1 135 -2 151 -H -y 0 -S 0 -c 0 -m /dev/null SapiensChr1.hap.1.clean.fasta SapiensChr1.dwgsim.1.3
[dwgsim_core] chr1 length: 249250621
[dwgsim_core] 1 sequences, total length: 249250621
[dwgsim_core] Currently on: 
[dwgsim_core] 410000Tue Mar 16 16:27:58 2021: DWGSIM round 0 thread 3 end
[dwgsim_core] 510000Tue Mar 16 16:28:02 2021: DWGSIM round 0 thread 1 end
[dwgsim_core] 1290000
[dwgsim_core] Complete!
Tue Mar 16 16:28:38 2021: DWGSIM round 0 thread 0 end
Tue Mar 16 16:28:38 2021: cat SapiensChr1.dwgsim.0.1.12.fastq >> SapiensChr1.dwgsim.0.12.fastq
[dwgsim_core] 1490000
[dwgsim_core] Complete!
Tue Mar 16 16:28:45 2021: DWGSIM round 0 thread 2 end
Tue Mar 16 16:28:45 2021: cat SapiensChr1.dwgsim.0.2.12.fastq >> SapiensChr1.dwgsim.0.12.fastq
[dwgsim_core] 1330000Tue Mar 16 16:28:51 2021: cat SapiensChr1.dwgsim.0.3.12.fastq >> SapiensChr1.dwgsim.0.12.fastq
[dwgsim_core] 1750000
[dwgsim_core] Complete!
Tue Mar 16 16:28:54 2021: DWGSIM round 1 thread 1 end
[dwgsim_core] 1770000
[dwgsim_core] Complete!
[dwgsim_core] 1510000Tue Mar 16 16:28:55 2021: DWGSIM round 1 thread 0 end
Tue Mar 16 16:28:55 2021: cat SapiensChr1.dwgsim.1.1.12.fastq >> SapiensChr1.dwgsim.1.12.fastq
[dwgsim_core] 1875000
[dwgsim_core] Complete!
Tue Mar 16 16:28:57 2021: DWGSIM round 1 thread 2 end
[dwgsim_core] 1700000Tue Mar 16 16:28:59 2021: cat SapiensChr1.dwgsim.1.2.12.fastq >> SapiensChr1.dwgsim.1.12.fastq
[dwgsim_core] 1875000
[dwgsim_core] Complete!
Tue Mar 16 16:29:02 2021: DWGSIM round 1 thread 3 end
Tue Mar 16 16:29:02 2021: cat SapiensChr1.dwgsim.1.3.12.fastq >> SapiensChr1.dwgsim.1.12.fastq
Tue Mar 16 16:29:07 2021: Simulate reads start
Tue Mar 16 16:29:07 2021: Load barcodes start
Tue Mar 16 16:29:09 2021: Load barcodes end
Tue Mar 16 16:29:09 2021: Using fragment sizes from fragmentSizesList instead of Poisson distribution
Tue Mar 16 16:29:09 2021: 10000 sizes loaded
Tue Mar 16 16:29:09 2021: Average fragment size: 50kbp
Tue Mar 16 16:29:09 2021: readPairsPerMolecule: 2
Tue Mar 16 16:29:09 2021: Simulating on haplotype: 0
Tue Mar 16 16:29:09 2021: Load read positions haplotype 0
Tue Mar 16 16:29:21 2021: not defined chr1_182578874_182579@chr1
Inappropriate ioctl for device at ../simulateLinkedReads.pl line 748, <$fh> line 19543360.
Command exited with non-zero status 25

It does not seem to be a memory issue, since it only uses 4 GB.

Moreover, when I slightly modify the parameters (for instance, setting -x 10, or -n to skip the variant simulation), the error changes, seemingly at random. I once got "Cannot find correct chromosome and position in @chr1_82788009_82787815_1_0_0_0_0:0:0_0:0:0_0/2", and once got "Cannot find correct chromosome and position in IFIHIGIIFIGFEBFCHD@DEECDCBEDECCB@BCBABBFBCABCA@DC@BAAAB@?A@?@?>?B?C@?@B<>??:??@@>>?>A@==A@@@@<@A@@>>B=@>?>C>?>?=>??;;>>?=?>==?=>;?;;==<", among other variations.
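One of the messages above contains what looks like a FASTQ quality string where a read name would be expected, so one thing worth checking is whether the concatenated dwgsim FASTQ files (for example SapiensChr1.dwgsim.0.12.fastq) are still well-formed four-line records. The following is only a minimal diagnostic sketch in Python, not part of LRSIM: the function name first_bad_record is hypothetical, and the read-name pattern is an assumption inferred from the names quoted in this issue rather than from the LRSIM or dwgsim source.

```python
import re

# ASSUMPTION: dwgsim-style read names look like the one quoted in this issue,
# e.g. @chr1_82788009_82787815_1_0_0_0_0:0:0_0:0:0_0/2
# (contig, two positions, further fields, then /1 or /2).
NAME_RE = re.compile(r"@\S+?_\d+_\d+_\S+/[12]")


def first_bad_record(lines):
    """Scan FASTQ lines; return (record_index, reason) for the first
    malformed record, or None if every record looks well-formed."""
    record = []
    index = 0
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        if not NAME_RE.fullmatch(header):
            return index, "unexpected header: " + header[:60]
        if not plus.startswith("+"):
            return index, "missing '+' separator line"
        if len(seq) != len(qual):
            return index, "sequence/quality length mismatch"
        record = []
        index += 1
    if record:
        # Fewer than four lines were left over at the end of the input.
        return index, "truncated final record"
    return None


# Usage sketch (file name taken from the log above):
# with open("SapiensChr1.dwgsim.0.12.fastq") as fh:
#     print(first_bad_record(fh))
```

If this reports a malformed record, the 0-based record index multiplied by four gives the approximate line in the file to inspect. If it returns None, the FASTQ files are probably not where the problem lies.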

I'm having a hard time understanding what is going on here. I already managed to run LRSIM correctly on smaller datasets and never encountered this issue.

Do you have any suggestions?

Thanks, Pierre

morispi commented 3 years ago

For what it's worth, LRSIM works fine on a subset of human chr1, in case that helps. I'm still not sure what is causing the issue on the full chromosome, though.