arq5x / lumpy-sv

lumpy: a general probabilistic framework for structural variant discovery
MIT License
307 stars 119 forks source link

Some error raised and some questions #299

Closed codeunsolved closed 5 years ago

codeunsolved commented 5 years ago

Hi, LUMPY heroes:

Recently, I run LUMPY(v0.2.13) on NA12878 for benchmarking. It seems to output some error, but VCF file was successfully generated. So I am not sure the VCF file is valid.

I followed the workflow from README with LUMPY (traditional) usage. Here is my workflow and log:

CLICK ME

``` [INFO] >> 1 BAMs found! [INFO] >> Caller : lumpy [INFO] • No.1 ******.bam 94.34MB [INFO] >> Output : /******/TEST/test_lumpy/****** [INFO] • Run lumpy v0.2.13 [INFO] [RUN_LUMPY] SampleID: ******, RLEN: 150, min_non_overlap: 101, discordant_z: 5, back_distance: 10, weight: 1, min_mapping_threshold: 20, mw: 4, tt: 0 [INFO] [RUN_LUMPY] Step 0-1: Generate discordants BAM for PE [INFO] CMD: samtools view -b -F 1294 /******/Downloads/NGS-Data/******/******.bam > /******/TEST/test_lumpy/******/******.discordants.unsorted.bam [INFO] CMD: samtools sort /******/TEST/test_lumpy/******/******.discordants.unsorted.bam > /******/TEST/test_lumpy/******/******.discordants.bam [INFO] CMD: rm /******/TEST/test_lumpy/******/******.discordants.unsorted.bam [INFO] [RUN_LUMPY] Step 0-2: Generate splitters BAM for SR [INFO] CMD: samtools view -h /******/Downloads/NGS-Data/******/******.bam | /******/miniconda3/envs/py2.7/bin/python /******/Projects/NGS/lumpy-sv/scripts/extractSplitReads_BwaMem -i stdin | samtools view -Sb - > /******/TEST/test_lumpy/******/******.splitters.unsorted.bam [INFO] CMD: samtools sort /******/TEST/test_lumpy/******/******.splitters.unsorted.bam > /******/TEST/test_lumpy/******/******.splitters.bam [INFO] CMD: rm /******/TEST/test_lumpy/******/******.splitters.unsorted.bam [INFO] [RUN_LUMPY] Step 1: Generate histo [INFO] CMD: samtools view /******/Downloads/NGS-Data/******/******.bam | tail -n 100000 | /******/miniconda3/envs/py2.7/bin/python /******/Projects/NGS/lumpy-sv/scripts/pairend_distro.py -r 150 -X 4 -N 10000 -o /******/TEST/test_lumpy/******/******.histo [INFO] mean:218.3957 stdev:85.4206071245 [ERROR] Removed 0 outliers with isize >= 844 [INFO] [RUN_LUMPY] histo mean: 218.3957 stdev: 85.4206071245 [INFO] [RUN_LUMPY] Step 2: Run LUMPY with paired-end and split-reads [INFO] CMD: /******/Projects/NGS/lumpy-sv/bin/lumpy -mw 4 -tt 0 -pe 'id:******,bam_file:/******/TEST/test_lumpy/******/******.discordants.bam,histo_file:/******/TEST/test_lumpy/******/******.histo,mean:218.3957,stdev:85.4206071245,read_length:150,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20' -sr 'id:******,bam_file:/******/TEST/test_lumpy/******/******.splitters.bam,back_distance:10,weight:1,min_mapping_threshold:20' > /******/TEST/test_lumpy/******/******.vcf [ERROR] 417 0 1 1000000 2 1000000 3 1000000 4 1000000 5 1000000 6 1000000 7 1000000 8 1000000 9 1000000 10 1000000 11 1000000 12 1000000 13 1000000 13 2000000 13 4000000 13 8000000 13 16000000 13 32000000 14 1000000 14 2000000 14 4000000 14 8000000 14 16000000 14 32000000 15 1000000 15 2000000 15 4000000 15 8000000 15 16000000 15 32000000 16 1000000 17 1000000 18 1000000 19 1000000 20 1000000 20 2000000 21 1000000 21 2000000 21 4000000 21 8000000 21 16000000 22 1000000 22 2000000 22 4000000 22 8000000 22 16000000 22 32000000 X 1000000 Y 1000000 Y 2000000 Y 4000000 MT 1000000 GL000226.1 1000000 GL000229.1 1000000 GL000239.1 1000000 GL000245.1 1000000 GL000197.1 1000000 GL000246.1 1000000 GL000232.1 1000000 GL000237.1 1000000 GL000204.1 1000000 GL000198.1 1000000 GL000208.1 1000000 GL000228.1 1000000 GL000214.1 1000000 GL000221.1 1000000 GL000218.1 1000000 GL000220.1 1000000 GL000199.1 1000000 GL000217.1 1000000 GL000216.1 1000000 GL000205.1 1000000 GL000219.1 1000000 GL000224.1 1000000 GL000195.1 1000000 GL000212.1 1000000 GL000222.1 1000000 GL000193.1 1000000 GL000225.1 1000000 GL000192.1 1000000 [INFO] [TIME] Lumpy: 1min 01sec ```

At the end of log, LUMPY output something to stderr:

417 0 1 1000000 2 1000000 3 1000000 ......

Questions: (1). Is this output ok? or somethings not expected occurred that I should pay attention to. (2). Then, I found all REF in VCF are 'N'. Is that also the expected output? (3). I want to plot ROC using SU in FORMAT as threshold, Is that reasonable? It seems to be the same thing in paper.

Many thanks!

ryanlayer commented 5 years ago

On Mon, Apr 8, 2019 at 10:51 PM codeunsolved notifications@github.com wrote:

Hi, LUMPY heroes:

Recently, I run LUMPY(v0.2.13) on NA12878 for benchmarking. It seems to output some error, but VCF file was successfully generated. So I am not sure the VCF file is valid.

I followed the workflow from README with LUMPY (traditional) usage. Here is my workflow and log: CLICK ME

[INFO] >> 1 BAMs found!

[INFO] >> Caller : lumpy

[INFO] • No.1 **.bam 94.34MB

[INFO] >> Output : /**/TEST/test_lumpy/**

[INFO] • Run lumpy v0.2.13

[INFO] [RUN_LUMPY] SampleID: **, RLEN: 150, min_non_overlap: 101, discordant_z: 5, back_distance: 10, weight: 1, min_mapping_threshold: 20, mw: 4, tt: 0

[INFO] [RUN_LUMPY] Step 0-1: Generate discordants BAM for PE

[INFO] CMD: samtools view -b -F 1294 /**/Downloads/NGS-Data/**/**.bam > /**/TEST/test_lumpy/**/**.discordants.unsorted.bam

[INFO] CMD: samtools sort /**/TEST/test_lumpy/**/**.discordants.unsorted.bam > /**/TEST/test_lumpy/**/**.discordants.bam

[INFO] CMD: rm /**/TEST/test_lumpy/**/**.discordants.unsorted.bam

[INFO] [RUN_LUMPY] Step 0-2: Generate splitters BAM for SR

[INFO] CMD: samtools view -h /**/Downloads/NGS-Data/**/**.bam | /**/miniconda3/envs/py2.7/bin/python /**/Projects/NGS/lumpy-sv/scripts/extractSplitReads_BwaMem -i stdin | samtools view -Sb - > /**/TEST/test_lumpy/**/**.splitters.unsorted.bam

[INFO] CMD: samtools sort /**/TEST/test_lumpy/**/**.splitters.unsorted.bam > /**/TEST/test_lumpy/**/**.splitters.bam

[INFO] CMD: rm /**/TEST/test_lumpy/**/**.splitters.unsorted.bam

[INFO] [RUN_LUMPY] Step 1: Generate histo

[INFO] CMD: samtools view /**/Downloads/NGS-Data/**/**.bam | tail -n 100000 | /**/miniconda3/envs/py2.7/bin/python /**/Projects/NGS/lumpy-sv/scripts/pairend_distro.py -r 150 -X 4 -N 10000 -o /**/TEST/test_lumpy/**/**.histo

[INFO] mean:218.3957 stdev:85.4206071245

[ERROR] Removed 0 outliers with isize >= 844

[INFO] [RUN_LUMPY] histo mean: 218.3957 stdev: 85.4206071245

[INFO] [RUN_LUMPY] Step 2: Run LUMPY with paired-end and split-reads

[INFO] CMD: /**/Projects/NGS/lumpy-sv/bin/lumpy -mw 4 -tt 0 -pe 'id:**,bam_file:/**/TEST/test_lumpy/**/**.discordants.bam,histo_file:/**/TEST/test_lumpy/**/**.histo,mean:218.3957,stdev:85.4206071245,read_length:150,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20' -sr 'id:**,bam_file:/**/TEST/test_lumpy/**/**.splitters.bam,back_distance:10,weight:1,min_mapping_threshold:20' > /**/TEST/test_lumpy/**/**.vcf

[ERROR] 417 0

1 1000000

2 1000000

3 1000000

4 1000000

5 1000000

6 1000000

7 1000000

8 1000000

9 1000000

10 1000000

11 1000000

12 1000000

13 1000000

13 2000000

13 4000000

13 8000000

13 16000000

13 32000000

14 1000000

14 2000000

14 4000000

14 8000000

14 16000000

14 32000000

15 1000000

15 2000000

15 4000000

15 8000000

15 16000000

15 32000000

16 1000000

17 1000000

18 1000000

19 1000000

20 1000000

20 2000000

21 1000000

21 2000000

21 4000000

21 8000000

21 16000000

22 1000000

22 2000000

22 4000000

22 8000000

22 16000000

22 32000000

X 1000000

Y 1000000

Y 2000000

Y 4000000

MT 1000000

GL000226.1 1000000

GL000229.1 1000000

GL000239.1 1000000

GL000245.1 1000000

GL000197.1 1000000

GL000246.1 1000000

GL000232.1 1000000

GL000237.1 1000000

GL000204.1 1000000

GL000198.1 1000000

GL000208.1 1000000

GL000228.1 1000000

GL000214.1 1000000

GL000221.1 1000000

GL000218.1 1000000

GL000220.1 1000000

GL000199.1 1000000

GL000217.1 1000000

GL000216.1 1000000

GL000205.1 1000000

GL000219.1 1000000

GL000224.1 1000000

GL000195.1 1000000

GL000212.1 1000000

GL000222.1 1000000

GL000193.1 1000000

GL000225.1 1000000

GL000192.1 1000000

[INFO] [TIME] Lumpy: 1min 01sec

At the end of log, LUMPY output something to stderr:

417 0 1 1000000 2 1000000 3 1000000 ......

Questions: (1). Is this output ok? or somethings not expected occurred that I should pay attention to.

Looks good to me.

(2). Then, I found all REF in VCF are 'N'. Is that also the expected output?

Yes. That is instead of putting a huge string.

(3). I want to plot ROC using SU in FORMAT as threshold, Is that reasonable? It seems to be the same thing in paper.

Sure.

Many thanks!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/arq5x/lumpy-sv/issues/299, or mute the thread https://github.com/notifications/unsubscribe-auth/AAlDUXy7RMll_GHU6Meu-l10BU8XHqtJks5vfBxGgaJpZM4cjmbG .

codeunsolved commented 5 years ago

Thank you! thank you for your help! Dr. Layer.