SUwonglab / arcsv

Complex structural variant detection from WGS data
MIT License
21 stars 6 forks source link

Impossible to run example #10

Closed ebattistella closed 5 months ago

ebattistella commented 1 year ago

I am facing some difficulties when trying to run the example provided on the github, I get the following error: [run] ref files {'reference': 'example/reference.fa', 'gap': 'example/gaps.bed'} [run] calling SVs in 20:0-250000

[parse_bam] extracting approximate library stats [parse_bam] read_len: 100; rough_insert_median: 367.0 [library_stats] processed 200000 reads (75932 chunks) for each lib [library_stats] processed 400000 reads (145049 chunks) for each lib [library_stats] processed 600000 reads (210832 chunks) for each lib [library_stats] processed 800000 reads (272950 chunks) for each lib [library_stats] processed 1000000 reads (345156 chunks) for each lib Traceback (most recent call last): File "/Users/ebattist/Library/Python/3.11/bin/arcsv", line 156, in main() File "/Users/ebattist/Library/Python/3.11/bin/arcsv", line 26, in main run(args) File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/call_sv.py", line 93, in run call_sv(opts, inputs, reference_files) File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/call_sv.py", line 161, in call_sv pb_out = parse_bam(opts, reference_files, bamfiles) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/bamparser_streaming.py", line 122, in parse_bam als = extract_approximate_library_stats(opts, bam, rough_insert_median) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/bamparser_streaming.py", line 87, in extract_approximate_library_stats insert_pmf = [pmf_kernel_smooth(il, 0, opts['insert_max_mu_multiple'] mu, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/bamparser_streaming.py", line 87, in insert_pmf = [pmf_kernel_smooth(il, 0, opts['insert_max_mu_multiple'] mu, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/ebattist/Library/Python/3.11/lib/python/site-packages/arcsv/bamparser_streaming.py", line 467, in pmf_kernel_smooth pct = np.percentile(a_trunc, (25, 75)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "<__array_function__ internals>", line 200, in percentile File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 4205, in percentile return _quantile_unchecked( ^^^^^^^^^^^^^^^^^^^^ File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 4473, in _quantile_unchecked return _ureduce(a, ^^^^^^^^^^^ File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 3752, in _ureduce r = func(a, *kwargs) ^^^^^^^^^^^^^^^^^ File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 4639, in _quantile_ureduce_func result = _quantile(arr, ^^^^^^^^^^^^^^ File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 4756, in _quantile result = _lerp(previous, ^^^^^^^^^^^^^^^ File "/opt/homebrew/lib/python3.11/site-packages/numpy/lib/function_base.py", line 4575, in _lerp lerp_interpolation = asanyarray(add(a, diff_b_a t, out=out))


  File "/opt/homebrew/lib/python3.11/site-packages/numpy/matrixlib/defmatrix.py", line 218, in __mul__
    return N.dot(self, asmatrix(other))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<__array_function__ internals>", line 200, in dot
ValueError: shapes (2,976007) and (2,1) not aligned: 976007 (dim 1) != 2 (dim 0)

It is very likely due to a difference in packages. 
Could you send me the exact requirements for the packages and the version of python used? The setup.py only specifies the version of pysam
jgarthur commented 5 months ago

Hi, I have updated the code to fix this error. Please let me know if you still have the issue:

https://github.com/SUwonglab/arcsv/pull/12

ebattistella commented 4 months ago

Hi,

Thank you very much for your help.

ArcSV is now running normally.

I would have liked to know a bit more about the output files. It seems that the merge and filter function that you recommend using output a .tab file. How can we get a .vcf from this file?

Is there any reason the output should be a .tab? How to interpret the results?

I deeply appreciate your consideration. Best regards, Enzo Battistella

From: Joey Arthur @.> Date: Sunday, March 31, 2024 at 4:31 PM To: SUwonglab/arcsv @.> Cc: ebattistella @.>, Author @.> Subject: Re: [SUwonglab/arcsv] Impossible to run example (Issue #10)

Hi, I have updated the code to fix this error. Please let me know if you still have the issue:

12https://github.com/SUwonglab/arcsv/pull/12

— Reply to this email directly, view it on GitHubhttps://github.com/SUwonglab/arcsv/issues/10#issuecomment-2028897468, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADCLLSFHWQMTHIL4EIGRJJLY3BXBXAVCNFSM6AAAAAAWZQXELOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMRYHA4TONBWHA. You are receiving this because you authored the thread.Message ID: @.***>

jgarthur commented 4 months ago

Hi Enzo,

There is in fact a .vcf file output by the arcsv call command. The .tab file is easier to work with for complex structural variation containing multiple breakpoints, as the entire complex SV (set of linked breakpoints) is present in a single line, rather than one line per breakpoint as in VCF.

ebattistella commented 4 months ago

Hi Joey,

Thank you for your answer. Indeed after using the “call” command, we get a .vcf per chromosome. However, when using the command “filter-merge” only a .tab is obtained. Is it correct?

From: Joey Arthur @.> Date: Monday, April 22, 2024 at 6:45 PM To: SUwonglab/arcsv @.> Cc: ebattistella @.>, Author @.> Subject: Re: [SUwonglab/arcsv] Impossible to run example (Issue #10)

Hi Enzo,

There is in fact a .vcf file output by the arcsv call command. The .tab file is easier to work with for complex structural variation containing multiple breakpoints, as the entire complex SV (set of linked breakpoints) is present in a single line, rather than one line per breakpoint as in VCF.

— Reply to this email directly, view it on GitHubhttps://github.com/SUwonglab/arcsv/issues/10#issuecomment-2071073238, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADCLLSEWY4GG3AH6L5ETUBDY6WHH5AVCNFSM6AAAAAAWZQXELOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZRGA3TGMRTHA. You are receiving this because you authored the thread.Message ID: @.***>

jgarthur commented 4 months ago

Ah, my apologies, you're right that filter-merge does not support VCF output. I'll note this as a possible future improvement. You should be able to accomplish the same using the individual .vcf files and bcftools concat and bcftools filter: https://samtools.github.io/bcftools/bcftools.html

ebattistella commented 4 months ago

Thank you very much for your answer.

I have another question regarding the output format for two complex events:

chr1 45742292 45795471 chr1_45709126-45816915 BND,BND invddup.inv 2 45709125,45742292,45794533,45795471,45816915 0,1,1,1,0 ABCD AC'B'CD 53179 PASS 45742292,45795471;45742292,45794533 1,1;1,1 HOM 1.000 NA 20,33 0,78 1244.90 0.00 ABL'K'J'I'H'G'CDEFGHIJKLMN/ABL'K'J'I'H'G'F'E'D'C'GHIJKLMN 132

chr1 47114920 47413535 chr1_47105173-47426055 BND,BND inv.2del 2 47105172,47114920,47205229,47277890,47413535,47426055 0,0,0,0,0,0 ABCDE AC'E 298615 PASS 47114920,47277890;47205229,47413535 0,0;0,0 HET 0.500 NA 0,7 13,36 2735.18 81.94 ABCDEFGHI/AF'I 4

What does the “.” Mean in the name inv.2del and invddup.inv? What is the “2” in inv.2del? How do you interpret those records and describe those events?

Thank you very much for your precious help.

Regards, Enzo Battistella

From: Joey Arthur @.> Date: Tuesday, April 23, 2024 at 2:40 PM To: SUwonglab/arcsv @.> Cc: ebattistella @.>, Author @.> Subject: Re: [SUwonglab/arcsv] Impossible to run example (Issue #10)

Ah, my apologies, you're right that filter-merge does not support VCF output. I'll note this as a possible future improvement. You should be able to accomplish the same using the individual .vcf files and bcftools concat and bcftools filter: https://samtools.github.io/bcftools/bcftools.html

— Reply to this email directly, view it on GitHubhttps://github.com/SUwonglab/arcsv/issues/10#issuecomment-2073168980, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADCLLSHKY6VMOYXKRB7PXL3Y62TK7AVCNFSM6AAAAAAWZQXELOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZTGE3DQOJYGA. You are receiving this because you authored the thread.Message ID: @.***>

ebattistella commented 3 months ago

Dear Joey,

Thank you very much for your answer.

We are trying to better understand and interpret ArcSV output, Specifically, we are struggling to recover the source and target breakpoints for complex SV. Do you have a parsing script to process the arcSV output?

Best regards, Enzo Battistella

From: Battistella Enzo @.> Date: Monday, May 6, 2024 at 11:41 AM To: SUwonglab/arcsv @.>, SUwonglab/arcsv @.> Cc: Author @.> Subject: Re: [SUwonglab/arcsv] Impossible to run example (Issue #10) Thank you very much for your answer.

I have another question regarding the output format for two complex events:

chr1 45742292 45795471 chr1_45709126-45816915 BND,BND invddup.inv 2 45709125,45742292,45794533,45795471,45816915 0,1,1,1,0 ABCD AC'B'CD 53179 PASS 45742292,45795471;45742292,45794533 1,1;1,1 HOM 1.000 NA 20,33 0,78 1244.90 0.00 ABL'K'J'I'H'G'CDEFGHIJKLMN/ABL'K'J'I'H'G'F'E'D'C'GHIJKLMN 132

chr1 47114920 47413535 chr1_47105173-47426055 BND,BND inv.2del 2 47105172,47114920,47205229,47277890,47413535,47426055 0,0,0,0,0,0 ABCDE AC'E 298615 PASS 47114920,47277890;47205229,47413535 0,0;0,0 HET 0.500 NA 0,7 13,36 2735.18 81.94 ABCDEFGHI/AF'I 4

What does the “.” Mean in the name inv.2del and invddup.inv? What is the “2” in inv.2del? How do you interpret those records and describe those events?

Thank you very much for your precious help.

Regards, Enzo Battistella

From: Joey Arthur @.> Date: Tuesday, April 23, 2024 at 2:40 PM To: SUwonglab/arcsv @.> Cc: ebattistella @.>, Author @.> Subject: Re: [SUwonglab/arcsv] Impossible to run example (Issue #10)

Ah, my apologies, you're right that filter-merge does not support VCF output. I'll note this as a possible future improvement. You should be able to accomplish the same using the individual .vcf files and bcftools concat and bcftools filter: https://samtools.github.io/bcftools/bcftools.html

— Reply to this email directly, view it on GitHubhttps://github.com/SUwonglab/arcsv/issues/10#issuecomment-2073168980, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ADCLLSHKY6VMOYXKRB7PXL3Y62TK7AVCNFSM6AAAAAAWZQXELOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZTGE3DQOJYGA. You are receiving this because you authored the thread.Message ID: @.***>