WGLab / LinkedSV

MIT License

Error with clustering #2

Closed msvaton closed 5 years ago

msvaton commented 6 years ago

Hello, thank you for the development of LinkedSV. Trying out your software, I unfortunately came across this error:

[11/12/2018 18:48:39 (56.603 MB)] finished extracting weird reads
[11/12/2018 18:48:39 (56.603 MB)] clustering reads
[11/12/2018 18:48:39 (56.603 MB)] first round clustering reads, length cut is 200000, output file is: <censored>bam.tmpbcd22
[11/12/2018 18:48:39 (56.603 MB)] calculating fragment parameters from file: <censored>.bam.tmpbcd22
[11/12/2018 18:48:39 (56.603 MB)] N95_fragment_length is: -1
Traceback (most recent call last):
  File "linkedsv.py", line 183, in <module>
    main()
  File "linkedsv.py", line 27, in main
    detect_increased_fragment_ends(args, dbo_args, endpoint_args)
  File "linkedsv.py", line 132, in detect_increased_fragment_ends
    cluster_reads(args, dbo_args, endpoint_args)
  File "/LinkedSV/cluster_reads.py", line 62, in cluster_reads
    global_distribution.estimate_global_distribution (args, dbo_args, endpoint_args, endpoint_args.tmpbcd22_file, is_fast_mode = True)
  File "/LinkedSV/global_distribution.py", line 100, in estimate_global_distribution
    get_fragment_parameter(args, dbo_args, endpoint_args, global_dist_fp, target_bcd22_file)
  File "/LinkedSV/global_distribution.py", line 179, in get_fragment_parameter
    args.fragment_length_lmda = fit_geometric_distribution(frm_length_list, readpair = False)
  File "/LinkedSV/global_distribution.py", line 139, in fit_geometric_distribution
    k = np.percentile(length_list, cdf2)
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3540, in percentile
    a, q, axis, out, overwrite_input, interpolation, keepdims)
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3652, in _quantile_unchecked
    interpolation=interpolation)
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3250, in _ureduce
    r = func(a, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3767, in _quantile_ureduce_func
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File "/usr/local/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 181, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/usr/local/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 51, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

The file with weird reads has approx. 45k lines, for example:

chr9 140441208 chr9 140441498 3p_end 3p_end 60 PE NB501049:111:HTGTHBGX3:1:11101:1640:7260@65@chr9@140441133@76M NB501049:111:HTGTHBGX3:1:11101:1640:7260@129@chr9@140441451@48M26S

However, the tmpbcd22 file is empty.
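The traceback ends inside `np.percentile`, called on `length_list`, and the NumPy build shown there raises this IndexError for an empty input, which fits an empty tmpbcd22 file. A minimal sketch of a guard that fails earlier with a clearer message follows; `safe_percentile` is a hypothetical helper for illustration and is not part of LinkedSV:

```python
import numpy as np


def safe_percentile(values, q):
    """Return the q-th percentile of values, refusing empty input.

    Hypothetical helper: it only wraps np.percentile with an explicit
    emptiness check so the failure points at the missing data.
    """
    data = np.asarray(list(values), dtype=float)
    if data.size == 0:
        raise ValueError(
            "no fragment lengths to summarise; the upstream file is "
            "probably empty"
        )
    return float(np.percentile(data, q))
```

Called on an empty list, this raises a `ValueError` that names the likely cause instead of surfacing an error from deep inside NumPy.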

Could you please help me find where the problem during clustering might be? What additional information can I provide, and does the clustering depend on external software that I might be having a problem with?

I cannot seem to identify where the problem could be.

Thank you very much and best of luck.

Michael

fangli80 commented 6 years ago

Hi Michael, Thank you for using LinkedSV! It seems that there were some errors during the generation of the tmpbcd22 file. This file should not be empty. Could you please upload the full stderr file?

Best, Li

msvaton commented 6 years ago

Dear Li,

thank you for your quick reply and your help. Here is the log; I only replaced the path to the input BAM file with "input.bam": output.log

Thank you for your help and have a nice day!

Best wishes, Michael

fangli80 commented 6 years ago

Hi Michael, Could you please check if the following files exist:

input.sortbx.bam
input.sortn.bam
input.sortn.bam.coreinfo
input.sortn.bam.weired_reads
input.bcd21
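A check like this can also be scripted, which additionally shows file sizes so that files that exist but hold almost no data stand out. This is only a sketch: the suffix list is copied from the names above, and the `prefix` argument (the path the intermediate files are derived from) and the function name are assumptions for illustration.

```python
import os

# Suffixes taken from the file names in this thread.
EXPECTED_SUFFIXES = [
    ".sortbx.bam",
    ".sortn.bam",
    ".sortn.bam.coreinfo",
    ".sortn.bam.weired_reads",
    ".bcd21",
]


def report_intermediate_files(prefix, suffixes=EXPECTED_SUFFIXES):
    """Map each expected path to its size in bytes, or None if missing."""
    sizes = {}
    for suffix in suffixes:
        path = prefix + suffix
        # os.path.isfile is False for missing paths and for directories.
        sizes[path] = os.path.getsize(path) if os.path.isfile(path) else None
    return sizes
```

For example, `report_intermediate_files("input.bam")` returns a dictionary whose values can be printed and compared; a size of only a few hundred bytes next to multi-gigabyte BAM files is a hint that a step produced no records.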

msvaton commented 6 years ago

Hello Li, this is the ls output of the dir.

total 9983084
-rw-r--r-- 1 user user        554 Nov 12 20:25  input.bam.barcode_statistics
-rw-r--r-- 1 user user        116 Nov 12 20:25  input.bam.bcd21
-rw-r--r-- 1 user user        113 Nov 12 20:36  input.bam.fragment_statistics
-rw-r--r-- 1 user user 1985609935 Nov 12 20:25  input.bam.sortbx.bam
-rw-r--r-- 1 user user 2284139674 Nov 12 20:29  input.bam.sortn.bam
-rw-r--r-- 1 user user 5944518822 Nov 12 20:31  input.bam.sortn.bam.coreinfo
-rw-r--r-- 1 user user    8369182 Nov 12 20:36  input.bam.sortn.bam.weired_reads
-rw-r--r-- 1 user user        240 Nov 12 20:36  input.bam.tmpbcd22

fangli80 commented 6 years ago

The bcd21 file is empty, which indicates there may be something wrong with the input bam file. How was the input bam file generated? LinkedSV requires the phased_possorted_bam from Longranger.
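One quick sanity check on the input is whether its alignments carry the `BX:Z:` barcode tag that Long Ranger writes to its BAM output. The sketch below assumes the BAM has first been converted to SAM text (for example with `samtools view`) and that LinkedSV depends on this tag, which the `sortbx` file name suggests but this thread does not confirm. `fraction_with_bx` is a hypothetical helper.

```python
def fraction_with_bx(sam_lines):
    """Return the fraction of SAM alignment records carrying a BX:Z: tag.

    Header lines (starting with '@') and blank lines are skipped.
    Optional tags begin at the 12th tab-separated field of a record.
    """
    total = 0
    tagged = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        total += 1
        optional_fields = line.split("\t")[11:]
        if any(field.startswith("BX:Z:") for field in optional_fields):
            tagged += 1
    # With no alignment records at all, report 0.0 rather than dividing by zero.
    return tagged / float(total) if total else 0.0
```

A result near zero on a sample of records would suggest the BAM did not come from a barcode-aware pipeline, or that the tags were dropped during later processing.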

msvaton commented 6 years ago

OK, that will be the cause: an error in processing the BAM file. I am sorry to have taken up your time and will look into it again. Thank you very much for your help, and good luck.

Best wishes, Michael