cancerit / BRASS

Breakpoints via assembly - Identifies breaks and attempts to assemble rearrangements in whole genome sequencing data.
GNU Affero General Public License v3.0

Error in get_abs_bkpts_from_clipped_reads.pl #66

Closed: hfl112 closed this issue 5 years ago

hfl112 commented 6 years ago

I've run BRASS successfully on several samples, but recently encountered an error at the filter.abs_bkp step.

Contents of Sanger_CGP_Brass_Implement_filter.abs_bkp.err:

No high_end reads for record 255731
No high_end reads for record 255751
Resolving overlapping rg end ranges...
Could not properly separate the reads between the low ends of rearrangements 50230 and 50231!
Use of uninitialized value $clip_pos in numeric ge (>=) at /mnt/data/hfn/Tools/CGP/BRASS-dev/perl/bin/get_abs_bkpts_from_clipped_reads.pl line 469.
1148718.56user 66873.16system 348:59:13elapsed 96%CPU (0avgtext+0avgdata 16487284maxresident)k
1615627960inputs+11344outputs (17829major+37930363131minor)pagefaults 0swaps

Any suggestions would be appreciated. Thanks.

BastienNguyen commented 6 years ago

Hi, I have the same issue here. Could someone please tell me what this step is for? The filter.abs_bkp step takes a very long time, and it seems that the .filtered.bedpe file is generated before it runs. Can I use the .filtered.bedpe file as it is?

Many thanks,

Bastien

keiranmraine commented 6 years ago

@yilong-li, can you suggest the correct way to handle groups where this occurs? Should we be permissive and let them through, or filter them out?

yl3 commented 6 years ago

@BastienNguyen this step is for estimating absolute breakpoints for each breakpoint of each read group. It's needed for some subsequent filtering steps in the Brass pipeline.

@hfl112 without seeing the data it's a bit hard to say. But I suspect the script is crashing because there are many breakpoints clustered together around the low ends of rearrangements 50230 and 50231. These clusters typically indicate "artefactual" breakpoints, or sometimes clusters of breakpoints from inserted L1 master copies. The likelihood that these SVs are real is low, and even if they are real, it'll be hard to interpret their structure or mechanistic origin correctly. I would remove the offending SV from the input and try again...
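As a rough illustration of removing offending records before rerunning, here is a minimal Perl sketch. It is not part of BRASS: the sub name `drop_records`, the tab-separated layout, the `#` header convention, and the position of the ID column are all assumptions made for the example, so check them against the actual input file before using anything like this.

```perl
use strict;
use warnings;

# Hypothetical helper: drop records whose ID appears in %$skip.
# ASSUMPTIONS (not confirmed by the thread): records are tab-separated,
# the rearrangement ID sits in 0-based column $id_col, and lines
# starting with '#' are headers that should be kept.
sub drop_records {
    my ( $lines, $skip, $id_col ) = @_;
    my @kept;
    for my $line (@$lines) {
        if ( $line =~ /^#/ ) {
            push @kept, $line;    # keep header lines untouched
            next;
        }
        ( my $copy = $line ) =~ s/\r?\n\z//;    # ignore a trailing newline
        my @fields = split /\t/, $copy;
        my $id     = $fields[$id_col];
        next if defined $id && $skip->{$id};    # skip the offending records
        push @kept, $line;
    }
    return @kept;
}

# The two IDs named in the error log above; the records are made up.
my %skip  = map { $_ => 1 } qw(50230 50231);
my @input = (
    "# hypothetical header",
    "chr1\t100\t200\t50229",
    "chr1\t300\t400\t50230",
    "chr2\t500\t600\t50231",
    "chr3\t700\t800\t50232",
);
print "$_\n" for drop_records( \@input, \%skip, 3 );
```

With the made-up records above, only the header and the records for 50229 and 50232 are printed.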

keiranmraine commented 6 years ago

@yilong-li, do you think it makes sense for the code to be modified to step over events that trigger this error? What would that look like: absence from the file generated by this step, or a "dummy" record?

It would be nice to handle this gracefully for users rather than requiring manual intervention.

... I don't expect you to do the work, but I need some guidance.

yl3 commented 6 years ago

@keiranmraine can you try changing every instance of the following line (there are eight, unfortunately)

$clip_pos = $median_pos;

to

$clip_pos = $median_pos if $median_pos;
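To show what the guard changes, here is a small self-contained Perl sketch. Only the guarded assignment itself comes from this thread; the subs `median_pos` and `update_clip_pos` are hypothetical stand-ins for whatever the real script does, and the sketch assumes `$clip_pos` already holds a usable value before the assignment.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the median computation in the real script:
# returns undef when there are no positions to take a median of.
sub median_pos {
    my @positions = sort { $a <=> $b } @_;
    return unless @positions;
    return $positions[ int( $#positions / 2 ) ];    # lower median
}

# Guarded assignment as suggested above: keep the existing $clip_pos
# whenever $median_pos is undefined (or otherwise false).
sub update_clip_pos {
    my ( $clip_pos, @positions ) = @_;
    my $median_pos = median_pos(@positions);
    $clip_pos = $median_pos if $median_pos;
    return $clip_pos;
}

print update_clip_pos( 1000, 1204, 1198, 1201 ), "\n";    # prints 1201
print update_clip_pos(1000), "\n";                        # prints 1000
```

Note that `if $median_pos` also skips the assignment when the median is 0. If a position of 0 could ever be valid, `if defined $median_pos` would be the stricter test; that variant is a suggestion here, not something proposed in the thread.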

On a related note, this code is ripe for some serious refactoring...

sb43 commented 5 years ago

@keiranmraine Added condition as suggested by @yilong-li