PopicLab / cue

Deep learning framework for SV calling and genotyping
MIT License

LINKED mode of cue #13

Closed KeqingWangg closed 1 year ago

KeqingWangg commented 1 year ago

Hi!

Thank you for developing and sharing this very interesting SV detection method.

I am particularly interested in the Cue-LINKED mode mentioned in your paper. Would it be possible to share the scripts mentioned in the last part of notebooks/extensions.ipynb? Or have I misunderstood, and this part is already included in Cue? If so, could you please provide more information on how to run Cue in LINKED mode?

Looking forward to your response! Many thanks!

Keqing

viq854 commented 1 year ago

Hi Keqing,

Thank you for your interest in the Cue framework.

To extend Cue to linked reads (as in the proof-of-concept evaluations described in the “Extending Cue” section of the paper and in the notebook), we only added one new channel to each image to capture the additional barcode information available with linked reads, and reused most channels from short reads. For example, you can use this SV channel set from constants.py (switching to “LINKED” instead of “SHORT” in the config):

SV_SIGNAL_SET.LINKED: [SVSignals.SM, SVSignals.RD_LOW, SVSignals.SR_RP, SVSignals.LLRR, SVSignals.RL] 

The first channel, SVSignals.SM, is the split-molecule/barcode signal type. The framework provides functions to compute index bin lookups and intersections, which can be directly used for barcode intersection as well. As described in the notebook, only a few extra lines of code need to be added to the BAM indexing stage in aln_index.py to collect the barcodes (no other scripts should be needed):

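if signal == SVSignals.SM:
    # Linked reads carry the molecule barcode in the BX tag; skip reads without one
    # (get_tag raises KeyError when the tag is missing)
    if read.has_tag('BX'):
        self.bins[signal][bin_id].add(read.get_tag('BX'))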

I can send the function over email as well if that’s helpful and help you get it running.
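To illustrate the barcode collection and intersection idea described above, here is a minimal, self-contained sketch. It is not Cue's code: `BarcodeIndex`, `BIN_SIZE`, `add`, and `shared` are hypothetical names, and the reads are toy `(position, barcode)` pairs rather than BAM records. It only assumes that each genomic bin stores a set of barcodes and that the signal between two bins is the number of barcodes they share.

```python
from collections import defaultdict

BIN_SIZE = 10_000  # hypothetical bin width in bp, chosen for the example


class BarcodeIndex:
    """Toy per-bin barcode index (an illustration, not Cue's index class)."""

    def __init__(self, bin_size=BIN_SIZE):
        self.bin_size = bin_size
        self.bins = defaultdict(set)  # bin id -> set of barcodes seen in that bin

    def add(self, pos, barcode):
        # Reads without a barcode contribute nothing to the signal
        if barcode is None:
            return
        self.bins[pos // self.bin_size].add(barcode)

    def shared(self, bin_a, bin_b):
        # Number of barcodes present in both bins; .get avoids creating empty bins
        return len(self.bins.get(bin_a, set()) & self.bins.get(bin_b, set()))


# Toy reads: (position, barcode); None stands for a read with no BX tag
reads = [(1_200, "BX1"), (4_800, "BX2"), (51_000, "BX1"), (52_500, "BX3"), (58_000, None)]

index = BarcodeIndex()
for pos, barcode in reads:
    index.add(pos, barcode)

print(index.shared(0, 5))  # bins 0 and 5 share only BX1
```

Storing barcodes as sets keeps the intersection a single `&` operation per bin pair, which is why the snippet above only needs to `add` the barcode during indexing.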

KeqingWangg commented 1 year ago

It works now! Thank you for the guidance!

pontushojer commented 1 year ago

@viq854 Would you please provide the full instructions for running Cue on linked reads?

  1. This statement:

you can use this SV channel set from constants.py (switching to “LINKED” instead of “SHORT” in the config)

Does this refer to modifying the img/constants.py file? If so, it is not very convenient for the user. From looking at constants.py, it seems possible to update the data YAML to include the setting signal_set: "LINKED" to achieve the same outcome. Could you confirm this is the case?

  2. The snippet you provided, i.e.:
if signal == SVSignals.SM :
    barcode = read.get_tag('BX')
    self.bins[signal][bin_id].add(barcode)

Should this be added at the end of the add_by_signal function in seq/aln_index.py? If so, could it be included in the base repo?
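On the first question, the sketch below shows one way a signal_set string from a config could be mapped onto a channel set. This is an assumption about how such a lookup might work, not a confirmation of Cue's behavior: `SIGNALS_BY_SET` and `resolve_signal_set` are hypothetical names, the plain dict stands in for the parsed data YAML, and only the LINKED channel list quoted earlier in this thread is filled in.

```python
from enum import Enum

# Signal and set names as they appear in this thread; the real enums may hold more members
SVSignals = Enum("SVSignals", "SM RD_LOW SR_RP LLRR RL")
SV_SIGNAL_SET = Enum("SV_SIGNAL_SET", "SHORT LINKED")

# Hypothetical lookup table; the SHORT channel list is not given in the thread, so it is omitted
SIGNALS_BY_SET = {
    SV_SIGNAL_SET.LINKED: [
        SVSignals.SM,      # split-molecule/barcode channel
        SVSignals.RD_LOW,
        SVSignals.SR_RP,
        SVSignals.LLRR,
        SVSignals.RL,
    ],
}


def resolve_signal_set(config):
    """Map the config's signal_set string onto the enum member with the same name."""
    name = config.get("signal_set", "SHORT")
    try:
        return SV_SIGNAL_SET[name]
    except KeyError:
        raise ValueError(f"unknown signal_set: {name!r}") from None


config = {"signal_set": "LINKED"}  # stand-in for the parsed data YAML
channels = SIGNALS_BY_SET[resolve_signal_set(config)]
print([signal.name for signal in channels])
```

If the loader resolves the string this way, no edit to constants.py would be needed; whether Cue actually does so is exactly what the question above asks the maintainers to confirm.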