Closed jennieli421 closed 1 year ago
Hi @jennieli421,
I think that would be
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
Does that make sense?
Oh, I meant to ask how I should writ the line of targets =
.
Should I write targets = []
?
In that case, I got
2023-06-27 14:59:45,694 ru.ru_gen Region 'select_target_genes' (control=False) has 0 contigs of which 0 are in the reference. There are 0 targets (including +/- strand) representing 0.0% of the reference. Reads will be unblocked when classed as single_off or multi_off; sequenced when classed as single_on or multi_on; and polled for more data when classed as no_map or no_seq.
which seemed like no read can be mapped to the reference.
Oh I see! What you should do is either list all the chromosome names in the reference like so
targets = ["chr1", "chr2"...]
Or you can leave the targets empty,
targets = []
And reverse the toml
single_on = "unblock"
multi_on = "unblock"
single_off = "stop_receiving"
multi_off = "stop_receiving"
min_chunks = 0
max_chunks = 2
no_map = "proceed"
no_seq = "proceed"
This way anything that maps to the reference is single_off
or multi_off
, anything that doesn't map after 2 chunks is unblocked. You may want to change no_map
to unblock, which means that anything that doesn't map is unblocked immediately rather than left to acquire another chunk.
Hope this helps!
Hello! This issue/discussion is helpful to me, thank you both. I hope this has resolved jennieli421's issue. But since this is still open, I have a very similar use case - I want to deplete an entire genome. But I'm unclear, I have a question from your above TOML file - can you clarify the "stop-receiving" setting?
From your Nature paper I get this definition: "stop receiving data for the remainder of that read". This makes me think that this read is also rejected, similar to unblocking. But in the above TOML file with empty targets, it seems like you are saying that "stop_receiving" allows the read to be sequenced?
If so, what is the difference between "stop_receiving" and "proceed"?
Proceed means that one collects more data for an individual read and you assess it again. Stop_receiving tells the sequencer to keep sequencing that read and not evaluate it again.
So - unblock means a read will be rejected from the pore and a new one sampled. proceed means the read will continue to sequence and the next batch of singal will be analysed again. stop_receiving means send no more data about this read and let it sequence to normal completion.
I hope that helps!
Totally. You guys are great, btw. This is amazing software.
Here is another related question. In the above example for enriching a genome, with no listed targets you give these settings:
targets = [] single_on = "unblock" multi_on = "unblock" single_off = "stop_receiving" multi_off = "stop_receiving" min_chunks = 0 max_chunks = 2 no_map = "proceed" no_seq = "proceed"
You mention above that any read that is categorized as single_off or multi_off is a read that maps to the genome. But what then are the "single_on" and "multi-on" reads? These would normally be reads that map to the targets, but there are no targets. Do these settings become irrelevant when the targets = []?
Sorry, the documentation isn't very clear on this! I'll update it
A break down:
single_on
- Basecalled read signal fragment produced a single mapping, which does start within a target regionmulti_on
- Basecalled read signal fragment produced multiple mappings, of which At least one starts within a target regionsingle_off
- - Basecalled read signal fragment produced a single mapping, which does not start within a target regionmulti_off
- Basecalled read signal fragment produced multiple mappings, of which None start within a target regionno_map
- Read signal fragment did not align to the referenceno_seq
- Read signal fragment did not base callSo in this TOML
single_on = "unblock"
multi_on = "unblock"
single_off = "stop_receiving"
multi_off = "stop_receiving"
min_chunks = 0
max_chunks = 2
no_map = "proceed"
no_seq = "proceed"
single_on
and multi_on
are impossible, because there are no targets.
single_off
and multi_off
are the conditions for any reads that align to the reference, as the whole reference is technically off target, as there are no targets
no_map
covers any reads that do not map to the reference
So any read which does not align in 2 chunks is unblocked (max_chunks =2),
and read which aligns to the reference is sent a stop_receiving
, and sequenced
Does that make sense?
Perfect, thanks!
Cool gonna close this for now. Feel free to reopen if you need more help
If I would like to enrich for all reads that are mapped to a reference, what should I put as
targets
?