bioinfomaticsCSU / deepsignal

Detecting methylation using signal-level features from Nanopore sequencing reads
GNU General Public License v3.0

Extracting methy_label from fast5 files #15

Closed (ardakdemir closed this issue 5 years ago)

ardakdemir commented 5 years ago

How does the extract_features function get the true label for each candidate target base? I have been going over the code but could not figure out how it determines whether a candidate target is methylated or unmethylated.

The get_refloc_of_methysite_in_motif function seems to return all the candidate targets, but I could not see where the label for each targeted base (input) comes from.

PengNi commented 5 years ago

Hi @ardakdemir ,

A few points:

  1. If the reads are sequenced from a native DNA sample, the extract_features module cannot determine the true label of the targets. In this case --methy_label is simply a preset value that is assigned to every extracted target.

  2. If you have high-confidence methylated or unmethylated positions (e.g., from bisulfite sequencing), filter_samples_by_positions can be run after the extract_features module to keep only the samples at those positions. However, filter_samples_by_positions can extract only one class (methylated or unmethylated) per run; use --label to change the preset labels of the targeted bases if necessary.

  3. In the develop branch, we also added a --positions option to the extract_features module, which extracts samples only at the positions of interest.
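The filtering idea in point 2 can be illustrated with a minimal Python sketch. This is not deepsignal's code: the `Sample` record, the `filter_by_positions` helper, and the example coordinates are all hypothetical, and the real feature files contain more fields than shown here. It only shows the logic of keeping samples at known positions and overriding their preset label, one class per pass.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Sample(NamedTuple):
    """Hypothetical stand-in for one extracted sample (not deepsignal's format)."""
    chrom: str
    pos: int
    strand: str
    label: int  # preset label assigned at extraction time


Site = Tuple[str, int, str]


def filter_by_positions(samples: Iterable[Sample],
                        positions: Set[Site],
                        label: int) -> List[Sample]:
    """Keep samples located at the given sites and set their label."""
    return [
        s._replace(label=label)
        for s in samples
        if (s.chrom, s.pos, s.strand) in positions
    ]


# All samples start with the same preset label (here 1).
samples = [
    Sample("chr1", 100, "+", 1),
    Sample("chr1", 250, "+", 1),
    Sample("chr2", 40, "-", 1),
]

# Assumed high-confidence unmethylated sites, e.g. from bisulfite sequencing.
unmethylated_sites = {("chr1", 250, "+"), ("chr2", 40, "-")}

# One pass handles one class; a second pass would be needed for methylated sites.
unmethylated = filter_by_positions(samples, unmethylated_sites, label=0)
```

A second call with a set of methylated sites and `label=1` would produce the positive class, mirroring the one-class-per-run behavior described above.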

Best, Peng

ardakdemir commented 5 years ago

Thank you for pointing this out and for the fast reply!

So in its default mode, extract_features just finds all the motif sequences in a given read, and every motif site in that read is labeled the same way, either all methylated or all unmethylated. Is that correct?

PengNi commented 5 years ago

@ardakdemir , yes, that's correct. extract_features will extract all the targeted motifs in the given data and assign them all the same label.
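As a rough illustration of this behavior (a sketch only, not the actual extract_features implementation), the snippet below finds every occurrence of a motif in a sequence and attaches the same preset label to each site. The function name `label_motif_sites` and its parameters are hypothetical; `offset` is assumed to be the index of the targeted base within the motif.

```python
from typing import List, Tuple


def label_motif_sites(seq: str,
                      motif: str = "CG",
                      offset: int = 0,
                      methy_label: int = 1) -> List[Tuple[int, int]]:
    """Return (position, label) for every motif occurrence in seq.

    Every site receives the same preset label; nothing in the
    sequence itself is used to decide methylation status.
    """
    sites = []
    start = seq.find(motif)
    while start != -1:
        sites.append((start + offset, methy_label))
        # Advance by one so overlapping occurrences are also found.
        start = seq.find(motif, start + 1)
    return sites


sites = label_motif_sites("ACGTTCGCGA", motif="CG", offset=0, methy_label=1)
print(sites)  # [(1, 1), (5, 1), (7, 1)]
```

The label is an input to the function, which is why reads from a fully methylated or fully unmethylated control sample (or a later position-based filter) are needed to obtain meaningful training labels.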

ardakdemir commented 5 years ago

Thanks a lot for the clarification!