bioinfomaticsCSU / deepsignal

Detecting methylation using signal-level features from Nanopore sequencing reads
GNU General Public License v3.0

Clarification about filter_samples_by_positions.py #45

Closed by pterzian 4 years ago

pterzian commented 4 years ago

Hi Peng,

I am ready to train a model using samples extracted and labelled from high-confidence bisulfite sites. I figured that the filter_samples_by_positions.py script was intended for this purpose. However, some basic clarifications would be great:

So I should build 4 lists of positions, two for the forward strand (methylated and unmethylated) and two for the reverse strand, then combine the outputs and shuffle them?

Best,

Paul

PengNi commented 4 years ago

Hi Paul,

  1. Yes, there should be two position files, one for methylated samples and one for unmethylated samples.

  2. There is no need to run it twice for the forward and reverse strands. Each line of a position file has the format chrom\tpos_in_forward_strand\tstrand, for example chr1\t2\t+ or chr1\t3\t-

So 2 position files should be built, one for methylated samples and one for unmethylated samples.
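As a rough illustration of the format above, here is a minimal Python sketch that writes the two position files from a list of bisulfite sites. It is not part of deepsignal: the function names, the `(chrom, pos, strand, meth_fraction)` input tuples, and the 0.9 / 0.1 cut-offs for "high confidence" are all assumptions made for the example. It also assumes `pos` is already expressed in forward-strand coordinates, in whatever coordinate convention the script expects.

```python
def format_position_line(chrom, pos, strand):
    """Return one line in the chrom\\tpos_in_forward_strand\\tstrand format."""
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return f"{chrom}\t{pos}\t{strand}\n"


def write_position_files(sites, meth_path, unmeth_path, high=0.9, low=0.1):
    """Split bisulfite sites into a methylated and an unmethylated position file.

    sites: iterable of (chrom, pos, strand, meth_fraction) tuples (hypothetical
           input layout; pos is assumed to be in forward-strand coordinates).
    high, low: illustrative thresholds, not values taken from deepsignal.
    Returns (number of methylated sites, number of unmethylated sites) written.
    """
    n_meth = n_unmeth = 0
    with open(meth_path, "w") as f_meth, open(unmeth_path, "w") as f_unmeth:
        for chrom, pos, strand, frac in sites:
            if frac >= high:
                # Both strands go into the same file; the strand column tells them apart.
                f_meth.write(format_position_line(chrom, pos, strand))
                n_meth += 1
            elif frac <= low:
                f_unmeth.write(format_position_line(chrom, pos, strand))
                n_unmeth += 1
            # Sites with intermediate methylation are skipped as ambiguous.
    return n_meth, n_unmeth
```

Calling `write_position_files(sites, "meth_positions.tsv", "unmeth_positions.tsv")` (file names are placeholders) would give one file per label, each mixing + and - sites, so the filtering script only needs to be run once per file.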

Best, Peng