pterzian closed this issue 4 years ago.
Hi Paul,
Yes, there should be two position files: one for methylated samples and one for unmethylated samples.
There is no need to run the script twice for the forward and reverse strands. The format is:
chrom\tpos_in_forward_strand\tstrand
for example, chr1\t2\t+ or chr1\t3\t-
So two position files should be built, one for methylated samples and one for unmethylated samples.
Best, Peng
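Peng's answer means each position file carries the strand in a third column, so one file covers both strands. Below is a minimal sketch of building the two files, assuming the high-confidence bisulfite calls are already available as (chrom, pos_in_forward_strand, strand, methylation fraction) records. The record layout, the `high`/`low` cutoffs and the function names are hypothetical illustrations, not part of filter_samples_by_positions.py, and the thread does not state which coordinate convention (0- or 1-based) the script expects.

```python
from typing import Iterable, List, Tuple

# Hypothetical input record for one high-confidence bisulfite site:
# (chrom, pos_in_forward_strand, strand, methylation fraction in [0, 1]).
Site = Tuple[str, int, str, float]


def split_positions(sites: Iterable[Site], high: float = 1.0,
                    low: float = 0.0) -> Tuple[List[str], List[str]]:
    """Return (methylated_lines, unmethylated_lines), each line tab-separated
    as chrom, pos_in_forward_strand, strand.

    `high` and `low` are illustrative cutoffs, not values from the thread.
    Sites with a fraction strictly between them are dropped.
    """
    meth: List[str] = []
    unmeth: List[str] = []
    for chrom, pos, strand, frac in sites:
        if strand not in ("+", "-"):
            raise ValueError(f"strand must be '+' or '-', got {strand!r}")
        # The strand column lets a single file hold sites from both strands,
        # so no separate forward/reverse position files are needed.
        line = f"{chrom}\t{pos}\t{strand}\n"
        if frac >= high:
            meth.append(line)
        elif frac <= low:
            unmeth.append(line)
    return meth, unmeth


def write_lines(path: str, lines: List[str]) -> None:
    """Write pre-formatted lines to a position file."""
    with open(path, "w") as fh:
        fh.writelines(lines)
```

For example, `meth, unmeth = split_positions(sites)` followed by `write_lines("methylated_positions.tsv", meth)` and `write_lines("unmethylated_positions.tsv", unmeth)` (file names hypothetical) produces the two position files, one per label.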
Hi Peng,
So I am ready to train a model using samples extracted and labelled from high-confidence bisulfite sites. I figured that the filter_samples_by_positions.py script was intended for this purpose. However, some basic clarifications would be great:
Because there is a --label argument, could you confirm that I should run the script twice, once to extract and label methylated samples and once for unmethylated samples?
I see the position file should include only the chromosome and the genomic position (chromosome\tpos_in_forward_strand). Does that mean I also have to run this script twice to extract samples matching the forward and reverse strands? So should I build 4 lists of positions, two for the forward strand (methylated and unmethylated) and two for the reverse strand, then combine the outputs and shuffle them?
Best,
Paul
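If the two labelled outputs (one run for methylated positions, one for unmethylated) are then merged into a single training set, as Paul proposes, the combine-and-shuffle step can be sketched as below. This assumes each output is a plain-text file with one sample per line; the thread does not specify the sample file format or say whether the tool set already provides a merging step, and the function names here are hypothetical.

```python
import random
from typing import List


def combine_and_shuffle(groups: List[List[str]], seed: int = 0) -> List[str]:
    """Concatenate labelled sample lines from several files and shuffle them.

    Each inner list holds the lines of one output file (e.g. the methylated
    run and the unmethylated run). Blank lines are skipped and every line is
    normalised to end with a single newline, so a file without a trailing
    newline does not merge two samples into one line.
    """
    combined = [
        line.rstrip("\n") + "\n"
        for lines in groups
        for line in lines
        if line.strip()
    ]
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(combined)
    return combined


def combine_files(paths: List[str], out_path: str, seed: int = 0) -> int:
    """Read each file fully into memory, shuffle, and write one output file."""
    groups = []
    for path in paths:
        with open(path) as fh:
            groups.append(fh.readlines())
    shuffled = combine_and_shuffle(groups, seed)
    with open(out_path, "w") as out:
        out.writelines(shuffled)
    return len(shuffled)
```

This holds every sample in memory; for very large sample files a disk-based shuffle such as GNU `shuf` may be a better fit.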