Closed jolespin closed 2 years ago
Hi @jolespin,
1.) Ideally I would like to implement a test that takes replicates into account to model the biological/technical variability, but at the moment it is not the case. So I guess you should probably group you replicates together provided that you have a relatively homogeneous coverage. As for the time groups, I would run all 3 groups separately. From the output of Meth_comp you can find out how conditions were grouped based on the labels
and med_llr_list
columns which are always in the same order.
2.) If nothing is provided it leave the name field in the bedgraph file blank. It's definitely better to pass it.
3.) It is strictly non_overlapping. Except if it doesn't make sense I would recommend to use CpG Island annotations generated by CGI_Finder or providing external CpG annotations.
4.) Yes, I think I more or less explained in question 1
Thank you for the helpful response!
I will definitely group each timepoint separately (t=0, t=1, t=2).
Just so I understand, if I did this approach I would do the following:
CpG_Aggregate
for all samples of t=0
, then repeat for t=1
, and so on for t=2
using the t=x
as the -s SAMPLE_ID
on each of those runsCpG_Aggregate
output files, I would then run Interval_Aggregate
using the -s SAMPLE_ID
identifiers as I did in Step 1Meth_Comp
using all 3 files generated from Interval_Aggregate
Meth_Comp
output, I would be able to identify methylation events that are in the following conditions:
4a. both t=1 and t=2 but absent in t=0
4b. Exclusively in t=0For example, if I use t=1
, t=2
, and t=3
for the -s SAMPLE_ID
of each group, would the output be something along the lines of the following (note, I copy/pasted from the docs and changed the labels):
chromosome start end n_samples pvalue adj_pvalue neg_med pos_med ambiguous_med labels med_llr_list raw_llr_list comment
I 118799 118800 4 0.29978058859571194 0.4279805197751711 3 1 0 ["t=1","t=2"] [2.005,-2.645,-6.64,-5.51] [[-1.82,5.83],[-0.09,-5.2],[-2.67,-10.61],[-6.7...Non-significant pvalue
I 141415 141416 4 0.11800901597381579 0.3742064977444858 1 1 2 ["t=0"]
I 151819 151820 4 0.1315327225513005 0.3742064977444858 1 1 2 ["t=0", "t=1"] [-2.98,-1.95,2.93,-1.48] [[-3.68,-4.4,0.54,-2.28],[-3.45,-1.95,-1.21],[4...Non-significant pvalue
Where the first line would be condition 4a
and the second would be condition 4b
? The third line wouldn't be anything (this is ignoring p/adj_pvalue)
Is this the appropriate usage? I'm working on a pipeline for the institute so we will be running this quite a bit. I want to make sure I understand what I'm running.
Thanks again!
Data Description:
t=0
; No treatmentt=1
; After treatmentt=2
; After treatment In total, I have 12 specimens * 3 samples per specimen = 36 samplesI have
nanopolish call-methylation
output for each of the samples.My goal is to identify methylation events that are present in
t=1
andt=2
but absent fromt=0
. In other words, methylation events that are potentially caused by a treatment. I'm also interested in the opposite, methylation events that are exclusive tot=0
.I've already used
nanopolish
on each of the samples and I'm wondering the best way to utilizepycoMeth
for this analysis.My plan was to use the following
pycoMeth
pipeline:↓
↓
↓
My questions:
For the
NANOPOLISH_FN
argument inCpG_Aggregate
, do you recommend that I group all oft=1
andt=2
together for all specimens if I am interested in methylation events that are persistent? Or do you recommend that I treatt=1
andt=2
separately based on the type of output that results fromCpG_Aggregate
?How does
SAMPLE_ID
work? If nothing is provided for this argument, how are records named in thebed
file?For the sliding window in
Interval_Aggregate
, is there overlap in the window or is it getting mutually exclusive segments?Let's say I use
Meth_Comp
with 2 files:t=0
andt > 0
(i.e.t=1
&t=2
). Is there a way to know which methylation events (or intervals) are more prominent in either condition or is it more showing just that there is a difference?Thank you in advance. Looking forward to finally using your tool! It's exactly what I've been looking for.