mkpython3 opened this issue 1 year ago.
Hi @mkpython3
MethylScore was designed to target larger DMRs. Your rather small simulated DMRs may fall below the detection thresholds, or fail the statistical test for that reason. Could you try either 1) enlarging the simulated DMRs or 2) adjusting some parameters? For example, some default values do not seem suitable for your experiment:
However, I expect enlarging the simulated DMRs to be more promising. To assess this, you could check whether your MR files show a single MR for the simulated DMR in your visualized example, i.e. whether MR breakpoints coincide with the simulated DMR breakpoints. If not, the MRs likely span the whole region, and 5 variable Cs are not enough for the DMR to be detected.
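The breakpoint check described above can be sketched in Python. This is a minimal sketch, assuming the MRs of one sample have already been read into (start, end) tuples that use the same coordinate convention as the simulated DMRs; the function names and the `tol` parameter are hypothetical, and the actual MR file layout should be taken from the MethylScore documentation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end); MRs and DMRs must use the same convention


def mr_matches_dmr(mrs: List[Interval], dmr: Interval, tol: int = 0) -> bool:
    """True if a single MR starts and ends within `tol` bp of the DMR boundaries."""
    d_start, d_end = dmr
    return any(abs(s - d_start) <= tol and abs(e - d_end) <= tol for s, e in mrs)


def mr_spans_dmr(mrs: List[Interval], dmr: Interval) -> bool:
    """True if one MR extends past both DMR boundaries, i.e. no breakpoint at the DMR."""
    d_start, d_end = dmr
    return any(s < d_start and e > d_end for s, e in mrs)
```

If `mr_spans_dmr` is true for most simulated DMRs, the MR segmentation did not split them out, which matches the explanation above.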
Let us know if that helps.
Hi,
Thank you for your suggestions. Unfortunately, adjusting the parameters did not help to find any of the small DMRs, so I increased the size of the simulated DMRs to 50 CG sites. Now I can find my DMRs with an F1 score of around 0.7. However, I have noticed that this only works reliably when the methylation background is low (beta distribution a=1, b=10) and the methylation level in the DMRs is high (beta distribution a=10, b=1). When the background is high and the DMRs are low, the F1 score drops to 0.5 or lower, in some cases to 0. At least in my testing, this does not seem to happen in the CHG and CHH contexts. This worries me a bit, because in barley, the organism I work with, the average CG methylation level is quite high (almost 90%). Is it possible that this observation is linked to the design of MethylScore? Is the computation generally different between contexts? Do you have any recommendations for adjusting the parameters in this case?
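For reference, the standard mean and variance of a beta distribution show what the two simulation settings correspond to:

$$
\mathbb{E}[X] = \frac{a}{a+b}, \qquad
\mathrm{Var}[X] = \frac{ab}{(a+b)^2(a+b+1)}, \qquad X \sim \mathrm{Beta}(a, b)
$$

$$
\mathrm{Beta}(10, 1):\ \mathbb{E}[X] = \tfrac{10}{11} \approx 0.91, \qquad
\mathrm{Beta}(1, 10):\ \mathbb{E}[X] = \tfrac{1}{11} \approx 0.09
$$

So the high-background setting is close to the ~90% CG methylation in barley. Because $1 - X \sim \mathrm{Beta}(b, a)$ when $X \sim \mathrm{Beta}(a, b)$, the low- and high-background simulations are mirror images: the expected DMR-to-background difference is $9/11 \approx 0.82$ and the variances are identical in both. The drop in F1 score is therefore presumably not caused by a smaller simulated effect size.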
By the way, I use this formula for the F1 score: F1 = 2TP / (2TP + FP + FN)
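The F1 formula above can be computed with a short Python sketch. It assumes DMRs on a single chromosome given as closed (start, end) intervals, and that a simulated DMR counts as found when any called DMR overlaps it; that matching rule and the function names are assumptions, since the post does not say how TP, FP and FN were counted.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval (start, end)


def overlaps(x: Interval, y: Interval) -> bool:
    # Two closed intervals overlap unless one ends before the other starts
    return x[0] <= y[1] and y[0] <= x[1]


def f1_score(simulated: List[Interval], called: List[Interval]) -> float:
    """F1 = 2TP / (2TP + FP + FN), counting hits by interval overlap (an assumption)."""
    # TP: simulated DMRs overlapped by at least one called DMR
    tp = sum(1 for s in simulated if any(overlaps(s, c) for c in called))
    # FN: simulated DMRs that no call overlaps
    fn = len(simulated) - tp
    # FP: called DMRs that overlap no simulated DMR
    fp = sum(1 for c in called if not any(overlaps(c, s) for s in simulated))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

A stricter matching rule, such as requiring a minimum reciprocal overlap, would lower the score when called DMRs have imprecise boundaries.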
Best regards and thank you in advance,
Marius
Hi, I want to use MethylScore in my analysis to find DMRs between ~20 different genotypes, so I first wanted to test whether MethylScore can find simulated DMRs. For this, I used metilene's background simulation script to simulate 20 samples of A. thaliana chromosome 1, drawing methylation levels from a beta distribution with parameters a=10, b=1. I then implemented DMRs with a custom script that changes the methylation levels of 5 consecutive CG sites to values drawn from a beta distribution with the inverted parameters a=1, b=10 in 50% of the samples. With this method I implemented 100 DMRs in the CG context. This is of course a very simple and imperfect way of simulating DMRs, but it should suffice for the purpose of getting the pipeline up and running.
However, MethylScore did not find any DMRs. As an alternative, I tried calling DMRs with HOME, which found almost all of them with correct boundaries.
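The DMR-implantation step described above can be sketched in Python. This is a minimal sketch, not metilene's simulation script or the custom script used here: the function `simulate_dmrs` and its parameters are hypothetical, background levels are drawn directly from Beta(10, 1) rather than with metilene, positions are CG-site indices rather than genomic coordinates, and the affected 50% of samples is assumed to be drawn anew for each DMR.

```python
import random
from typing import Dict, List, Tuple


def simulate_dmrs(
    n_samples: int,
    n_sites: int,
    dmr_starts: List[int],
    dmr_len: int = 5,
    bg: Tuple[float, float] = (10.0, 1.0),
    seed: int = 0,
) -> Tuple[List[List[float]], Dict[int, List[int]]]:
    """Hypothetical sketch: beta-distributed background with implanted DMRs.

    Returns the samples x sites methylation matrix and, for each DMR start,
    the indices of the samples that carry the DMR.
    """
    rng = random.Random(seed)
    a, b = bg
    # Background: each sample/site drawn independently from Beta(a, b)
    levels = [[rng.betavariate(a, b) for _ in range(n_sites)] for _ in range(n_samples)]
    affected: Dict[int, List[int]] = {}
    for start in dmr_starts:
        if start < 0 or start + dmr_len > n_sites:
            raise ValueError(f"DMR at site {start} exceeds the simulated region")
        # Assumption: a fresh random half of the samples carries each DMR
        carriers = sorted(rng.sample(range(n_samples), n_samples // 2))
        affected[start] = carriers
        for s in carriers:
            for i in range(start, start + dmr_len):
                # DMR levels use the inverted parameters, Beta(b, a)
                levels[s][i] = rng.betavariate(b, a)
    return levels, affected
```

For example, `simulate_dmrs(20, 10_000, list(range(50, 10_000, 100)))` places 100 DMRs of 5 CG sites each; passing `dmr_len=50` gives the enlarged DMRs, and `bg=(1.0, 10.0)` the low-background variant.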
Interestingly, if I simulate the samples with a low methylation background (beta distribution parameters a=1, b=10) and DMRs with a high methylation level in 50% of the samples (beta distribution parameters a=10, b=1), the last step of the pipeline (DMRS:MERGE_DMRS) is skipped without any visible error:
I am using the following command to run MethylScore:
nextflow run Computomics/MethylScore --BEDGRAPH --SAMPLE_SHEET=/scratch/mariusk/methylscore_test/samplesheet.tsv --GENOME=/scratch/mariusk/methylscore_test/Arabidopsis_thaliana.TAIR10.dna.chromosome.1.fa -profile docker --DMR_CONTEXTS CG
Unfortunately, I cannot attach the input files directly to this issue due to size limitations. However, I have uploaded the first case (high methylation background) here: https://drive.google.com/file/d/1_LKNfjrVI5ylzXdBqGid6QrVquSxY_WT/view?usp=sharing
The introduced DMRs in this dataset are at the following positions:
Here is a visual example of a simulated DMR in the uploaded data. The x axis shows genomic coordinates and the y axis shows the methylation levels of the 20 samples. One DMR is highlighted with a red background; the other DMRs lie outside the x-axis range of this excerpt.
I have given each sample its own ID in samplesheet.tsv. The .nextflow.log files for the high-background run (where no DMRs were found) and the low-background run (where the last step is skipped) are attached to this issue.
Am I missing something obvious? I have tried adjusting the MR and DMR parameters a bit, without success. If you need anything else from me, please let me know. I would be very grateful for your help. Thanks in advance.
Attachments: nextflow_high_bg.log, nextflow_low_bg.log