BIMSBbioinfo / pigx_bsseq

bisulfite sequencing pipeline from fastq to methylation reports
https://bioinformatics.mdc-berlin.de/pigx/
GNU General Public License v3.0
9 stars 4 forks source link

Missing chromosomal position in mthyl call #166

Open devenderarora opened 3 years ago

devenderarora commented 3 years ago

Dear Sir, I have received a data after running pigz_bsseq pipeline. While analysing the methylation call I found some strange missing information like:

"67","1",20716,20716,"+",22,18,4 "68","1",20809,20809,"+",15,12,3 "69","1",20821,20821,"+",14,14,0 "70","1",20850,20850,"+",16,15,1 "71","1",20944,20944,"+",17,10,7 "72","1",21128,21128,"+",21,19,2 "73","1",2176,2176,"+",10,10,0 "74","1",22386,22386,"+",10,0,10 "75","1",22402,22402,"+",10,0,10 "76","1",22410,22410,"+",10,1,9 "77","1",2268,2268,"+",11,10,1 "78","1",22954,22954,"+",11,8,3 "79","1",23020,23020,"+",15,9,6 "80","1",23176,23176,"+",11,0,11 "81","1",23331,23331,"+",12,0,12 "82","1",23333,23333,"+",12,0,12

Here, at 77th: 2268 should be something 2268X and this is missing all over the chromosome. We cross checked at some position and found missing "0" or "00" at last position. May I know if we can fix this issue somehow. Or any suggestion. I will be grateful to you for your key inputs in this regards.

Devender Arora

alexg9010 commented 3 years ago

Hi @devenderarora,

Thank you for using the pigx-bsseq pipeline.

The format of the methylation call file that you are showing here should be: (Rownumber), Chromosome, Start Base, End Base, Strand, Total Read Coverage, Number of Cytosine at Base, Number of Thymine at Base

That being said, the 77th column indicates 11X at chr1:2268-2268.

I hope this makes things clearer for you.

Best, Alex

devenderarora commented 3 years ago

I think I am not able to present my query clearly. The data is easy to understand. But If you look at column 73 or 77th the start chromosome is 2176 or 2268 according to data output but if we look at the above positions we will see the start and end position is well in increasing order and after some digging, we found there should be 21760 instead.

alexg9010 commented 3 years ago

Ah, now I got your point. You are right, the files should be sorted by (first chromosome, but then ) position, so these two rows that you highlighted should be 21760 and 22680 respectively. I have the suspicion, that within R these numbers were represented with scientific notation, like 2176+e1 for 21760 or 2176+e2 for 217600, and when we exported to bed/txt format the notation was not respected.

@devenderarora Could you please tell me which version of the pipeline you are using? Also, I would need to know which exact file we are talking about, whether methylation has been called with methylKit or methylDackel so maybe you could paste the path of the file starting from your output folder.

Best, Alex

alexg9010 commented 3 years ago

Hi Devender,

The bed file you are looking at has the format as described here: https://www.genome.ucsc.edu/FAQ/FAQformat.html#format1 https://www.genome.ucsc.edu/FAQ/FAQformat.html#format1, so the header would be ‘chr : start : end : (name) : score : (strand) : start : end : color’ , so the 4th column would be the methylation difference score. However, the DMR BED file is mostly meant for visualisation in the genome browser (i.e. IGV), as windows are colored based on their score (differential methylation value) in divergent colors. To visualise the methylation difference, the methDiff values are centered first to [-1,1] and then mapped to [0,1] range, which makes direct interpretation difficult.

But in the ’09_differential_methylation' folder of your respective analysis you should have a tsv file which contains the results of the full analysis.

The methylation difference is calculated as the difference of the ‘treatment’ - ‘control’ group, so it depends on your setup, which dmrs are hyper or hypo methylated.

I hope this answered your question.

Best, Alex

On 24. Nov 2020, at 08:13, Devender Arora notifications@github.com wrote:

Dear Alex,

I am comparing different tissue and Differential methylation folder have bed files with data:

track name="differentially methylated cytosines " description="diff. meth. between alpha-F10-65D_1_val_1_bt2.sorted.deduped,alpha-F10-67D_1_val_1_bt2.sorted.deduped,alpha-F10-69D_1_val_1_bt2.sorted.deduped,alpha-F10-71D_1_val_1_bt2.sorted.deduped,alpha-F10-72D_1_val_1_bt2.sorted.deduped,beta-F10-65-D_1_val_1_bt2.sorted.deduped,beta-F10-67-D_1_val_1_bt2.sorted.deduped,beta-F10-69-D_1_val_1_bt2.sorted.deduped,beta-F10-71-D_1_val_1_bt2.sorted.deduped,beta-F10-72-D_1_val_1_bt2.sorted.deduped mapped to sus11" itemRgb=on 1 162351 162352 . 0.697058159756556 . 162351 162352 0,193,0 1 169051 169052 . 0.364488088788948 . 169051 169052 51,237,51 1 196833 196834 . 0.73502295898567 . 196833 196834 0,182,0 1 196857 196858 . 0.775725540080495 . 196857 196858 0,169,0 . . I can understand the header must be chr: start base: end base: (4th column): start: end: coverage: Cs:Ts I need your view on what this 4th column score signify? Is this DMR score for particular region? And is this score presenting methylation in alpha tissue or beta? Please share your opinion I will be grateful to you.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/BIMSBbioinfo/pigx_bsseq/issues/166#issuecomment-732702825, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADK7JD5ECOQW2DINRNQV6ITSRNMKDANCNFSM4TOXCFVA.

devenderarora commented 3 years ago

Dear Alex, Thankyou for your response. We used pigx_bsseq-0.0.10 version for the analysis. The file folder is 06_methyl_calls/ and file name ends with CpG.txt and methylRaw.RDS.