PacificBiosciences / pb-CpG-tools

Collection of tools for the analysis of CpG data
BSD 3-Clause Clear License

question for output #22

Closed hmyh1202 closed 2 years ago

hmyh1202 commented 2 years ago

Hello:

For -p count, four additional columns are present:

modified site count
unmodified site count
avg mod score
avg unmod score

Does "modified site count" mean the number of aligned HiFi reads in which the CpG at that position is methylated (methylation level > 50%)?

How do I calculate the methylation level for a single CpG site and for a specific genomic region? Could I just use modified site count / (modified site count + unmodified site count)?

All the best!

Thank you!

dportik commented 2 years ago

@hmyh1202 we strongly recommend using the -p model mode for obtaining methylation levels, as it is more accurate.

The output bed files for both modes share the first six columns:

  1. reference name
  2. start coordinate
  3. end coordinate
  4. modification probability
  5. haplotype
  6. coverage

If you want modification probability at a given site, you should be using column 4. This is true regardless of which pileup mode option you select.
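As a rough illustration of reading column 4 from either output mode, here is a minimal Python sketch. It assumes the bed file is tab-separated with the six shared columns in the order listed above; the `CpGSite` and `read_sites` names are hypothetical, and the two example lines are made up for illustration rather than taken from real tool output.

```python
import csv
import io
from typing import Iterator, NamedTuple


class CpGSite(NamedTuple):
    chrom: str        # column 1: reference name
    start: int        # column 2: start coordinate
    end: int          # column 3: end coordinate
    mod_prob: float   # column 4: modification probability
    haplotype: str    # column 5: haplotype
    coverage: int     # column 6: coverage


def read_sites(handle) -> Iterator[CpGSite]:
    """Yield one CpGSite per data line, ignoring any columns after the sixth."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and, as a precaution, comment-style lines.
        if not row or row[0].startswith("#"):
            continue
        yield CpGSite(row[0], int(row[1]), int(row[2]),
                      float(row[3]), row[4], int(row[5]))


# Made-up example lines (assumption: tab-separated, six shared columns).
example = io.StringIO(
    "chr1\t100\t101\t92.5\tTotal\t12\n"
    "chr1\t150\t151\t8.0\tTotal\t9\n"
)
sites = list(read_sites(example))
for site in sites:
    print(site.chrom, site.start, site.mod_prob)
```

To use it on a real file, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` object. Because only the first six columns are read, the same function should work for both pileup modes if the layout is as described here.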

hmyh1202 commented 2 years ago

Thank you! I will re-run my PB dataset with -p model.

So column 4, the modification probability, is equal to the methylation level?

If I want to calculate the methylation level for a specific gene, can I use the mean modification probability of all CpGs in that gene?

Thank you!

dportik commented 2 years ago

@hmyh1202

So column 4, the modification probability, is equal to the methylation level?

Yes.

If I want to calculate the methylation level for a specific gene, can I use the mean modification probability of all CpGs in that gene?

There are many approaches to determining the methylation status of a CpG island or gene region. We do not currently have a recommendation for this, but your approach seems reasonable.
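The approach proposed in the question, an unweighted mean of column 4 over all CpGs in a region, could be sketched as follows. This is only one of the many possible approaches and not a recommended method. The `region_mean` function is hypothetical, the site values are made up, and the sketch assumes BED-style half-open coordinates and counts a site only if it lies entirely inside the region.

```python
from statistics import mean
from typing import Iterable, Optional, Tuple

# (reference name, start, end, modification probability), i.e. columns 1-4.
Site = Tuple[str, int, int, float]


def region_mean(sites: Iterable[Site], chrom: str,
                region_start: int, region_end: int) -> Optional[float]:
    """Unweighted mean of column 4 over sites fully inside the region.

    Returns None when no site falls inside the region.
    """
    values = [
        prob
        for site_chrom, start, end, prob in sites
        if site_chrom == chrom and start >= region_start and end <= region_end
    ]
    return mean(values) if values else None


# Made-up values for illustration only.
sites = [
    ("chr1", 100, 101, 90.0),
    ("chr1", 150, 151, 10.0),
    ("chr1", 500, 501, 50.0),   # outside the region below
    ("chr2", 120, 121, 70.0),   # different reference
]
print(region_mean(sites, "chr1", 50, 200))  # mean of 90.0 and 10.0
```

Every CpG contributes equally here regardless of its coverage. Weighting by column 6 would be a different design choice and is not something the thread endorses or rejects.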

hmyh1202 commented 2 years ago

When I use -c 1 to get output at 1x coverage, does the tool only report CpGs with at least 4x coverage?

dportik commented 2 years ago

Yes, 4x is the minimum coverage required to run the model at a given site. You can use count mode for 1x coverage, but in general 1x coverage is not informative for analysis.
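If count-mode output does include low-coverage sites, they can be filtered afterwards on column 6. The sketch below is a hypothetical post-processing helper, not part of the tool. The default threshold of 4 simply mirrors the minimum mentioned above, and the rows are made-up values.

```python
from typing import Iterable, List, Tuple

# Columns 1-6: reference name, start, end, modification probability,
# haplotype, coverage.
Site = Tuple[str, int, int, float, str, int]


def filter_by_coverage(sites: Iterable[Site], min_coverage: int = 4) -> List[Site]:
    """Keep only sites whose coverage (column 6) is at least min_coverage."""
    return [site for site in sites if site[5] >= min_coverage]


# Made-up rows for illustration only.
rows = [
    ("chr1", 100, 101, 100.0, "Total", 1),
    ("chr1", 150, 151, 75.0, "Total", 4),
    ("chr1", 200, 201, 20.0, "Total", 10),
]
kept = filter_by_coverage(rows)
print(len(kept))  # 2
```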