epi2me-labs / wf-human-variation

modkit #97

Closed carolinehey closed 8 months ago

carolinehey commented 9 months ago

I have a BAM file that contains both 5hmC and 5mC calls, but I'm only interested in generating a modkit/workflow BED file with the 5mC calls. I can't find any information on the arguments I can use to achieve this. Any help would be appreciated.

RenzoTale88 commented 9 months ago

@carolinehey do you want to group 5mC and 5hmC together, or keep only 5mC and drop 5hmC completely? The two modes require different options:

  1. To group them together, use --modkit_args '--combine-mods'; to also combine the strands, use --modkit_args '--combine-mods --combine-strands '. The resulting BED file will show a C entry in column 4.
  2. To ignore 5hmC and retrieve only 5mC, try the traditional preset with --modkit_args '--preset traditional ' (which is the same as running --modkit_args '--cpg --ignore h --combine-strands '). This will only report c values in column 4 of the output BED file.
  3. As an alternative to point 2, you can simply filter out every entry in the resulting BED file that shows an h in column 4.
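
For point 3, the filtering can be done after the workflow has finished. Below is a minimal sketch, assuming the BED file is whitespace-delimited with the modification code in column 4. The function name drop_mod_code and the example rows are made up for illustration; they are not real workflow output, and the non-h code shown in them is a placeholder, so check what your own file contains.

```python
def drop_mod_code(lines, code="h"):
    """Return the lines whose column 4 is not `code`.

    Hypothetical helper: assumes whitespace-delimited BED lines with the
    modification code in the fourth column. Lines with fewer than four
    columns (e.g. blank lines) are kept unchanged.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 4 and fields[3] == code:
            continue  # skip 5hmC entries
        kept.append(line)
    return kept


# Toy rows, invented for illustration only (not real modkit output).
example = [
    "chr1\t100\t101\tm\t10\t+\n",
    "chr1\t100\t101\th\t10\t+\n",
    "chr1\t200\t201\tm\t8\t-\n",
]
filtered = drop_mod_code(example)
```

To apply it to a real file, read the lines with open(path) and write the returned list to a new file; the input file is left untouched.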

In general, there is no single way to run the analysis. The modkit documentation has more details on how to run modkit and on the options of modkit pileup.