Open starsyi opened 1 year ago
The nanopolish output that is currently supported is a TSV file. If you can format your TSV file to contain the required columns (see below) you can import it as if it were a nanopolish output:
chromosome start end read_name log_lik_ratio
chr1 30012312 30012312 aksdlaksdlas -4.542
The order of the columns does not matter either, as long as you have a single header line with these column names. Personally I don't have the capacity right now to implement explicit conversion commands for modkit or other tools, but I'll leave the issue open and will be happy to accept pull requests that come with test data.
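For anyone preparing such a file from another caller, here is a minimal sketch of writing the required columns with Python's standard `csv` module. The column names come from the header line above; the function name `write_calls_tsv` and the in-memory example record are hypothetical and only illustrate the layout, they are not part of meth5.

```python
import csv
import io

# Column names taken from the header line shown above.
REQUIRED_COLUMNS = ["chromosome", "start", "end", "read_name", "log_lik_ratio"]


def write_calls_tsv(calls, handle):
    """Write per-read calls as tab-separated text with a single header line.

    `calls` is an iterable of dicts that contain at least the required keys
    (hypothetical input structure, chosen for illustration).
    """
    writer = csv.DictWriter(
        handle,
        fieldnames=REQUIRED_COLUMNS,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for call in calls:
        # Keep only the required columns, in a fixed order.
        writer.writerow({name: call[name] for name in REQUIRED_COLUMNS})


# Example record mirroring the row shown above.
example_calls = [
    {
        "chromosome": "chr1",
        "start": 30012312,
        "end": 30012312,
        "read_name": "aksdlaksdlas",
        "log_lik_ratio": -4.542,
    }
]

buffer = io.StringIO()
write_calls_tsv(example_calls, buffer)
tsv_text = buffer.getvalue()
```

To write to disk instead, pass an open file handle (for example `open("calls.tsv", "w", newline="")`, where the file name is just a placeholder) in place of the `StringIO` buffer.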
Nanopolish calls 5mC with a log-likelihood ratio and applies a specific cutoff for methylation calling, but other tools such as DeepSignal or Guppy instead predict a methylation probability for each site, and as far as I know these two values can't be converted into each other. How can this be solved?
In this case, is the log_lik_ratio column necessary for the conversion? Does the column support methylation probability? How does it contribute to the meth5 conversion? Thanks!
The column is required. You'll need to convert from methylation probability (range 0 to 1) to a log-likelihood ratio (range negative infinity to positive infinity).
Assuming an uninformative prior, use the logit function to convert:
log_lik_ratio = logit(p) = ln(p/(1-p))
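As a rough sketch of that conversion in Python: the function name `prob_to_llr` and the clipping value `eps` are assumptions added here for illustration. Clipping is needed because probabilities of exactly 0 or 1 would otherwise give infinite values; the right `eps` depends on how confident your caller's extreme values should be treated.

```python
import math


def prob_to_llr(p, eps=1e-6):
    """Convert a methylation probability to a log-likelihood ratio via logit.

    Assumes an uninformative prior, as stated above. `eps` is an assumed
    clipping value that keeps p away from exactly 0 and 1, where the logit
    is infinite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# p = 0.5 carries no evidence either way, so the ratio is 0.
llr_uncertain = prob_to_llr(0.5)

# Probabilities above 0.5 give positive values, below 0.5 negative values.
llr_methylated = prob_to_llr(0.9)
llr_unmethylated = prob_to_llr(0.1)
```

The resulting values can then go into the log_lik_ratio column of the TSV file.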
Is it also possible to support importing other formats, such as bed and tsv result files that output methylation modification information from modkit?
Since nanopolish does not support the latest R10.4 chemistry method and dorado/remora is now the standard method for obtaining nanopore methylation calls, it would be great to be able to use meth5 and pycometh with modbams generated by remora.