directly replace the feature with 0 when training mutant models

William-chen777 commented 1 month ago

Dear Professor, hello. I'm curious whether it's appropriate to directly replace the feature with 0 when training mutant models. Could you provide a reference for this practice?

Satoyo08 commented 1 month ago

Hello William,

Thank you for your inquiry! This is Satoyo Oya replying (postdoc, not a professor).

To clarify, the mutant model was not trained on features replaced with 0. Instead, to simulate ATX1/2 localization in the absence of H2Bub (∆hub mutant), we took the random forest models that had already been trained and gave them a feature matrix in which all H2Bub values were replaced with 0, so that they made new predictions. This is described in the Methods section of Oya et al., 2022.

If you are wondering whether it is appropriate to use ZERO as a replacement, here is the background information: Since it is well-documented that H2Bub is depleted in the ∆hub mutant, it is appropriate to use the minimal value for H2Bub in the simulated ∆hub mutant feature matrix. Because random forest is a non-parametric model, there is no need to normalize the values in the feature matrix. Accordingly, the feature matrix we used consists of RPM/RPKM values or raw read counts, so the minimum value is 0.
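As a minimal sketch of the procedure described above (not the code used in the paper): the example below trains a random forest on a synthetic "WT" feature matrix, then asks the same trained model to predict on a copy in which the H2Bub column is set to 0. It assumes scikit-learn's `RandomForestClassifier` as a stand-in for the actual model; the data, the column names `mark_X` and `mark_Y`, and the labeling rule are all invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for a WT feature matrix: rows are genes, columns are
# chromatin features with non-negative values (like RPM/RPKM or read counts).
feature_names = ["H2Bub", "mark_X", "mark_Y"]  # hypothetical column names
n_genes = 500
X_wt = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, len(feature_names)))

# Invented labeling rule for the toy data: a gene is "bound" if H2Bub is high
# OR mark_X is very high, so some targets do not depend on H2Bub at all.
y_wt = ((X_wt[:, 0] > 2.0) | (X_wt[:, 1] > 4.0)).astype(int)

# Step 1: train on WT features and WT localization only.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_wt, y_wt)

# Step 2: build the in silico mutant matrix. The model is NOT retrained;
# only the H2Bub column of the input is replaced with its minimum value, 0.
X_mut = X_wt.copy()
X_mut[:, feature_names.index("H2Bub")] = 0.0

# Step 3: predict with the already trained model on both matrices.
pred_wt = model.predict(X_wt)
pred_mut = model.predict(X_mut)

# Targets predicted as bound in WT but not in the simulated mutant.
lost = (pred_wt == 1) & (pred_mut == 0)
retained = (pred_wt == 1) & (pred_mut == 1)
print("predicted bound in WT:", int(pred_wt.sum()))
print("lost in simulated mutant:", int(lost.sum()))
print("retained in simulated mutant:", int(retained.sum()))
```

The point of the sketch is the order of operations: `fit` is called once on the WT matrix, and the zeroed matrix is only ever passed to `predict`.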

If the validity of the strategy to "train a random forest on the WT data set, then let it predict on mutant features" is still unclear, here is our reasoning:

Considering the molecular mechanisms, chromatin writers like ATX are likely recruited to chromatin by multiple chromatin reader domains or reader proteins. The idea of recruitment by a combination of factors is consistent with many previous studies on chromatin writers.

↓

Therefore, ATX recruitment can be approximately modeled as a decision tree based on chromatin features. (For example, if modification X is present, ATX is more likely to bind through reader A; if modification Y is also present, reader B enhances ATX binding; but if modification Z is present, reader C repels ATX binding…)

↓

By training a random forest on WT chromatin features and WT ATX localization, we may be able to extract the decision trees used by ATX proteins to determine their targets. This approach revealed that ATX1/2 targeting gives high importance to H2Bub. But does ATX1/2 targeting truly depend on H2Bub?

↓

Based on the idea that ATX1/2 targeting is determined by combinatorial features, it is unlikely that the absence of H2Bub would result in the loss of all ATX localization. How can we predict where ATX localization is lost and where it remains? We assumed that giving ATX1/2's decision trees a feature matrix with the H2Bub values set to 0 would suffice: the predicted localization should be lost at the specific targets where the decision trees depend heavily on the presence of H2Bub (= in silico ∆hub mutant).

↓

To verify this experimentally, we examined ATX1/2 localization in a ∆hub mutant (in vivo ∆hub mutant). We found a general agreement between the in vivo ∆hub and the in silico ∆hub (Supplementary Figure 7a-c).

↓

To summarize, although I didn't find a prior study that used a similar strategy, our paper demonstrated that this in silico approach can, to some extent, appropriately model the actual events that happen in a mutant plant.
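To make the decision-tree analogy above concrete, here is a toy sketch in plain Python. Everything in it is hypothetical: the rules in `atx_binds`, the marks X, Y, and Z, and the three example genes are placeholders chosen to mirror the parenthetical example, not rules learned by the actual model.

```python
from typing import Dict

def atx_binds(features: Dict[str, float]) -> bool:
    """Hypothetical hand-written decision tree for ATX recruitment."""
    if features["Z"] > 0:       # modification Z present: binding is repelled
        return False
    if features["H2Bub"] > 0:   # H2Bub-dependent route to binding
        return True
    # H2Bub-independent route: requires both X and Y
    return features["X"] > 0 and features["Y"] > 0

def in_silico_mutant(features: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of the features with H2Bub set to 0; the tree is unchanged."""
    return {**features, "H2Bub": 0.0}

# Invented example genes (values stand in for read counts or RPKM).
genes = {
    "gene_a": {"H2Bub": 5.0, "X": 0.0, "Y": 0.0, "Z": 0.0},  # bound only via H2Bub
    "gene_b": {"H2Bub": 3.0, "X": 2.0, "Y": 1.0, "Z": 0.0},  # also bound via X and Y
    "gene_c": {"H2Bub": 4.0, "X": 1.0, "Y": 1.0, "Z": 2.0},  # repelled by Z
}

wt = {name: atx_binds(f) for name, f in genes.items()}
mut = {name: atx_binds(in_silico_mutant(f)) for name, f in genes.items()}

lost = [name for name in genes if wt[name] and not mut[name]]
retained = [name for name in genes if wt[name] and mut[name]]
print("lost:", lost)          # lost: ['gene_a']
print("retained:", retained)  # retained: ['gene_b']
```

In this toy case, binding is lost only at the gene whose path through the tree relies on H2Bub alone, while the gene with an alternative route stays bound, which is the behavior the in silico ∆hub prediction is meant to capture.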

I'm happy to answer any further questions, and I'd be delighted to have your suggestions/ideas.

-Satoyo

William-chen777 commented 1 month ago

Thanks for your kind reply, it helps me a lot!

Satoyo08 commented 1 month ago

You are welcome!

Satoyo08 / Arabidopsis_H3K4me1
