dissatisfaction-ai / scHierarchy

A toolkit for cell type hierarchies: marker selection & consistent classification
Apache License 2.0
20 stars 1 fork

Deal with overfitting #8

Open tulerpetontidae opened 2 years ago

tulerpetontidae commented 2 years ago

Regularisation with:

  1. Just grid-search the best sigma parameter
  2. Learn Laplace sigma parameter
  3. Learn Laplace sigma for each gene
  4. Learn Laplace sigma as a product of gene and cell type specific parameters
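
A minimal NumPy sketch of how the four options differ only in the shape of the Laplace scale. It assumes that "sigma" here means the Laplace scale b and that the weights form a genes x cell types matrix; the function name `laplace_neg_log_prior` and all values are hypothetical and not taken from the scHierarchy code.

```python
import numpy as np

def laplace_neg_log_prior(weights, scale):
    """Negative log density of independent zero-mean Laplace priors.

    weights: array of shape (n_genes, n_cell_types)
    scale:   Laplace scale b, anything broadcastable to weights.shape
    """
    scale = np.broadcast_to(np.asarray(scale, dtype=float), weights.shape)
    # -log p(w) = |w| / b + log(2 b), summed over all weights
    return float(np.sum(np.abs(weights) / scale + np.log(2.0 * scale)))

rng = np.random.default_rng(0)
n_genes, n_cell_types = 5, 3
w = rng.normal(size=(n_genes, n_cell_types))

# Options 1 and 2: one shared scale (grid-searched or learned).
penalty_shared = laplace_neg_log_prior(w, 0.5)

# Option 3: one scale per gene.
gene_scale = np.full(n_genes, 0.5)
penalty_per_gene = laplace_neg_log_prior(w, gene_scale[:, None])

# Option 4: scale as a product of gene and cell type factors.
cell_type_scale = np.ones(n_cell_types)
penalty_product = laplace_neg_log_prior(w, np.outer(gene_scale, cell_type_scale))
```

With the placeholder values above all three penalties coincide; they diverge once the per-gene or per-cell-type factors are learned. In a probabilistic framework the scales would be parameters or latent variables rather than fixed arrays.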

Transformations:

  1. z-score after log1p(x / sum)
  2. x / std
  3. pseudo z-score ?
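
A rough sketch of transformations 1 and 2 in NumPy. It assumes cells are rows and genes are columns, that "sum" is the per-cell total count, and that the z-score and std are taken per gene; the function names are hypothetical. The pseudo z-score is left out because it is not defined in this thread.

```python
import numpy as np

def log1p_zscore(counts, eps=1e-8):
    """Transformation 1: per-cell normalisation, log1p, then per-gene z-score."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)           # per-cell total
    logged = np.log1p(counts / np.maximum(totals, eps))
    mean = logged.mean(axis=0, keepdims=True)            # per-gene mean
    std = logged.std(axis=0, keepdims=True)              # per-gene std
    return (logged - mean) / (std + eps)

def scale_by_std(counts, eps=1e-8):
    """Transformation 2: divide each gene by its standard deviation, no centring."""
    counts = np.asarray(counts, dtype=float)
    return counts / (counts.std(axis=0, keepdims=True) + eps)

# Toy matrix: 4 cells x 3 genes.
counts = np.array([[0, 3, 1], [2, 0, 5], [4, 1, 0], [1, 2, 2]])
z = log1p_zscore(counts)
s = scale_by_std(counts)
```

Note that `scale_by_std` keeps zeros at zero and values non-negative, which matters if the positive-weight variant of the model is used.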

Compare each of these cases on a train/validation split.

tulerpetontidae commented 2 years ago

To speed up resolution of this issue:

  1. Subsample the thymus dataset to 200 cells per class
  2. Use a softplus transformation with positive weights (gamma or exponential/half-Laplace). Implement it as a separate model.
  3. Find a rescaling of the Laplace parameters that depends on the number of cells. This should be a primary source of overfitting compared with a standard model
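
For step 1, a small sketch of class-balanced subsampling, assuming the class labels are available as a 1-D array (for example a column of cell annotations). The helper `subsample_per_class` and the toy labels are hypothetical; classes with fewer than 200 cells are kept whole.

```python
import numpy as np

def subsample_per_class(labels, n_per_class=200, seed=0):
    """Return sorted indices keeping at most n_per_class cells per label."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        if idx.size > n_per_class:
            # Sample without replacement so no cell is duplicated.
            idx = rng.choice(idx, size=n_per_class, replace=False)
        keep.append(idx)
    return np.sort(np.concatenate(keep))

# Toy labels: one large class and one class below the cap.
labels = np.array(["T"] * 500 + ["B"] * 150)
idx = subsample_per_class(labels)
```

The returned indices can then be used to slice both the expression matrix and the label array.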

tulerpetontidae commented 2 years ago

Extra ideas:

  1. Add auxiliary likelihoods on the per-layer predictions that do not use the parent nodes, potentially with a bias term on the auxiliary likelihoods
  2. Initialise with the proportion of cells per cluster that express the corresponding genes
  3. For the positive-value version of the model, make the lower-level weights the sum of an independent part and the parent's weights (w_l2 = w_p + w_ch)
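
A toy sketch of ideas 2 and 3, assuming a cells x genes count matrix, one cluster label per cell, and an index array mapping each child to its parent. The names `fraction_expressing`, `child_weights` and `parent_of_child` are hypothetical, and "expresses" is taken to mean a non-zero count.

```python
import numpy as np

def fraction_expressing(counts, clusters):
    """Per cluster, the fraction of cells with a non-zero count for each gene.

    Returns (cluster_names, array of shape (n_clusters, n_genes)).
    """
    counts = np.asarray(counts)
    clusters = np.asarray(clusters)
    names = np.unique(clusters)
    frac = np.stack([(counts[clusters == name] > 0).mean(axis=0) for name in names])
    return names, frac

def child_weights(parent_w, child_part, parent_of_child):
    """w_l2 = w_p + w_ch: each child's weights are its parent's plus its own part."""
    return parent_w[parent_of_child] + child_part

# Idea 2: candidate initial values from the data (4 cells, 2 genes, 2 clusters).
counts = np.array([[1, 0], [2, 0], [0, 3], [0, 0]])
clusters = np.array(["a", "a", "b", "b"])
names, frac = fraction_expressing(counts, clusters)

# Idea 3: one parent with two children; all parts are non-negative.
parent_w = np.array([[0.5, 0.1]])
child_part = np.array([[0.2, 0.0], [0.0, 0.4]])
w_l2 = child_weights(parent_w, child_part, np.array([0, 0]))
```

Because both parts are non-negative, a child's weight can never fall below its parent's, so a marker of the parent stays a marker of every child under this parameterisation.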