BIMSBbioinfo / ikarus

Identifying tumor cells at the single-cell level using machine learning
MIT License

adjust sensitivity of prediction #14

Closed amisios closed 2 years ago

amisios commented 2 years ago

Hello,

I have been applying Ikarus to a few pancreatic ductal adenocarcinoma samples. In some samples the core prediction over-predicts or under-predicts tumor cells, and the final prediction (cell-cell network label propagation) is sometimes "overdoing it". Are there parameters to calibrate (or fine-tune) the predictions of either or both steps? As I understand it, the first step uses logistic regression to produce core_pred, and the second step uses cell-cell network label propagation to produce final_pred.

For example: [image] Here ikarus correctly predicts that there are cancer cells in "Ductal cell type 2", but there are too few of them to make it through the cell-cell network propagation (there are no malignant cells in final_pred). I am looking for a way to boost the tumor predictions in the first step or the second step. The ultimate goal is to have tumor cells in final_pred.

In case I have something wrong here please do correct me. This is based on how I understand the method works.

dohmjan commented 2 years ago

Hi, thank you for trying out ikarus!

When initializing the model, you could try modifying the number of neighboring cells taken into account for creating the cell-cell network (n_neighbors), the certainty threshold used in the label propagation step (certainty_threshold), or both. As a first guess, I would start by decreasing the latter. But that's just a guess; I think it depends on the distribution of the differences between scoring values (tumor minus normal).

These parameters can influence the label propagation step and therefore final_pred (not core_pred). From what you wrote, I think your understanding of final_pred and core_pred is correct.

# n_neighbors=100 and certainty_threshold=0.9 are the default values
model = classifier.Ikarus(signatures_gmt=signatures_path, out_dir="out", n_neighbors=100, certainty_threshold=0.9)

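
To make the "distribution of differences" point more concrete, here is a toy sketch in plain Python. It is not ikarus code. It assumes (this is an assumption, not confirmed in this thread) that certainty_threshold acts as a quantile on the absolute tumor-minus-normal score difference, and that only cells at or above that quantile keep their core label as seeds for propagation. The helper names and the score values are made up for illustration.

```python
def select_seeds(score_diff, certainty_threshold):
    """Return indices of cells treated as 'certain' (hypothetical helper).

    Assumed rule: a cell is certain when |tumor - normal| is at or above the
    `certainty_threshold` quantile of all absolute differences.
    """
    abs_diff = [abs(d) for d in score_diff]
    ranked = sorted(abs_diff)
    # Nearest-rank quantile, clipped so the index stays valid.
    cut_index = min(int(certainty_threshold * len(ranked)), len(ranked) - 1)
    cutoff = ranked[cut_index]
    return [i for i, d in enumerate(abs_diff) if d >= cutoff]


# Made-up score differences (tumor minus normal) for ten cells.
# Cells 0-2 lean tumor, but with smaller margins than the normal cells.
score_diff = [0.30, 0.25, 0.05, -0.40, -0.45, -0.50, -0.42, -0.48, -0.38, -0.44]


def tumor_seeds(threshold):
    """Seed cells whose score difference is positive (tumor-leaning)."""
    return [i for i in select_seeds(score_diff, threshold) if score_diff[i] > 0]


strict_tumor_seeds = tumor_seeds(0.9)   # high threshold: few seeds overall
relaxed_tumor_seeds = tumor_seeds(0.2)  # lower threshold: more seeds overall
print("threshold 0.9:", strict_tumor_seeds)
print("threshold 0.2:", relaxed_tumor_seeds)
```

In this toy data the tumor-leaning cells have smaller margins than the normal cells, so a high threshold leaves no tumor seeds at all, while a lower one lets at least one through. Whether real data behaves this way depends on the actual score distribution, so it is worth looking at that distribution before changing the threshold.
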
amisios commented 2 years ago

Thanks for the response. This is very useful.

Are there other parameters with which I can adjust/fine-tune the core prediction?

Thanks. A.

dohmjan commented 2 years ago

No, as of now there are no other parameters specifically for tuning the core prediction. That said, the label propagation step is essentially an adjustment/correction to the core prediction.

Apart from that, the core prediction depends on the scoring algorithm, the core model and of course the gene lists. Currently, only AUCell (scorer) and LogisticRegression (core model) are supported. It would be interesting to include other methods in the future, scorers in particular.
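
As a rough illustration of how these three pieces (gene lists, scorer, core model) fit together, here is a minimal sketch in plain Python. It is not the ikarus implementation: the score below is a simplified rank-based stand-in rather than AUCell (which computes the area under a gene recovery curve), and the gene names, weights, and helper functions are all hypothetical.

```python
import math


def signature_score(expression, signature, top_fraction=0.5):
    """Fraction of signature genes found among the cell's top-ranked genes.

    Simplified stand-in for a rank-based scorer; not AUCell itself.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    top = set(ranked[:n_top])
    return sum(1 for gene in signature if gene in top) / len(signature)


def core_probability(tumor_score, normal_score, w_tumor=4.0, w_normal=-4.0, bias=0.0):
    """Logistic model on the two scores; the weights are made up, not fitted."""
    z = w_tumor * tumor_score + w_normal * normal_score + bias
    return 1.0 / (1.0 + math.exp(-z))


# One toy cell with hypothetical genes and expression values.
cell = {"GENE_A": 9.0, "GENE_B": 7.0, "GENE_C": 0.5,
        "GENE_D": 0.1, "GENE_E": 3.0, "GENE_F": 0.0}
tumor_signature = ["GENE_A", "GENE_B"]    # hypothetical tumor gene list
normal_signature = ["GENE_C", "GENE_D"]   # hypothetical normal gene list

tumor_score = signature_score(cell, tumor_signature)
normal_score = signature_score(cell, normal_signature)
prob = core_probability(tumor_score, normal_score)
label = "Tumor" if prob > 0.5 else "Normal"
print(tumor_score, normal_score, label)
```

The sketch shows why changing the gene lists or the scorer shifts the core prediction: both change the scores that the logistic model sees, even when the model itself stays the same.
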