calico / scBasset

Sequence-based Modeling of single-cell ATAC-seq using Convolutional Neural Networks.
Apache License 2.0

why multi_label=True? #24

Open Miaoyuanyuan777 opened 2 months ago

Miaoyuanyuan777 commented 2 months ago

model.compile(loss=loss_fn, optimizer=optimizer, metrics=[tf.keras.metrics.AUC(curve='ROC', multi_label=True), tf.keras.metrics.AUC(curve='PR', multi_label=True)])

I would like to ask about multi_label=True. Does setting it mean that the model is optimized on the per-cell AUC? Are the per-peak and per-cell AUCs on the test dataset then computed directly with that optimized model, and is that same model used for the subsequent analysis? (Why not set multi_label=False, that is, compute one overall AUC across all samples, or optimize with the per-peak AUC instead?)

hy395 commented 2 months ago

Hi, the model is optimized on cross-entropy loss. The multi-label AUC is only a metric: we use it to track performance and as the criterion for early stopping. It computes the AUC for each cell separately and takes the unweighted average, so every cell gets equal weight. Since cells differ in depth and in their ratio of positives to negatives, I chose to track the multi-label AUC.
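To make the difference concrete, here is a minimal NumPy sketch of the two ways of averaging. It is not the scBasset code: the helper names (`roc_auc`, `per_cell_mean_auc`, `pooled_auc`) and the toy matrices are hypothetical, and it assumes labels and predictions are laid out as (peaks, cells). It computes an exact rank-based ROC AUC, whereas `tf.keras.metrics.AUC` approximates the curve over a fixed set of thresholds, so values from Keras can differ slightly.

```python
import numpy as np

def roc_auc(y_true, y_score):
    """Exact ROC AUC via the Mann-Whitney U statistic (tied scores share a rank)."""
    y_true = np.asarray(y_true).ravel().astype(bool)
    y_score = np.asarray(y_score, dtype=float).ravel()
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return np.nan  # AUC is undefined when only one class is present
    # 1-based average ranks: each group of tied scores gets the mean of its ranks.
    _, inverse, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def per_cell_mean_auc(y_true, y_score):
    """Unweighted mean of per-cell (per-column) AUCs, in the spirit of multi_label=True."""
    aucs = [roc_auc(y_true[:, j], y_score[:, j]) for j in range(y_true.shape[1])]
    return float(np.nanmean(aucs))

def pooled_auc(y_true, y_score):
    """One AUC over all (peak, cell) entries pooled together, like multi_label=False."""
    return float(roc_auc(y_true, y_score))

# Hypothetical toy data: 4 peaks (rows) x 2 cells (columns).
# Cell 0 gets high scores overall, cell 1 gets low scores overall.
y_true = np.array([[1, 0],
                   [1, 0],
                   [0, 1],
                   [0, 0]])
y_score = np.array([[0.9, 0.1],
                    [0.8, 0.2],
                    [0.6, 0.4],
                    [0.5, 0.3]])

print(per_cell_mean_auc(y_true, y_score))  # 1.0: peaks are ranked perfectly within each cell
print(pooled_auc(y_true, y_score))         # 13/15, about 0.867
```

In this toy case each cell ranks its own peaks perfectly, but the pooled AUC drops because the positive in the low-scoring cell falls below negatives from the high-scoring cell. The per-cell average is insensitive to such cell-level offsets, which fits the equal-weight-per-cell motivation described above.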

Miaoyuanyuan777 commented 2 months ago

Hi, thanks for your response! It has been very helpful to me. I have another question: when evaluating a model's generalization performance, why do we calculate AUC separately "per cell" and "per peak" rather than using a single overall AUC? Is this a common practice? I have noticed this approach in other articles as well. Wish you all the best!