biomap-research / scFoundation

Apache License 2.0

Clarification Requested on Parameter Settings for Embedding vs. Non-Embedding Models in SCAD Demo Usage #23

Closed LCGaoZzz closed 3 months ago

LCGaoZzz commented 3 months ago

Hello, it's me again! 😀

First, thank you for your efforts in developing and sharing the model.

I was reviewing the demonstration usage provided in the documentation of SCAD, specifically the commands for training the model with and without embedding. Here are the commands for reference:

Without Embedding

CUDA_VISIBLE_DEVICES=1 python model/SCAD_train_binarized_5folds-pub.py -e FX -d Sorafenib -g _norm -s 42 -h_dim 512 -z_dim 128 -ep 20 -la1 5 -mbS 8 -mbT 8 -emb 0

With Embedding

CUDA_VISIBLE_DEVICES=1 python model/SCAD_train_binarized_5folds-pub.py -e FX -d Sorafenib -g _norm -s 42 -h_dim 1024 -z_dim 256 -ep 80 -la1 0.2 -mbS 32 -mbT 32 -emb 1

I have a concern regarding the significant difference in parameter settings between these two configurations, especially if the goal is to evaluate the impact of embedding on model performance. The variations in h_dim, z_dim, epoch, lambda1, and batch sizes might confound the results, making it challenging to attribute performance improvements solely to the use of embedding.

Questions:

  1. Was there a specific reason for the choice of such divergent parameter settings between the with and without embedding configurations?
  2. Would it be possible to share results or insights on how each of these parameter changes (independently of embedding) affects the model's performance?
  3. Have any controlled experiments been conducted where the only variable changed is the use of embedding, keeping all other parameters constant? If so, could you please share those findings?

Understanding the impact of embedding in isolation could provide clearer insights into its effectiveness and potential benefits in drug response prediction models.
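As a minimal sketch of the controlled comparison asked about in question 3, the following builds two command lines that differ only in the `-emb` flag. The shared values are copied from the "Without Embedding" demo command above; the helper name `build_command` is hypothetical, and whether the training script behaves sensibly with these settings when `-emb 1` is used is an assumption, not something confirmed in this thread.

```python
import shlex

# Path of the training script, as given in the demo commands.
SCRIPT = "model/SCAD_train_binarized_5folds-pub.py"

# Shared settings copied from the "Without Embedding" demo command.
# Using these for the embedding run as well is an assumption made
# purely to isolate the effect of -emb.
SHARED = {
    "-e": "FX",
    "-d": "Sorafenib",
    "-g": "_norm",
    "-s": "42",
    "-h_dim": "512",
    "-z_dim": "128",
    "-ep": "20",
    "-la1": "5",
    "-mbS": "8",
    "-mbT": "8",
}


def build_command(emb: int, shared: dict = SHARED) -> str:
    """Return a shell command string with fixed settings and the given -emb value."""
    args = ["python", SCRIPT]
    for flag, value in shared.items():
        args += [flag, value]
    args += ["-emb", str(emb)]
    return shlex.join(args)


# One run without embedding and one with it; everything else is identical.
controlled_pair = [build_command(emb) for emb in (0, 1)]
for command in controlled_pair:
    print(command)
```

The printed commands can be prefixed with `CUDA_VISIBLE_DEVICES=...` as in the demo. Repeating the pair over several seeds (`-s`) would give a rough sense of run-to-run variance.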

Thank you for your time and consideration.

WhirlFirst commented 3 months ago

Hi, thank you for your interest in our work.

For the SCAD task, note that the gene expression input and the embedding input differ in dimension, in data distribution, and in other features. It would therefore not be ideal to use the same hyperparameter settings for both the embedding and non-embedding models.

To tackle this problem, we followed the same strategy as described in the SCAD paper: all hyperparameters were selected based on prediction performance on the validation set of the source domain. For more information, please refer to their paper and code. Because we also trained on the embeddings in the source domain, the selected hyperparameters are not the same as those of the non-embedding model.
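The selection strategy described above can be sketched roughly as a grid search that keeps the configuration with the best source-domain validation score, run once per input type. This is only an illustration, not the SCAD or scFoundation code: `select_hyperparameters`, `GRID`, and the `validation_score` callback are hypothetical names, the candidate values are simply the ones that appear in the two demo commands, and a single `mb` entry stands in for both `-mbS` and `-mbT`.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, float]

# Candidate values taken from the two demo commands in this thread.
# A real search space is an assumption here and would likely be wider.
GRID: Dict[str, List[float]] = {
    "h_dim": [512, 1024],
    "z_dim": [128, 256],
    "ep": [20, 80],
    "la1": [0.2, 5],
    "mb": [8, 32],  # simplification: one value for both -mbS and -mbT
}


def select_hyperparameters(
    grid: Dict[str, List[float]],
    validation_score: Callable[[Config], float],
) -> Tuple[Optional[Config], float]:
    """Return the configuration with the highest validation score.

    validation_score is supplied by the caller; it is expected to train a
    model with the given configuration and return a metric measured on the
    source-domain validation set (higher is better).
    """
    names = list(grid)
    best_config: Optional[Config] = None
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = validation_score(config)
        if score > best_score:  # ties keep the earliest configuration
            best_config, best_score = config, score
    return best_config, best_score
```

Calling this once with a scorer that trains on gene expression and once with a scorer that trains on embeddings would, in general, return two different configurations, which is consistent with the two demo commands not sharing their settings.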

As a result, performance under a shared parameter setting may not be directly comparable, so we did not analyze in detail how varying a single parameter changes the final performance.

Thank you again for your interest in this task. I believe single-cell level drug sensitivity prediction is a promising future direction.

LCGaoZzz commented 3 months ago

Thanks for the clarification! Your expertise and effort are highly appreciated! 👍