CSA Post-Processing README

Overview

This README outlines the steps and required resources for performing post-processing on Cross-Study Analysis (CSA) results using the IMPROVE framework. The pipeline compares model predictions to ground-truth values in the test data and generates metrics, visualizations, and summaries, enabling in-depth evaluation of model performance across different studies.
Installation
Please refer to the Main README for detailed installation instructions, including setting up the environment and installing required dependencies.
Example usage
To run the CSA post-processing with specified directories and model details:
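The sketch below is a minimal, hypothetical invocation. The entry-point script name (csa_postprocess.py) and the argparse-style flags are assumptions for illustration, not the repository's confirmed interface; the arguments themselves are defined afterward.

```python
# Minimal invocation sketch. NOTE: the script name "csa_postprocess.py"
# and the --flag form are assumptions for illustration; substitute the
# actual entry point and interface of this repository.
import subprocess

subprocess.run(
    [
        "python", "csa_postprocess.py",
        "--res_dir", "./LGBM",        # required: predicted and true values
        "--model_name", "LGBM",       # required: label used in summaries/plots
        "--y_col_name", "auc",        # optional: target column (default: auc)
        "--outdir", "./csa_results",  # optional: output directory
    ],
    check=True,  # raise CalledProcessError if post-processing fails
)
```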
Argument Definitions

res_dir (required): Path to the directory containing the results, i.e., the predicted and true values used for the analysis. An example is provided in the LGBM folder.
model_name (required): Name of the model used for predictions (e.g., GraphDRP, DeepCDR). This name will be used in the output summaries and visualizations.
y_col_name (optional): Name of the column representing the target variable or outcome in the dataset. The default is auc.
outdir (optional): Directory to save post-processing results, including metrics, summaries, and visualizations. If not specified, results will be saved in the current directory.
Output Files
After completion, the following files will be generated in the specified output directory:
1. all_scores.csv: Contains detailed performance metrics (e.g., mse, rmse, pcc, scc, r2) for each study comparison; see the first sketch after this list for loading it. Its columns are:
   met: The metric type (e.g., r2).
   split: The data split index (e.g., 0, 1, etc.).
   value: The calculated metric value for that split.
   src and trg: The source and target datasets (e.g., CCLE, GDSCv2, gCSI), indicating comparisons within the same dataset or across datasets.
2. densed_csa_table.csv: Provides a summary of the mean and standard deviation of each metric, separated into "within" and "cross" categories; see the second sketch after this list. Its columns are:
   met: The metric type.
   mean: The mean value of the metric for either within-dataset or cross-dataset comparisons.
   std: The standard deviation of the metric, indicating variability across studies.
   summary: Either "within" (comparisons within the same dataset) or "cross" (comparisons across different datasets).
3. <metric>_scores.csv: Files containing detailed scores for each metric for the different dataset comparisons.
4. <metric>_mean_csa_table.csv: Files containing the mean scores for a specific metric across all studies.
5. <metric>_std_csa_table.csv: Files containing the standard deviation of the scores for a specific metric across all studies.
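As a usage illustration, the following pandas sketch loads all_scores.csv and arranges the mean r2 over splits as a source-by-target matrix. It assumes the file sits in the output directory and has exactly the met, split, value, src, and trg columns described above.

```python
import pandas as pd

# Per-split scores written by the post-processing step
# (columns: met, split, value, src, trg).
scores = pd.read_csv("all_scores.csv")

# Keep the r2 rows and average over splits for each (src, trg) pair.
r2 = scores[scores["met"] == "r2"]
csa_matrix = (
    r2.groupby(["src", "trg"])["value"]
    .mean()
    .unstack("trg")  # rows: source study, columns: target study
)
print(csa_matrix.round(3))
```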
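Similarly, a sketch of working with the summary tables. The densed_csa_table.csv columns follow the description above; the source-by-target layout assumed for the per-metric tables is a guess and may need adjusting.

```python
import pandas as pd

# One row per (metric, within/cross) combination.
dense = pd.read_csv("densed_csa_table.csv")

# Place within- and cross-dataset mean/std side by side for each metric.
print(dense.pivot(index="met", columns="summary", values=["mean", "std"]))

# Per-metric tables such as r2_mean_csa_table.csv are assumed here to be
# source-by-target matrices with the source study in the first column.
r2_mean = pd.read_csv("r2_mean_csa_table.csv", index_col=0)
print(r2_mean)
```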