CSA Post-Processing README

Overview

This README outlines the steps and required resources for performing post-processing on Cross-Study Analysis (CSA) results using the IMPROVE framework. The pipeline compares model predictions to ground-truth values in the test data and generates metrics, visualizations, and summaries, enabling in-depth evaluation of model performance across different studies.
Installation
Please refer to the Main README for detailed installation instructions, including setting up the environment and installing required dependencies.
Example usage
To run the CSA post-processing with specified directories and model details:
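The sketch below is a minimal, hypothetical invocation. The entry-point script name (csa_postprocess.py) and the argparse-style flags are assumptions for illustration, not the repository's confirmed interface; the arguments themselves are defined afterward.

```python
# Minimal invocation sketch. NOTE: the script name "csa_postprocess.py"
# and the --flag form are assumptions for illustration; substitute the
# actual entry point and interface of this repository.
import subprocess

subprocess.run(
    [
        "python", "csa_postprocess.py",
        "--res_dir", "./LGBM",        # required: predicted and true values
        "--model_name", "LGBM",       # required: label used in summaries/plots
        "--y_col_name", "auc",        # optional: target column (default: auc)
        "--outdir", "./csa_results",  # optional: output directory
    ],
    check=True,  # raise CalledProcessError if post-processing fails
)
```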
Argument Definitions

res_dir (required): Path to the directory containing the results, i.e., the predicted and true values used for the analysis. An example is provided in the LGBM folder.
model_name (required): Name of the model used for predictions (e.g., GraphDRP, DeepCDR). This name will be used in the output summaries and visualizations.
y_col_name (optional): Name of the column representing the target variable or outcome in the dataset. The default is auc.
outdir (optional): Directory to save post-processing results, including metrics, summaries, and visualizations. If not specified, results will be saved in the current directory.
Output Files
After completion, the following files will be generated in the specified output directory:
1. all_scores.csv: Contains detailed performance metrics (e.g., mse, rmse, pcc, scc, r2) for each study comparison; see the first sketch after this list for loading it. Its columns are:
   met: The metric type (e.g., r2).
   split: The data split index (e.g., 0, 1, etc.).
   value: The calculated metric value for that split.
   src and trg: The source and target datasets (e.g., CCLE, GDSCv2, gCSI), indicating comparisons within the same dataset or across datasets.
2. densed_csa_table.csv: Provides a summary of the mean and standard deviation of each metric, separated into "within" and "cross" categories; see the second sketch after this list. Its columns are:
   met: The metric type.
   mean: The mean value of the metric for either within-dataset or cross-dataset comparisons.
   std: The standard deviation of the metric, indicating variability across studies.
   summary: Either "within" (comparisons within the same dataset) or "cross" (comparisons across different datasets).
3. <metric>_scores.csv: Files containing detailed scores for each metric for the different dataset comparisons.
4. <metric>_mean_csa_table.csv: Files containing the mean scores for a specific metric across all studies.
5. <metric>_std_csa_table.csv: Files containing the standard deviation of the scores for a specific metric across all studies.
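As a usage illustration, the following pandas sketch loads all_scores.csv and arranges the mean r2 over splits as a source-by-target matrix. It assumes the file sits in the output directory and has exactly the met, split, value, src, and trg columns described above.

```python
import pandas as pd

# Per-split scores written by the post-processing step
# (columns: met, split, value, src, trg).
scores = pd.read_csv("all_scores.csv")

# Keep the r2 rows and average over splits for each (src, trg) pair.
r2 = scores[scores["met"] == "r2"]
csa_matrix = (
    r2.groupby(["src", "trg"])["value"]
    .mean()
    .unstack("trg")  # rows: source study, columns: target study
)
print(csa_matrix.round(3))
```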
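Similarly, a sketch of working with the summary tables. The densed_csa_table.csv columns follow the description above; the source-by-target layout assumed for the per-metric tables is a guess and may need adjusting.

```python
import pandas as pd

# One row per (metric, within/cross) combination.
dense = pd.read_csv("densed_csa_table.csv")

# Place within- and cross-dataset mean/std side by side for each metric.
print(dense.pivot(index="met", columns="summary", values=["mean", "std"]))

# Per-metric tables such as r2_mean_csa_table.csv are assumed here to be
# source-by-target matrices with the source study in the first column.
r2_mean = pd.read_csv("r2_mean_csa_table.csv", index_col=0)
print(r2_mean)
```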