JDACS4C-IMPROVE / IMPROVE

Libraries and scripts for basic IMPROVE functionalities
MIT License

Update README.md for CSA post-processing #147

Closed adpartin closed 3 weeks ago

jonesse3 commented 3 weeks ago

Suggest adding more content

CSA Post-Processing README

Overview

This README outlines the steps and resources required to post-process Cross-Study Analysis (CSA) results with the IMPROVE framework. The pipeline compares model predictions against ground-truth values in the test data and produces metrics, visualizations, and summaries from that comparison. These outputs support evaluating model performance both within a single study and across different studies.

Installation

Please refer to the Main README for detailed installation instructions, including setting up the environment and installing required dependencies.

Example usage

To run the CSA post-processing with specified directories and model details:

python csa_postproc.py --res_dir LGBM/run.csa.small --model_name LGBM --y_col_name auc

Argument Definitions

Output Files

After completion, the following files will be generated in the specified output directory:

  1. all_scores.csv: Contains detailed performance metrics (e.g., mse, rmse, pcc, scc, r2) for each study comparison.

    • met: The metric type (e.g., r2).
    • split: Indicates different data splits (e.g., 0, 1, etc.).
    • value: The calculated metric value for that split.
    • src and trg: Represent the source and target datasets (e.g., CCLE, GDSCv2, gCSI), indicating comparisons within the same dataset or across datasets.
  2. densed_csa_table.csv: Provides a summary of mean and standard deviation for each metric, separated into "within" and "cross" categories.

    • met: The metric type.
    • mean: The mean value of the metric for either within-dataset or cross-dataset comparisons.
    • std: The standard deviation of the metric, indicating variability across studies.
    • summary: Either "within" (comparisons within the same dataset) or "cross" (comparisons across different datasets).

  3. <metric>_scores.csv: Files containing detailed scores for each metric for different dataset comparisons.
  4. <metric>_mean_csa_table.csv: Files containing mean scores for a specific metric across all studies.
  5. <metric>_std_csa_table.csv: Files containing the standard deviation of scores for a specific metric across all studies.
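
To make the relationship between these files concrete, here is a minimal sketch of how rows in the all_scores.csv layout (met, split, value, src, trg) could be aggregated into a per-metric mean table and a "within"/"cross" summary. It is an illustration only, not the code in csa_postproc.py: the sample values are made up, the helper names (load_scores, mean_table, within_cross_summary) are hypothetical, and the aggregation order (average over splits first, then take the mean and sample standard deviation over source/target pairs) is an assumption. It uses only the Python standard library.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev

# Made-up rows in the all_scores.csv layout described above.
SAMPLE = """met,split,value,src,trg
r2,0,0.80,CCLE,CCLE
r2,1,0.70,CCLE,CCLE
r2,0,0.40,CCLE,gCSI
r2,1,0.50,CCLE,gCSI
r2,0,0.60,gCSI,gCSI
r2,1,0.70,gCSI,gCSI
r2,0,0.30,gCSI,CCLE
r2,1,0.20,gCSI,CCLE
"""


def load_scores(handle):
    """Read score rows from a file-like object and convert 'value' to float."""
    rows = []
    for row in csv.DictReader(handle):
        row["value"] = float(row["value"])
        rows.append(row)
    return rows


def mean_table(rows, metric):
    """Average one metric over splits for each (src, trg) pair."""
    by_pair = defaultdict(list)
    for row in rows:
        if row["met"] == metric:
            by_pair[(row["src"], row["trg"])].append(row["value"])
    return {pair: mean(values) for pair, values in by_pair.items()}


def within_cross_summary(rows, metric):
    """Summarize per-pair means as 'within' (src == trg) or 'cross' (src != trg)."""
    groups = defaultdict(list)
    for (src, trg), pair_mean in mean_table(rows, metric).items():
        groups["within" if src == trg else "cross"].append(pair_mean)
    return {
        name: {
            "mean": mean(values),
            # Sample standard deviation needs at least two values.
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
        for name, values in groups.items()
    }


rows = load_scores(io.StringIO(SAMPLE))
table = mean_table(rows, "r2")
summary = within_cross_summary(rows, "r2")

for name, stats in sorted(summary.items()):
    print(f"r2 {name}: mean={stats['mean']:.3f} std={stats['std']:.3f}")
```

To try this on real results, replace io.StringIO(SAMPLE) with an open file handle to the all_scores.csv written by the run, and compare the output against densed_csa_table.csv to check whether the assumed aggregation matches.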

adpartin commented 3 weeks ago

@jonesse3 @wilke thanks for the feedback. Please check the updated README.