JDACS4C-IMPROVE / Benchmarks

ECP-CANDLE Benchmarks
MIT License

DRAFT: Preprocess the IMPROVE way for our Uno model #1

Open · rajeeja opened this issue 1 year ago

rajeeja commented 1 year ago

Map the data files from the Uno layout to the IMPROVE layout.

Uno: https://web.cels.anl.gov/projects/IMPROVE_FTP/candle/public/benchmarks/Pilot1/combo/

IMPROVE: https://ftp.mcs.anl.gov/pub/candle/public/improve/IMP_data/


 Uno git:(develop) ✗ grep DATA_URL uno_data.py
DATA_URL = "https://web.cels.anl.gov/projects/IMPROVE_FTP/candle/public/benchmarks/Pilot1/combo/"
    path = get_file(DATA_URL + "rescaled_combined_single_drug_growth")
    path = get_file(DATA_URL + "ComboDrugGrowth_Nov2017.csv")
    cellmap_path = get_file(DATA_URL + "NCI60_CELLNAME_to_Combo.txt")
    path = get_file(DATA_URL + "combined_single_response_agg")
    path = get_file(DATA_URL + "extended_combined_mordred_descriptors")
    path = get_file(DATA_URL + "drug_info")
    path = get_file(DATA_URL + "cl_metadata")
    path = get_file(DATA_URL + "NCI60_CELLNAME_to_Combo.txt")
    path = get_file(DATA_URL + "cl_mapping")
    path = get_file(DATA_URL + "NCI_IOA_AOA_drugs")
    path = get_file(DATA_URL + "{}_dragon7_descriptors.tsv".format(drug_set))
    path = get_file(DATA_URL + "{}_dragon7_{}.tsv".format(drug_set, fp))
    path = get_file(DATA_URL + 'ChemStr
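One way to start the mapping is a manifest that pairs each Uno file fetched by uno_data.py with its IMPROVE counterpart. The sketch below lists only the static file names visible in the grep output above (the drug_set/fp-templated dragon7 files and the truncated 'ChemStr... entry are left out). The IMPROVE targets start empty because this issue does not say which IMPROVE files correspond. build_manifest and improve_targets are hypothetical names, not part of uno_data.py or the IMPROVE code.

```python
# Base URLs from this issue; file names copied from the grep of uno_data.py.
UNO_DATA_URL = "https://web.cels.anl.gov/projects/IMPROVE_FTP/candle/public/benchmarks/Pilot1/combo/"
IMPROVE_DATA_URL = "https://ftp.mcs.anl.gov/pub/candle/public/improve/IMP_data/"

# NCI60_CELLNAME_to_Combo.txt is fetched in two places in uno_data.py; listed once here.
UNO_FILES = [
    "rescaled_combined_single_drug_growth",
    "ComboDrugGrowth_Nov2017.csv",
    "NCI60_CELLNAME_to_Combo.txt",
    "combined_single_response_agg",
    "extended_combined_mordred_descriptors",
    "drug_info",
    "cl_metadata",
    "cl_mapping",
    "NCI_IOA_AOA_drugs",
]


def build_manifest(improve_targets=None):
    """Return (uno_name, uno_url, improve_url_or_None) for every Uno file.

    improve_targets is a hypothetical {uno_name: path relative to IMP_data/}
    dict filled in as correspondences are decided; unmapped files get None.
    """
    improve_targets = improve_targets or {}
    manifest = []
    for name in UNO_FILES:
        target = improve_targets.get(name)
        improve_url = IMPROVE_DATA_URL + target if target else None
        # Same concatenation style as uno_data.py: DATA_URL + file name.
        manifest.append((name, UNO_DATA_URL + name, improve_url))
    return manifest
```

Filtering the manifest for rows whose third field is None gives the list of Uno files that still need an IMPROVE equivalent.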

Replace the uno_data classes CombinedDataGenerator, CombinedDataLoader, and DataFeeder with IMPROVE equivalents.
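To show the kind of interface a replacement generator might expose, here is a minimal sketch that feeds multi-input batches from data an IMPROVE-style preprocessing step has already produced. It assumes a response table stored as integer row indices (cell_idx, drug_idx) plus targets y, and separate cell and drug feature matrices addressed by those indices. ImproveBatchGenerator and its batches method are hypothetical; whether Uno expects exactly this input ordering, and how its real generators loop over epochs, should be checked against uno_data.py.

```python
import numpy as np


class ImproveBatchGenerator:
    """Hypothetical stand-in for Uno's CombinedDataGenerator.

    Yields ([cell_x, drug_x], y) batches, gathering feature rows by index
    so the feature matrices are stored once rather than per response row.
    """

    def __init__(self, cell_idx, drug_idx, y, cell_features, drug_features,
                 batch_size=32, shuffle=False, seed=0):
        self.cell_idx = np.asarray(cell_idx)
        self.drug_idx = np.asarray(drug_idx)
        self.y = np.asarray(y, dtype=np.float32)
        self.cell_features = np.asarray(cell_features, dtype=np.float32)
        self.drug_features = np.asarray(drug_features, dtype=np.float32)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        # Batches per epoch, counting a final partial batch.
        return int(np.ceil(len(self.y) / self.batch_size))

    def batches(self):
        """Yield one pass over the data as ([cell_x, drug_x], y) tuples."""
        order = np.arange(len(self.y))
        if self.shuffle:
            self.rng.shuffle(order)
        for start in range(0, len(order), self.batch_size):
            rows = order[start:start + self.batch_size]
            cell_x = self.cell_features[self.cell_idx[rows]]
            drug_x = self.drug_features[self.drug_idx[rows]]
            yield [cell_x, drug_x], self.y[rows]
```

Keeping the responses as index pairs means the same generator can serve any split (train, val, test, or another study) just by passing a different subset of rows.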

Do this work in the Uno_IMPROVE folder. The goal is to support cross-study analysis, as the other IMPROVE models do.
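Cross-study analysis means training on one study and evaluating on another, over every source/target pair. The sketch below enumerates those runs. The study names are example values assumed for illustration; the real list should come from whatever datasets the IMPROVE preprocessing provides. Treating the diagonal (source equal to target) as an ordinary within-study run on a held-out split is also an assumption of this sketch, and cross_study_runs is a hypothetical helper.

```python
from itertools import product

# Example study names (assumed); replace with the datasets actually available.
STUDIES = ["CCLE", "CTRPv2", "gCSI", "GDSCv1", "GDSCv2"]


def cross_study_runs(studies, n_splits=1):
    """Enumerate (source, target, split) runs for a cross-study matrix.

    Each run trains on `source` and evaluates on `target`. Diagonal runs
    (source == target) are flagged as within-study runs.
    """
    runs = []
    for source, target in product(studies, repeat=2):
        for split in range(n_splits):
            runs.append({
                "source": source,
                "target": target,
                "split": split,
                "within_study": source == target,
            })
    return runs
```

With five studies and two splits this yields a 5 x 5 matrix of source/target pairs, each repeated per split.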

rajeeja commented 11 months ago

Another approach, possibly a little more difficult: use uno_preprocess.py to create an .h5 file, then set the config parameter --use-exported so that the model loads that HDF5 file for training and inference. This would be much less intrusive to the original model.
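A minimal sketch of the export/reload round trip, assuming h5py is installed. The layout here (one group per partition, datasets x_0, x_1, ... for Uno's multiple inputs plus y, and an n_inputs attribute) is hypothetical. Uno's own exported-data loader expects its own key layout, so a real uno_preprocess.py would need to write whatever that loader reads rather than this format. export_partitions and load_partition are illustrative names only.

```python
import h5py
import numpy as np


def export_partitions(path, partitions):
    """Write {partition_name: ([x_0, x_1, ...], y)} into one HDF5 file."""
    with h5py.File(path, "w") as f:
        for name, (xs, y) in partitions.items():
            group = f.create_group(name)
            # Record how many input arrays this partition has so they can be
            # read back in order without relying on key sorting.
            group.attrs["n_inputs"] = len(xs)
            for i, x in enumerate(xs):
                group.create_dataset(f"x_{i}", data=np.asarray(x), compression="gzip")
            group.create_dataset("y", data=np.asarray(y), compression="gzip")


def load_partition(path, name):
    """Read back ([x_0, x_1, ...], y) for one partition, e.g. "train"."""
    with h5py.File(path, "r") as f:
        group = f[name]
        n_inputs = int(group.attrs["n_inputs"])
        xs = [group[f"x_{i}"][()] for i in range(n_inputs)]
        return xs, group["y"][()]
```

Because all preprocessing happens before the file is written, the model side only needs the loading step, which is what makes this route less intrusive than replacing the data generators.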