JDACS4C-IMPROVE / IMPROVE

Libraries and scripts for basic IMPROVE functionalities
MIT License

Exclude "index" files from downloading along with benchmark dataset #145

Closed jonesse3 closed 2 weeks ago

jonesse3 commented 1 month ago
[Image: Screenshot 2024-10-24 at 6 27 09 PM]
jonesse3 commented 3 weeks ago

wget --cut-dirs=8 -P ./ -nH -np -m --reject "index.html*,index.*" https://web.cels.anl.gov/projects/IMPROVE_FTP/candle/public/improve/benchmarks/single_drug_drp/benchmark-data-pilot1/csa_data/

jonesse3 commented 3 weeks ago

@adpartin @priyanka9991 The above fixes the problem. How should we roll out this fix, given that each repo has its own download_csa.sh file?

priyanka9991 commented 3 weeks ago

@jonesse3 I just realized that setup_improve.sh uses download_csa.sh. We need to update download_csa.sh with the new wget command and add both setup_improve.sh and download_csa.sh to the templates directory: https://github.com/JDACS4C-IMPROVE/IMPROVE/tree/develop/templates @nkoussa
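
A minimal sketch of what an updated download_csa.sh could look like, assuming the script simply wraps the wget command posted earlier in this thread. The function name `download_csa` and the optional destination argument are hypothetical and not taken from the existing script:

```shell
#!/bin/sh
# Sketch of an updated download_csa.sh (hypothetical layout, not the actual script).

# URL copied from the wget command posted earlier in this thread.
CSA_URL="https://web.cels.anl.gov/projects/IMPROVE_FTP/candle/public/improve/benchmarks/single_drug_drp/benchmark-data-pilot1/csa_data/"

# download_csa [DEST_DIR]
# Mirrors the benchmark data into DEST_DIR (default: current directory).
# -nH drops the host name and --cut-dirs=8 drops the eight leading path
# components, so the files land in DEST_DIR/csa_data/.
# --reject skips the auto-generated directory listing ("index") files.
download_csa() {
    dest="${1:-./}"
    wget --cut-dirs=8 -P "$dest" -nH -np -m \
         --reject "index.html*,index.*" \
         "$CSA_URL"
}
```

After sourcing the file, `download_csa` (or `download_csa /some/dir`) would run the mirror. Keeping the command in one function means each repo's copy only needs this single definition updated.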

adpartin commented 3 weeks ago

The new version of setup_improve.sh that we have here https://github.com/JDACS4C-IMPROVE/IMPROVE/blob/develop/setup_improve.sh doesn't download the data. It might make more sense to separate setup_improve.sh and download_csa.sh, so that one downloads the data and the other sets PYTHONPATH.
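
As a rough illustration of that split, the PYTHONPATH half could be as small as the sketch below. This is an assumption about the shape of such a script, not the contents of the current setup_improve.sh; the function name `add_improve_to_pythonpath` and the example path are hypothetical:

```shell
#!/bin/sh
# Sketch of a setup script that only sets PYTHONPATH (no data download).

# add_improve_to_pythonpath IMPROVE_DIR
# Prepends IMPROVE_DIR to PYTHONPATH unless it is already listed.
add_improve_to_pythonpath() {
    improve_dir="$1"
    case ":${PYTHONPATH:-}:" in
        *":${improve_dir}:"*)
            ;;  # already present, nothing to do
        *)
            PYTHONPATH="${improve_dir}${PYTHONPATH:+:${PYTHONPATH}}"
            ;;
    esac
    export PYTHONPATH
}
```

The file would need to be sourced rather than executed (e.g. `. ./setup_improve.sh` followed by `add_improve_to_pythonpath /path/to/IMPROVE`) so that the exported variable persists in the calling shell.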

nkoussa commented 3 weeks ago

Once we have the pip install we won't really need setup_improve at all, right?

priyanka9991 commented 3 weeks ago

> The new version of setup_improve.sh that we have here https://github.com/JDACS4C-IMPROVE/IMPROVE/blob/develop/setup_improve.sh doesn't download the data. It might make more sense to separate setup_improve.sh and download_csa.sh, so that one downloads the data and the other sets PYTHONPATH.

Yes, I agree. Downloading CSA data should be associated with the CSA workflow, not the model. We can provide download_csa.sh within IMPROVE/workflows/parsl_csa or bruteforce_csa. It is better to remove the CSA benchmark data download from setup_improve.sh.

adpartin commented 3 weeks ago

> > The new version of setup_improve.sh that we have here https://github.com/JDACS4C-IMPROVE/IMPROVE/blob/develop/setup_improve.sh doesn't download the data. It might make more sense to separate setup_improve.sh and download_csa.sh, so that one downloads the data and the other sets PYTHONPATH.
>
> Yes, I agree. Downloading CSA data should be associated with the CSA workflow, not the model. We can provide download_csa.sh within IMPROVE/workflows/parsl_csa or bruteforce_csa. It is better to remove the CSA benchmark data download from setup_improve.sh.

totally agree.

priyanka9991 commented 3 weeks ago

@adpartin Is the new setup_improve.sh executed at the IMPROVE level? Not within the model repo?

adpartin commented 3 weeks ago

@priyanka9991 we initially executed it from a model repo as part of setting up the environment for running a model, e.g. https://github.com/JDACS4C-IMPROVE/GraphDRP/blob/develop/setup_improve.sh.

Later, we also placed it in the main dir of the IMPROVE repo, where I used it to set up the environment for testing specific functionalities of IMPROVE: https://github.com/JDACS4C-IMPROVE/IMPROVE/blob/develop/setup_improve.sh

Actually, I'm not sure download_csa.sh should live in the IMPROVE/workflows/parsl_csa or bruteforce_csa dirs. We already have some form of this download in scripts/get-benchmarks, so the workflows dirs can just refer to the relevant download scripts.

adpartin commented 2 weeks ago

Solved in #153