Imageomics / distributed-downloader

MPI-based distributed downloading tool for retrieving data from diverse domains.
MIT License

Generalize submitter, .env, and bash and slurm scripts #3

Closed: egrace479 closed this 4 months ago

egrace479 commented 4 months ago

The sample .env file now defines the paths that were previously defined in the submitted script. It also holds the project name, so the project name and all of the work-distribution settings can be passed to the slurm scripts directly through the bash coordination script (example_scripts/submit_mpi_download.sh) instead of being redefined at the top of each file. This does slightly increase the resource request for the verifier, but (following our group discussion) it shouldn't add much to the overall resource allocation, since the verifier only runs for a couple of seconds on each batch.
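As a rough illustration of the idea (the repository's coordination script is bash, and this is not a copy of it), the Python sketch below shows how settings that were loaded once could be turned into an `sbatch` command line, so the slurm scripts don't have to redefine them. The key names `PROJECT_NAME`, `MAX_NODES`, and `WORKERS_PER_NODE`, the script path, and the mapping of project name to slurm account are all assumptions made for the example; `--account`, `--nodes`, `--ntasks-per-node`, and `--export` are standard `sbatch` options.

```python
from typing import Dict, List


def build_sbatch_command(settings: Dict[str, str], script: str) -> List[str]:
    """Build an sbatch argument list from already-loaded settings.

    Every key used here is a hypothetical placeholder, not a name taken
    from the repository's sample .env file.
    """
    return [
        "sbatch",
        # Assumption: the project name doubles as the slurm account.
        f"--account={settings['PROJECT_NAME']}",
        f"--nodes={settings['MAX_NODES']}",
        f"--ntasks-per-node={settings['WORKERS_PER_NODE']}",
        # Forward the caller's environment so the job sees the same paths.
        "--export=ALL",
        script,
    ]


example_settings = {
    "PROJECT_NAME": "my_project",
    "MAX_NODES": "4",
    "WORKERS_PER_NODE": "6",
}
command = build_sbatch_command(example_settings, "scripts/download.slurm")
print(" ".join(command))
```

The command is built as a list rather than a single string so it can be handed to `subprocess.run` without shell quoting problems.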

src/submitter.py is now the primary user interface to the package (following the pre-processing steps). Its only input is the path to the .env file, from which it fetches the required paths. There was also some restructuring to avoid redefining variables.
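For readers unfamiliar with the pattern, here is a minimal sketch of an entry point whose only argument is the path to a .env file. It is not the code in src/submitter.py; the function names and the example keys (`PATH_TO_INPUT`, `PATH_TO_OUTPUT`) are hypothetical, and the parser only handles simple `KEY=VALUE` lines.

```python
import os
import sys
from typing import Dict, List


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines into a dict (hypothetical helper)."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate a leading "export " as used in shell-sourced files.
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        # Drop surrounding whitespace and simple surrounding quotes.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def load_env_file(env_path: str) -> Dict[str, str]:
    """Read a .env file from disk and parse it."""
    with open(env_path, encoding="utf-8") as handle:
        return parse_env_text(handle.read())


def main(argv: List[str]) -> int:
    """Entry point: expects exactly one argument, the .env path."""
    if len(argv) != 2:
        print("usage: submitter.py <path-to-.env>", file=sys.stderr)
        return 1
    settings = load_env_file(argv[1])
    # Make the values visible to any subprocess launched afterwards.
    os.environ.update(settings)
    return 0


# Example with made-up keys; a real file would use the sample .env's names.
sample = """
# paths
PATH_TO_INPUT=/data/input.csv
export PATH_TO_OUTPUT="/data/output"
"""
parsed = parse_env_text(sample)
print(parsed)
```

Keeping the parsing separate from the file read makes the helper easy to test without touching the filesystem.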

egrace479 commented 4 months ago

There are still some questions to answer about the overall package, and things to do once this update is settled:

  1. Presumably we will want to rename mpi_downloader to distributed_downloader in src. We then have to decide how to handle the other scripts that sit outside of it but are still in src. The idea is that src/submitter.py will be the interface that runs the whole download, while the scripts/ and config/ files will be modified by users and passed into the process.
  2. The resize functions should be moved out to a tools folder or similar at the root level of the repository. I'm fine with leaving them ungeneralized for now, but it would be good to include the tools.
  3. The functioning of the preprocessing steps should be clarified (i.e., how to run server_prep and MPI_download_prep: what goes in and what actually comes out).
  4. As with 3, we need to clarify the requirements for the actual download process (if nothing else, this could be the structure established by the scripts noted in 3).
  5. We need tests. In particular, this restructuring should be tested to ensure it still functions as expected.