ImperialCollegeLondon / SWMManywhere

SWMManywhere is used to derive and simulate a sewer network anywhere in the world
https://imperialcollegelondon.github.io/SWMManywhere/
BSD 3-Clause "New" or "Revised" License
2 stars 0 forks source link

Update to new version of pywbt #281

Closed barneydobson closed 1 week ago

barneydobson commented 3 weeks ago

Description

Update to new usage. @cheginit I know this probably isn't quite your intended use but at the moment I just want to always download a fresh WBT in a temp directory on the HPC because of HPC mystery errors.

That also raised an important point that pywbt seems to require the WBT directory to be different from the src_dir (or maybe save_dir not sure) - maybe worth validating in the main call.

cheginit commented 2 weeks ago

Have you tried using zip_path option to see if you have this issue on the cluster?

cheginit commented 2 weeks ago

That also raised an important point that pywbt seems to require the WBT directory to be different from the src_dir (or maybe save_dir not sure) - maybe worth validating in the main call.

src_dir is the directory that contains all the input files, e.g., dem.tif, not the WBT binaries (that one is done with wbt_root). save_dir is the directory that the outputs from WBT will be stored in, e.g., fdir.tif. So, what you want is probably something like this:

with tempfile.TemporaryDirectory(dir=str(fid.parent)) as temp_dir:
    pywbt.whitebox_tools(fid.parent, wbt_args, ("fdir.tif",), save_dir=fid.parent, wbt_root=temp_dir, zip_path="/path/to/scratch/wbt_v240_linux.zip", max_procs=1)

You can download the zip file from here and save it to your scratch folder on the cluster as wbt_v240_linux.zip.

EDIT: On the cluster, run:

wget -O /path/to/your/scratch/folder/wbt_v240_linux.zip https://raw.githubusercontent.com/cheginit/pywbt/main/tests/wbt_zip/WhiteboxTools_linux_amd64.zip 

You can also check pywbt's test.

barneydobson commented 1 week ago

@cheginit - RE save_dir - well it could be temp_path since it is only used to return the flow_dir array for subcatchment delineation - but it's a good point that someone might want to inspect it so I've added an option for it in verbose.

RE the zip_path - I don't want to add a thing for the user to do, which I guess this would do? - so I don't quite see what's wrong with relying on the download?

RE WBT - not sure I understand what's up but certainly if I change wbt_root to not be in its own temporary directory it causes the test to fail

cheginit commented 1 week ago

My suggestion was for your use case of running on the cluster, for the public version, I offered a suggestion at #282 that works without any issue and adds no user-specified option. It saves the WBT binary in the input directory and use that for subsequent runs.

The main issue with letting the code to download the file everytime is abusing the WBT servers. Assume that a user decide to run this for some analysis, 10000 times, then WBT servers get hit 10000 times over a short period! Although, they provide the download link free of charge, let's not bankrupt them 😄

barneydobson commented 1 week ago

@cheginit OK find revised according to suggestion in #282 - still seems to fail in CI when I put wbt_root = temp_path / 'WBT' (but not on my machine... weird) with the same error mentioned in the initial post of this PR. This is passing instead though - ready to go when you can approve

cheginit commented 1 week ago

I think @dalonsoa has already implemented this in that PR and tests passed, is this still needed?

barneydobson commented 1 week ago

@cheginit oh yes - my bad - sorry that PR is complicated ;)

not sure why it works there and not here but whichever 👍