tsemmler05 opened 1 year ago
Perhaps we can make use of the fact that we are on multi-processor machines. In shell I used to find loops with roughly as many elements as there are threads, e.g. a loop over the output variables, and put a & at the end of the mv command.
The equivalent in Python might be to encapsulate the os.system("mv ${file}") command inside a dask delayed section. You can find an example here: https://github.com/JanStreffing/2020_AWICM3_GMD_PAPER/blob/main/python/hovm_difference-cdo.ipynb block 6
As this would be something in the backend, I'd not do the integration myself though. Any takers? :) I guess we could save maybe a factor of 10 here.
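The idea above could be sketched roughly as follows; this is a minimal example using the standard library's `concurrent.futures` (the same pattern works with `dask.delayed` and `dask.compute`), with `shutil.move` instead of shelling out to `mv`. The function name and arguments are hypothetical, not part of ESM-Tools:

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def move_files_parallel(files, target_dir, max_workers=8):
    """Move a list of files into target_dir concurrently.

    Moving files is I/O-bound, so threads are sufficient; this mirrors
    the shell trick of backgrounding each ``mv`` with ``&``.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(shutil.move, str(f), str(target / Path(f).name))
            for f in files
        ]
        # Propagate any exception raised inside a worker.
        for fut in futures:
            fut.result()
```

With dask the same loop would wrap each `shutil.move` call in `dask.delayed` and trigger everything at once with `dask.compute(*tasks)`.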
The parallel feature will be added during the refactoring of the file dictionaries; I have added it to the project so that we don't forget about it.
Just to comment: We had this problem with FOCI-OpenIFS using a 1/12° ocean grid. Our "solution" was to directly write output to the outdata dir rather than the work dir. In an xml file for XIOS this means doing
<file_definition type="one_file" name="../../outdata/nemo/@expname@_@freq@_@startdate@_@enddate@" sync_freq="1d" min_digits="4">
Thanks @joakimkjellsson. This goes a little against the safety of the work directory, but I agree, sometimes we need workarounds. The good news is that many of these problems will be solved once we release the new filedicts syntax/module and offer the user two file structures: the old ultra-safe structure (run_DATE/[work, outdata, restarts, ...]) or the less safe but faster one where run_DATE itself is the work folder (no file duplication inside run_DATE).
@tsemmler05, was this specific problem already solved by telling ESM-Tools to move the files instead of copying them?
Even if that's the case, I'll keep the issue open until we incorporate the parallelization suggested by @JanStreffing into ESM-Tools.
Though it was not needed here, I think it is still nice to have. There will always be some files to copy around, because they come from the pool and are altered at runtime.
On aleph I ran one year of simulation with large amounts of output:
/scratch/awiiccp5/1950e/
To compare, I also ran one year of simulation with limited output:
/scratch/awiiccp5/1950c_limitedoutput/
In the case of the limited output the result of
stat /scratch/awiiccp5/1950c_limitedoutput/outdata/fesom/vice.fesom.1900.nc
is:
In the case of the large output I get:
or
stat /scratch/awiiccp5/1950e/outdata/fesom/salt.fesom.1900.nc
  File: '/scratch/awiiccp5/1950e/outdata/fesom/salt.fesom.1900.nc'
  Size: 193795468497   Blocks: 378506920   IO Block: 4194304   regular file
Device: cdb43cdah/3451141338d   Inode: 720577194308533609   Links: 1
Access: (0644/-rw-r--r--)   Uid: (20907/awiiccp5)   Gid: (14907/iccp2)
Access: 2022-08-30 10:09:21.000000000 +0900
Modify: 2022-08-30 08:27:57.000000000 +0900
Change: 2022-08-30 12:04:30.000000000 +0900
 Birth: -
more /scratch/awiiccp5/1950e/log/1950e_awicm3.log gives:
Tue Aug 30 08:28:34 2022 : # Beginning of Experiment 1950e
Tue Aug 30 08:28:34 2022 : tidy 1 1900-01-01T00:00:00 652375.sdb - start
Tue Aug 30 08:28:34 2022 : tidy 1 1900-01-01T00:00:00 652375.sdb - start
Tue Aug 30 12:04:50 2022 : prepcompute 2 1901-01-01T00:00:00 652375.sdb - start
Tue Aug 30 12:07:31 2022 : prepcompute 2 1901-01-01T00:00:00 652375.sdb - done
Tue Aug 30 12:07:31 2022 : tidy 2 1901-01-01T00:00:00 652375.sdb - done
Tue Aug 30 12:07:31 2022 : observe_compute 2 1900-01-01T00:00:00 652375.sdb - done
Tue Aug 30 12:07:37 2022 : compute 1 1900-01-01T00:00:00 82548 - done
Tue Aug 30 12:45:14 2022 : compute 2 1901-01-01T00:00:00 652375.sdb - start
Tue Aug 30 12:45:57 2022 : observe_compute 2 1901-01-01T00:00:00 652838.sdb - start
Between 08:27:57 and 12:04:29 the esm_runscripts are only tidying up and accessing some FESOM output data. 350 nodes are blocked for that whole time; for comparison, the computation of one year takes 05:20 hours while the tidying up takes 03:36 hours. In the case of the limited output the situation is not as bad (13 minutes for tidying up) but could still be improved.

The question is for what purpose the FESOM output data are accessed. It seems that the FESOM output data are not only moved from one directory to the other, but that something is also done to the FESOM data. Is there a possibility to optimize this? It would also help if ESM-Tools output timestamps, so one could see in which ESM-Tools process the time is lost.