chengzhuzhang closed this pull request 1 month ago.
Updates: Code refactoring for all sub-sets (annual_cycle, diurnal_cycle(_zt), convection_onset, aerosol_activation) is complete.
Todo:
- Fix mypy issues
- Potential performance bottleneck from deriving variables using an xarray dataset as input
Thank you for identifying the two formula computations as potential bottlenecks. I confirmed that these computations with Xarray are indeed much slower than with CDAT.
I ran a performance benchmark and found a solution to the slowness: calling .load(scheduler="sync") in _get_dataset_with_source_vars() speeds up the computation. I pushed commit ef44fc6 (#842) with this fix.
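For reference, a minimal sketch of the idea behind the fix. The function name _get_dataset_with_source_vars() comes from this PR, but the surrounding logic here is simplified and only illustrative:

```python
import xarray as xr


def _get_dataset_with_source_vars(filepaths, source_vars):
    # Simplified stand-in: open and merge the files that contain the source
    # variables needed to derive the target variable.
    ds = xr.merge([xr.open_dataset(path) for path in filepaths])
    ds = ds[source_vars]

    # Eagerly load the subset into memory using the synchronous scheduler so
    # downstream formula computations operate on in-memory arrays instead of
    # repeatedly triggering lazy reads from disk.
    ds.load(scheduler="sync")

    return ds
```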
In the results below, the first runtime is the current code and the second runtime is with .load(). I also ran e3sm_diags with commit ef44fc6 (#842) and confirmed a significant runtime improvement, similar to the benchmarks below. Performance is now on par with CDAT.
"""
Results
----------
1. Elapsed time (Xarray non-chunked): 6.540755605790764 seconds
2. Elapsed time (Xarray non-chunked with .load()): 0.17097265785560012 seconds
3. Elapsed time (Xarray chunked): 0.1452920027077198 seconds
4. Elapsed time (numpy .values): 6.418793010059744 seconds
5. Elapsed time (numpy .data): 7.334999438840896 seconds
"""
Elapsed time (CDAT main branch, single runtime): 0.12261438369750977 seconds
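The numbers above are elapsed wall-clock times for each variant. A minimal sketch of how such a comparison could be timed (the input file, variable names, and formula are placeholders, not the actual benchmark script):

```python
import time

import xarray as xr


def time_formula(ds, label):
    # Time a representative derived-variable formula (illustrative only).
    start = time.perf_counter()
    result = (ds["PRECC"] + ds["PRECL"]) * 86400.0
    result.load()  # force computation for lazy or Dask-backed inputs
    print(f"Elapsed time ({label}): {time.perf_counter() - start} seconds")


time_formula(xr.open_dataset("input.nc"), "Xarray non-chunked")
time_formula(xr.open_dataset("input.nc").load(), "Xarray non-chunked with .load()")
time_formula(xr.open_dataset("input.nc", chunks={"time": 12}), "Xarray chunked")
```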
The first runtime is the current code and the second runtime is with .load(). I also ran e3sm_diags with commit ef44fc6 (#842).
@tomvothecoder thank you for performing the timing test. Interestingly, when I tested your commit by running a configuration with each sub-set, it took 24 minutes. Without .load(), the total time is about 3 minutes, which I consider at least on par with the CDAT code. In this case, maybe we should drop the .load() change?
I would think that loading the derived variables dataset into memory shouldn't slow down performance unless the datasets are extremely large (in which case we should use Dask chunking).
I only benchmarked performance for the formula computations. I will benchmark a complete run to verify your findings and determine whether we should revert the commit.
Side-note:
_It could be that the logic I implemented already stores the dataset in memory, since it merges multiple xr.Dataset objects opened via open_dataset() (uses numpy arrays) instead of using open_mfdataset() (uses Dask arrays). If this is the case, I don't see how .load() would improve the speed of the formula computations though._
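One way to check that assumption is to inspect the backing array type of a variable after opening; a small sketch (the file paths and variable name are placeholders):

```python
import xarray as xr

# open_dataset() yields numpy-backed (lazily indexed) variables unless
# `chunks` is passed, while open_mfdataset() yields Dask-backed variables.
ds_single = xr.open_dataset("file_01.nc")
ds_multi = xr.open_mfdataset(["file_01.nc", "file_02.nc"])

print(type(ds_single["PRECT"].data))  # numpy.ndarray
print(type(ds_multi["PRECT"].data))   # dask.array.core.Array
```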
RE: https://github.com/E3SM-Project/e3sm_diags/pull/842#issuecomment-2344166562
I found that adding .load() in _get_dataset_with_source_vars() adds 2-4 minutes of runtime to a complete arm_diags run. I reverted this change and will now address the pre-commit issues.
# Commit: 58361c49-4b1b-11ec-9b3b-9c5c8e2f5e4e (no .load())
# run_set function took 281.78 seconds to complete.
# run_set function took 332.79 seconds to complete.
# Commit: ef44fc6ffd538a5e257b097b99f7a1a79b79bc3b (with .load())
# run_set function took 472.81 seconds to complete.
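The run_set timings above read like output from a simple timing wrapper; a hedged sketch of one such decorator (not necessarily how e3sm_diags logs these numbers):

```python
import functools
import time


def timed(func):
    """Log how long the wrapped function took to complete."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} function took {elapsed:.2f} seconds to complete.")
        return result

    return wrapper


@timed
def run_set(parameters):
    ...  # run a single diagnostics set (placeholder body)
```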
Hey @chengzhuzhang, I fixed the pre-commit issues in 2c248fb (#842). I also performed initial code cleanup since I would probably be doing that later anyway. Refer to the commit message and my review comments for more information.
I re-ran all sets and they completed successfully with these changes, although I noticed some of the diagnostic sets weren't done yet.
Thank you @tomvothecoder. I will do another pass to see if there are other sets that need to be refactored. Thanks a lot for fixing and cleaning up this branch.
@tomvothecoder I ran the .png regression tests. All figures are produced and the results are as expected, as noted in the notebook.
Performance-wise, the refactored code is similar to the original: wall time is ~10 minutes with 4 workers for the full arm_diags set.
Hey @chengzhuzhang, I noticed you have several other PRs you're working on right now. I'm happy to help finish up refactoring the last diag function in this PR. Let me know.
@tomvothecoder it would be great if you could help me finish this last diag so that I'm not holding back the merge!
I found that no sets actually run "annual_cycle_aerosol" (arm_diags_model_vs_obs.cfg, arm_diags_model_vs_model.cfg). Unless we expect to run this diagnostic in the future, I think we can delete _run_diag_annual_cycle_aerosol instead of refactoring it.
@tomvothecoder good catch. I think for now we can just create an issue to log this problem, and we can add the code back at a later time.
The arm_diags_driver.py .png regression test shows that all plots are identical except the following, which have missing data (white spaces) compared to main (a sketch of how such a pixel-level comparison can be scripted follows the list below):
CLOUD test variable
Comparing:
* /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png
* /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png
* Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png
Comparing:
* /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-sgpc1-test.png
* /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-sgpc1-test.png
* Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-sgpc1-test.png
Comparing:
* /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc1-test.png
* /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc1-test.png
* Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-twpc1-test.png
Comparing:
* /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc2-test.png
* /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc2-test.png
* Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-twpc2-test.png
Comparing:
* /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc3-test.png
* /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc3-test.png
* Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-twpc3-test.png
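A sketch of how such a pixel-level .png comparison can be scripted, producing the "Comparing" output and difference images listed above. The directory paths are taken from the log; the Pillow-based diffing itself is illustrative, not the actual regression-test code:

```python
from pathlib import Path

import numpy as np
from PIL import Image

MAIN_DIR = Path("/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags")
DEV_DIR = Path("/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags")
DIFF_DIR = Path("/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff")

for main_png in sorted(MAIN_DIR.glob("*.png")):
    dev_png = DEV_DIR / main_png.name
    main_arr = np.asarray(Image.open(main_png).convert("RGB"), dtype=np.int16)
    dev_arr = np.asarray(Image.open(dev_png).convert("RGB"), dtype=np.int16)

    diff = np.abs(main_arr - dev_arr)
    if diff.max() > 0:
        # The images differ; save a difference image for visual inspection.
        diff_path = DIFF_DIR / main_png.name
        Image.fromarray(diff.astype(np.uint8)).save(diff_path)
        print(f"Comparing:\n  * {main_png}\n  * {dev_png}\n  * Difference path {diff_path}")
```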
This PR should be good to go. The integration test image checker fails because there is now one additional plot for the sgpc1 region that was generated with a fix in this PR (related line).
I will merge this PR even though the integration tests fail. I will update the complete test results on LCRC using the cdat-migration-fy24 branch. The complete test results should also include the QBO wavelet plot from #850.
Description
Additional Changes
* arm_diags_driver.py and arm_diags_viewer.py
* derivations.py and formulas.py
* arm_diags_plot.py: sgpc1 plot not being generated correctly (Closes #858)
* arm_diags_parameter.py: time_interval attribute to store hours for the diurnal cycle
* dataset_xr.py: "co" flag from cdms2, which excludes the last time coordinate when subsetting if that time coordinate spans the next year (Dataset._exclude_sub_monthly_coord_spanning_year()); see the sketch at the end of this description
* diurnal_cycle_xr.py
* diurnal_cycle_xr.py to convert coordinate DataArrays to lists and convert the for loop to a list comprehension for performance gains
Checklist
If applicable:
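For the dataset_xr.py item above, a minimal sketch of what excluding a final time coordinate that spills into the next year could look like. This is a simplified stand-in, not the actual Dataset._exclude_sub_monthly_coord_spanning_year() implementation:

```python
import xarray as xr


def exclude_time_coord_spanning_next_year(ds, end_year):
    # Mimic the cdms2 "co" subsetting behavior: if the last time coordinate
    # falls in the year after the requested end year (e.g. a sub-monthly
    # record whose coordinate lands on Jan 1 of the following year), drop
    # that final time step from the subset.
    last_year = int(ds["time"].dt.year.values[-1])
    if last_year > end_year:
        ds = ds.isel(time=slice(0, -1))
    return ds
```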