geoschem / gcpy

Python toolkit for GEOS-Chem. Contains basic plotting scripts, plus the suite of GEOS-Chem benchmarking utilities.
https://gcpy.readthedocs.io

Plot parallelization off failing in GCPy 1.4.1 #285

Closed lizziel closed 10 months ago

lizziel commented 10 months ago

Name and Institution (Required)

Name: Lizzie Lundgren
Institution: Harvard University

Description of your issue or question

I get the following error when running the 1-year transport tracer benchmark with GCPy 1.4.1 and plot parallelization turned off in the benchmark configuration file. I am using Python 3.9.18. The full package list is in https://github.com/geoschem/gcpy/issues/284.

Traceback (most recent call last):
  File "/gpfsm/dnb34/ewlundgr/python/mambaforge/envs/gcpy_v1_4_1/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/gpfsm/dnb34/ewlundgr/python/mambaforge/envs/gcpy_v1_4_1/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ewlundgr/nb/python/gcpy/gcpy/benchmark/run_benchmark.py", line 1606, in <module>
    main(sys.argv)
  File "/home/ewlundgr/nb/python/gcpy/gcpy/benchmark/run_benchmark.py", line 1602, in main
    choose_benchmark_type(config)
  File "/home/ewlundgr/nb/python/gcpy/gcpy/benchmark/run_benchmark.py", line 100, in choose_benchmark_type
    run_1yr_tt_benchmark(
  File "/home/ewlundgr/nb/python/gcpy/gcpy/benchmark/modules/run_1yr_tt_benchmark.py", line 662, in run_benchmark
    bmk.make_benchmark_conc_plots(
  File "/home/ewlundgr/nb/python/gcpy/gcpy/benchmark_funcs.py", line 1510, in make_benchmark_conc_plots
    dict_sfc = {list(result.keys())[0]: result[list(
  File "/home/ewlundgr/nb/python/gcpy/gcpy/benchmark_funcs.py", line 1510, in <dictcomp>
    dict_sfc = {list(result.keys())[0]: result[list(
AttributeError: 'str' object has no attribute 'keys'

yantosca commented 10 months ago

Thanks @lizziel. I think I see what the problem is. When you turn off parallelization you might be getting a string back instead of a dict. Let me see if I can reproduce this locally.

yantosca commented 10 months ago

Hi @lizziel! I think I've figured this out. This is happening in the various places where plots are parallelized. There are code blocks such as:

    # --------------------------------------------
    # Create the plots in parallel
    # Turn off parallelization if n_job=1
    if n_job != 1:
        results = Parallel(n_jobs=n_job)(
            delayed(createplots)(filecat)
            for _, filecat in enumerate(catdict)
        )
    else:
        for _, filecat in enumerate(catdict):
            results = createplots(filecat)
    # --------------------------------------------

This pattern appears in, for example, gcpy/benchmark_funcs.py.

When parallelization is on (n_cores: -1), the results variable comes back as a list with one dict per category:

[{'Aerosols': {'sfc': [], '500': [], 'zm': []}}, {'Bromine': {'sfc': [], '500': [], 'zm': []}}, {'Chlorine': {'sfc': [], '500': [], 'zm': []}}, {'Iodine': {'sfc': [], '500': [], 'zm': []}}, {'Nitrogen': {'sfc': [], '500': [], 'zm': []}}, {'Oxidants': {'sfc': [], '500': [], 'zm': []}}, {'Primary_Organics': {'sfc': [], '500': [], 'zm': []}}, {'ROy': {'sfc': [], '500': [], 'zm': []}}, {'Secondary_Organic_Aerosols': {'sfc': [], '500': [], 'zm': []}}, {'Secondary_Organics': {'sfc': [], '500': [], 'zm': []}}, {'Sulfur': {'sfc': [], '500': [], 'zm': []}}]

but when parallelization is off (n_cores: 1), the results variable comes back as a single dict, holding only the last category:

{'Sulfur': {'sfc': [], '500': [], 'zm': []}}
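
To connect this to the traceback: iterating over a dict yields its keys, which here are strings. The following is a minimal sketch of that failure mode, not the GCPy code itself. The helper `first_keys` is hypothetical, and it assumes the comprehension in the traceback loops over `results` with something like `for result in results`, which the truncated traceback does not show in full.

```python
# Hypothetical stand-in for what the serial branch leaves in `results`:
# a single dict rather than a list of dicts.
serial_results = {"Sulfur": {"sfc": [], "500": [], "zm": []}}

# What the parallel branch returns: a list with one dict per category.
parallel_results = [
    {"Aerosols": {"sfc": [], "500": [], "zm": []}},
    {"Sulfur": {"sfc": [], "500": [], "zm": []}},
]


def first_keys(results):
    # Assumed shape of the downstream loop: each `result` is expected
    # to be a dict whose first key is the category name.
    return [list(result.keys())[0] for result in results]


# With a list of dicts, each `result` is a dict, so this works.
parallel_keys = first_keys(parallel_results)

# With a single dict, iteration yields the key "Sulfur" (a str),
# and calling .keys() on it raises AttributeError.
try:
    first_keys(serial_results)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(parallel_keys)
print(error_message)
```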

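In the serial branch, results is overwritten on every pass through the loop, so only the output for the last category survives. I think the solution is to make results a list and append the output of the createplots function to it when parallelization is off. I'll implement a fix.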
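
A rough sketch of what that could look like is below. It is an illustration under assumptions, not necessarily the change merged into GCPy: `run_createplots` and `fake_createplots` are hypothetical names, and the joblib import is deferred so the serial path runs without joblib installed.

```python
def run_createplots(createplots, catdict, n_job=1):
    """Call createplots for every category and always return a list.

    Hypothetical wrapper for illustration; in the code quoted above,
    this logic sits inline rather than in a separate function.
    """
    if n_job != 1:
        # Parallel branch, unchanged in behavior: joblib returns a list
        # with one entry per call.
        from joblib import Parallel, delayed

        return Parallel(n_jobs=n_job)(
            delayed(createplots)(filecat) for filecat in catdict
        )

    # Serial branch: accumulate each result instead of overwriting it,
    # so the return value has the same shape as the parallel branch.
    results = []
    for filecat in catdict:
        results.append(createplots(filecat))
    return results


def fake_createplots(filecat):
    # Stand-in for the real plotting function; returns the same
    # {category: {"sfc": [], "500": [], "zm": []}} shape shown above.
    return {filecat: {"sfc": [], "500": [], "zm": []}}


# Placeholder category dict; only the keys matter for this sketch.
catdict = {"Aerosols": None, "Bromine": None, "Sulfur": None}
serial_results = run_createplots(fake_createplots, catdict, n_job=1)
print(serial_results)
```

With this shape, the downstream code can iterate over the results the same way regardless of the n_cores setting.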

yantosca commented 10 months ago

Closed by #287

yantosca commented 10 months ago

We can close this issue now because #287 has been merged. This problem is now fixed.