Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0
attrs_prefix not working in jupyter notebook with dask #316

Closed juliettelavoie closed 8 months ago

juliettelavoie commented 8 months ago

Description

The attrs_prefix option (cat:) is not applied when a dataset is opened in a Jupyter notebook under a Dask client. The attributes carry the intake_esm_attrs: prefix instead.

Steps To Reproduce

In a Jupyter notebook:

from dask.distributed import Client
import xscen as xs

with Client(n_workers=2, threads_per_worker=5, memory_limit="30GB",
            local_directory="/exec/jlavoie/tmp_eg6/",
            dashboard_address=6786, silence_logs=True):
    # Open the dataset while a distributed client is active
    cat_sim = xs.DataCatalog("simulation.json")
    ds = cat_sim.search(id="CanDCS-U6_CMIP6_ScenarioMIP_MIROC_MIROC6_ssp585_r1i1p1f1_CAN").to_dask()
ds.attrs

gives

...
'intake_esm_attrs:institution': 'MIROC',
...

but the same call without the Dask client

ds = cat_sim.search(id="CanDCS-U6_CMIP6_ScenarioMIP_MIROC_MIROC6_ssp585_r1i1p1f1_CAN").to_dask()
ds.attrs

gives

...
'cat:institution': 'MIROC',
...

juliettelavoie commented 8 months ago

Oops, just saw #176.

aulemahal commented 8 months ago

In a Python file, Dask's multiprocessing initializes the workers with the state of the Python process as it is before the if __name__ == '__main__' guard. In a notebook, all the code is executed after the equivalent of that guard. Thus, imports made in any cell are not included in the initial state of the workers.

When to_dask is executed, the code sent to the workers includes references to intake-esm, which is therefore imported automatically. However, it does not include any reference to xscen, so xscen is never imported on the workers and the attrs_prefix option is never updated there.
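
Following that reasoning, one possible workaround is to import xscen explicitly on every worker before calling to_dask. The sketch below is only an illustration, not a fix confirmed in this thread: import_on_worker and preload_on_workers are hypothetical helper names, and it assumes that importing xscen inside a worker is enough to set the attrs_prefix option there. It relies on Client.run from dask.distributed, which executes a function on all currently connected workers.

```python
import importlib


def import_on_worker(module_name: str = "xscen") -> str:
    """Import a module in the current process and return its name.

    Hypothetical helper: when run on a Dask worker, the import happens in
    that worker's process, so any import-time side effects apply there.
    """
    module = importlib.import_module(module_name)
    return module.__name__


def preload_on_workers(client, module_name: str = "xscen") -> dict:
    """Run the import on every worker currently connected to the client.

    Client.run returns a dict mapping each worker address to the function's
    return value. Workers that join or restart later are not covered.
    """
    return client.run(import_on_worker, module_name)


# Usage in a notebook cell (assumes dask.distributed and xscen are installed):
#
#     with Client(n_workers=2) as client:
#         preload_on_workers(client)  # import xscen on each worker first
#         ds = cat_sim.search(id=...).to_dask()
```

Because Client.run only reaches workers that exist at call time, a cluster that scales or restarts workers would need the import repeated, or a mechanism that runs at worker startup.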