SEE-GEO / ccic

Chalmers Cloud Ice Climatology
MIT License

Process data #8

Closed simonpf closed 10 months ago

simonpf commented 2 years ago

Produce IWP retrievals. Time ranges where validation data is available should have priority.

Variables: tiwp, cloud_prob_2d, inpainted

adriaat commented 1 year ago

It's not a big issue, but it is worth documenting: if the Zarr output format is chosen and a file is re-processed and written to the same path, the following is observed:

  1. Process one file:
    $ ccic process /mnt/data_copper2/ccic/models/ccic_v2.pckl cpcir /home/amell/tmp 2014-01-16T02:30:00 --roi 128.68 -12.45 132.46 -9.83 --targets tiwp tiwp_fpavg tiwc cloud_type cloud_prob_2d cloud_prob_3d --n_processes 1 --output_format zarr

    Standard output:

    /home/amell/pansat_ccic/pansat/pansat/time.py:19: UserWarning: Discarding nonzero nanoseconds in conversion. 
      return pd.Timestamp(time).to_pydatetime()
  2. Re-execute the same command:
    $ ccic process /mnt/data_copper2/ccic/models/ccic_v2.pckl cpcir /home/amell/tmp 2014-01-16T02:30:00 --roi 128.68 -12.45 132.46 -9.83 --targets tiwp tiwp_fpavg tiwc cloud_type cloud_prob_2d cloud_prob_3d --n_processes 1 --output_format zarr

    Standard output:

    /home/amell/pansat_ccic/pansat/pansat/time.py:19: UserWarning: Discarding nonzero nanoseconds in conversion.
      return pd.Timestamp(time).to_pydatetime()
    /home/amell/ccic/ccic/bin/process.py (ERROR     ) :: path '' contains a group
    Traceback (most recent call last):
      File "/home/amell/ccic/ccic/bin/process.py", line 269, in write_output
        data.to_zarr(output_path / output_file, encoding=encodings)
      File "/home/amell/miniconda3/envs/ccic/lib/python3.8/site-packages/xarray/core/dataset.py", line 2068, in to_zarr
        return to_zarr(  # type: ignore
      File "/home/amell/miniconda3/envs/ccic/lib/python3.8/site-packages/xarray/backends/api.py", line 1613, in to_zarr
        zstore = backends.ZarrStore.open_group(
      File "/home/amell/miniconda3/envs/ccic/lib/python3.8/site-packages/xarray/backends/zarr.py", line 409, in open_group
        zarr_group = zarr.open_group(store, **open_kwargs)
      File "/home/amell/miniconda3/envs/ccic/lib/python3.8/site-packages/zarr/hierarchy.py", line 1458, in open_group
        raise ContainsGroupError(path)
    zarr.errors.ContainsGroupError: path '' contains a group

    This does not happen when choosing --output_format netcdf.
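
The traceback suggests that the second write fails because a Zarr group already exists at the target path. A minimal sketch of one possible workaround follows, assuming the output is a Zarr directory store on the local filesystem. `prepare_zarr_target` is a hypothetical helper and not part of ccic. It removes an existing store before the write, so the path is empty again.

```python
import shutil
from pathlib import Path


def prepare_zarr_target(store_path, overwrite=False):
    """Ensure `store_path` is free to receive a fresh Zarr directory store.

    Hypothetical helper for illustration only; it is not part of ccic.
    Assumes the store lives on the local filesystem.
    """
    store_path = Path(store_path)
    if store_path.exists():
        if not overwrite:
            raise FileExistsError(
                f"{store_path} already exists; pass overwrite=True to replace it"
            )
        if store_path.is_dir():
            # A Zarr directory store is a plain directory tree, so deleting
            # it leaves an empty path for the next write.
            shutil.rmtree(store_path)
        else:
            store_path.unlink()
    # Create the parent directory so the writer can create the store itself.
    store_path.parent.mkdir(parents=True, exist_ok=True)
    return store_path
```

In `write_output` this could be used as `data.to_zarr(prepare_zarr_target(output_path / output_file, overwrite=True), encoding=encodings)`. Alternatively, xarray's `to_zarr` accepts `mode="w"`, which overwrites an existing store instead of failing. Either way, silently replacing earlier results may not be the desired behavior, so the overwrite is kept opt-in here.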

adriaat commented 1 year ago
This comment is outdated.

I started the processing for global retrievals using the targets `tiwp` and `cloud_prob_2d` as well as the argument `--inpainted_mask`. Using the latter only increases the file size by about 1% for one full year. 2013, 2014, ~~2010~~, 2015 will be processed in this order.