Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0
15 stars 2 forks source link

Save to excel #262

Closed aulemahal closed 1 year ago

aulemahal commented 1 year ago

Pull Request Checklist:

What kind of change does this PR introduce?

New io.to_table and io.save_to_table for saving datasets to dataframes / csv / excel / etc.

This adds support for multi-column and multi-sheet to ds.to_dataframe().

It also supports adding auxiliary coordinates as columns in the output table, beside the data variables. This is actually the most complex part of the code and it might not cover all cases :roll_eyes:...

I also sneaked in a little fix for save_to_netcdf, to allow compute=False. And I took coerce_attrs out of the save function to reduce code duplication.

Does this PR introduce a breaking change?

No.

Other information:

More testing and doc to come.

aulemahal commented 1 year ago

The doc issue comes from xclim and has been fixed in https://github.com/Ouranosinc/xclim/pull/1482/commits/c846ca6c02c286f95fe4d57ce796880d8c649d85.

aulemahal commented 1 year ago

Does @sarahclaude or anyone else already have code to add some kind of table of contents to the excel ?

I can write something simple if not.

sarahclaude commented 1 year ago

I had done this in previous code;

def explication_sheet(path): #creates excel file with path and adds readme sheet
    df1 = pd.DataFrame(data={'column_title': ['experiment_indicator_percentile'],
                             'experiment': ['ssp245 or ssp585'],
                             'percentile': ['10 or 50 or 90'],})

    with pd.ExcelWriter(path, engine='openpyxl') as writer:
        df1.to_excel(writer, sheet_name='readme', index=False)
    return None

df1 could be provided by user in the argument.

And I had this section in the loop that runs through the datasets;

df = pd.DataFrame(data={'indicator_abbreviation': list(ds.data_vars),
                                                'indicator_long_name': [ds[v].attrs['long_name'] for v in
                                                                        ds.data_vars]})
df = df.drop_duplicates(subset=['indicator_long_name'])
df['indicator_abbreviation'] = df.apply(lambda row: '_'.join(row[0].split('_')[1:-1]), axis=1)
with pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name='readme', index=False, startrow=4)
aulemahal commented 1 year ago

Last commits added a simple toc. It can be localized (column names and long_names when available) with xclim's metadata_locales option.

If the make_toc function is not enough for the use, one can still pass a DataFrame through the add_toc argument. However, I kept it simple by only allowing a single DataFrame, whereas Sarah's example puts two tables in the first page. I believe the same idea for a more complex TOC could still be implemented like this (in your own script):

toc_tables = ... #  Complex TOC generation...
# Write toc tables to the Content sheet
with pd.ExcelWriter(path, mode='a', if_sheet_exists='overlay') as writer:
    startrow = 0
    for toc_table in toc_tables:
        toc_table.to_excel(writer, sheet_name='Content', startrow=startrow)
        startrow += toc_table.length + 2

xs.io.save_to_table(ds, path, mode='a')  #  mode 'a' to append to file passed to pandas.