COSIMA / make_diag_table

Python script to generate MOM diag_table
1 stars 2 forks source link

Support multiple separators #7

Closed aekiss closed 3 months ago

aekiss commented 3 months ago

Sometimes we want an alternative separator (e.g. no separator) between particular components - e.g. see https://github.com/COSIMA/access-om3/issues/190#issuecomment-2252030909 - how to implement this?

aekiss commented 3 months ago

It would be nice to modify make_diag_table.py so that this behaviour can be set as an option via diag_table_source.yaml.

Also, perhaps we can generalise by allowing file_name_separator to be an array of alternative separators (in this case ['.', '']) and specify when to apply which separator?

Even neater, how about supporting this yaml format, where file_name can contain arrays whose names specify the separator to be applied before each element?

    file_name:  # file name (without trailing ".nc") is constructed from these components,
    # separated by file_name_separator, as per https://github.com/COSIMA/access-om2/issues/185
    # All these components (other than field_name) must be names that exist in global_defaults.
    # If file_name_date or file_name_date_section are used, it must be the last item.
        - "_":
            - file_name_prefix
            - file_name_dimension
            - field_name  # substituted by field name in diag_table section below
            - output_freq
        - "":
            - output_freq_units
        - "_":
            - reduction_method
            - file_name_date
aekiss commented 3 months ago

This actually makes file_name_separator redundant, but we should retain it for backward compatability. Perhaps the rule should be: string items in file_name are separated by file_name_separator, but keys whose values are arrays have the array elements separated by the array key as above. With this approach we could write

     file_name_separator: "-"  # used to separate filename components; best not to use "_" to avoid confusion with fields and dates
     file_name:  # file name (without trailing ".nc") is constructed from these components,
    # separated by file_name_separator, as per https://github.com/COSIMA/access-om2/issues/185
    # All these components (other than field_name) must be names that exist in global_defaults.
    # If file_name_date or file_name_date_section are used, it must be the last item.
        - file_name_prefix
        - file_name_dimension
        - field_name  # substituted by field name in diag_table section below
        - output_freq
        - "":
           - output_freq_units
        - reduction_method
        - file_name_date

meaning all elements except output_freq_units are separated by file_name_separator.

aekiss commented 3 months ago

Thanks for your PR @minghang-li, sorry I didn't explain but I'm working on an implementation of the proposal here, which I think is easier for a user to understand.

minghangli-uni commented 3 months ago

Hi @aekiss, I think this PR also supports a single separator to maintain the backward compatability and a list of separators to ensure flexibility. It applies not just to output_freq_units and output_freq, but also addresses the aesthetic issue proposed in https://github.com/COSIMA/access-om2/issues/185#issuecomment-632428116

minghangli-uni commented 3 months ago

To clarify, what I meant to address the aesthetic issue, it is something like this, since we intend to remove ym before the date.

ocean-3d-temp-1-monthly-mean-ymd_1958_10_16.nc --> ocean-3d-temp-1monthly-mean_1958_10_16.nc

aekiss commented 3 months ago

Thanks, but I think it's better interface design to specify optional custom separators via dict elements within the file_name list as shown above, rather than as a separate file_name_separator array of separators whose length needs to agree with the length of file_name (which can be tricky if file_name_omit_empty is true). I'll put together a PR to show you what I mean.

aekiss commented 3 months ago

have a look at this PR https://github.com/COSIMA/make_diag_table/issues/7