actris-cloudnet / cloudnetpy

Python package for Cloudnet data processing
MIT License

mira2nc for ZNC files and STSR mira #83

Closed: spirrobe closed this issue 1 year ago

spirrobe commented 1 year ago

Hi

We are running cloudnetpy for our site Eriswil separately from the cloudnethub (see here). In our last campaign's setup we ran a "new" STSR MIRA, and we generally tend to use the znc files over the mmclx files. However, mira2nc relies on the mmclx file type, which in this case does not contain a proper LDR, as the new file type from Metek is znc. Since cloudnetpy does not rely on the additional classification in the mmclx files, znc can be used instead, which at the same time solves the issue with the LDR. I adjusted the code (see below) so that it can read znc files, including both the new LDR (h2l) and the old LDR (g) variables. When available, LDRh2l takes precedence; otherwise the usual LDRg is used. Additionally, when tying in with the change to a list of files, the

Open points:

  1. One change to future-proof the code would be to invert the keymap (instead of ncvar: cloudnetvar, use cloudnetvar: ncvar) so that each Cloudnet variable simply has a list of netCDF variables to check in order. This is out of scope for the moment, so I handled it with an OrderedDict to ensure that the h2l variables relevant for the STSR take precedence.
  2. Which file type should be preferred/used in the future? I tend towards znc as the first choice with a fallback to mmclx, as implemented for the path search, but this is my preference. In our case the znc files also tend to be smaller, which may make the processing chain faster in some cases (e.g. when files are transferred; without transfers, the relevant variables are extracted early in the processing, so the Cloudnet MIRA netCDF file will be of similar size).
  3. Are there issues that I overlooked, or further improvements required?
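Open point 1 contrasts two ways of expressing the h2l-over-g precedence. The sketch below is illustrative only: it treats a file's variables as a plain dict (a stand-in for the netCDF variables), and apply_ordered and resolve_inverted are hypothetical helpers, not cloudnetpy functions. The first mirrors the OrderedDict approach, assuming entries are applied in order so that later matches overwrite earlier ones; the second uses the inverted cloudnetvar: [ncvars] layout and stops at the first candidate found.

```python
from collections import OrderedDict

# Current approach: netCDF name -> Cloudnet name, in order. The h2l entries
# come after the g entries, so they overwrite them when both are present.
ORDERED_KEYMAP = OrderedDict([
    ("Zg", "Zh"),
    ("Zh2l", "Zh"),
    ("LDRg", "ldr"),
    ("LDRh2l", "ldr"),
])

# Proposed inversion (hypothetical): Cloudnet name -> netCDF candidates,
# listed in priority order.
INVERTED_KEYMAP = {
    "Zh": ["Zh2l", "Zg"],
    "ldr": ["LDRh2l", "LDRg"],
}


def apply_ordered(file_vars: dict, keymap: OrderedDict) -> dict:
    """Copy variables in keymap order; later matches overwrite earlier ones."""
    data = {}
    for nc_name, cloudnet_name in keymap.items():
        if nc_name in file_vars:
            data[cloudnet_name] = file_vars[nc_name]
    return data


def resolve_inverted(file_vars: dict, keymap: dict) -> dict:
    """Take the first candidate present in the file for each Cloudnet variable."""
    data = {}
    for cloudnet_name, candidates in keymap.items():
        for nc_name in candidates:
            if nc_name in file_vars:
                data[cloudnet_name] = file_vars[nc_name]
                break
    return data


# Toy stand-ins for an STSR znc file and an older file without h2l variables.
stsr_file = {"Zg": [1.0], "Zh2l": [2.0], "LDRg": [0.1], "LDRh2l": [0.2]}
old_file = {"Zg": [1.0], "LDRg": [0.1]}

print(apply_ordered(stsr_file, ORDERED_KEYMAP))      # {'Zh': [2.0], 'ldr': [0.2]}
print(resolve_inverted(stsr_file, INVERTED_KEYMAP))  # {'Zh': [2.0], 'ldr': [0.2]}
print(resolve_inverted(old_file, INVERTED_KEYMAP))   # {'Zh': [1.0], 'ldr': [0.1]}
```

Both give the same result here. The inverted form states the priority explicitly in one place instead of relying on insertion order, and no value is written twice. For the ordered form, a plain dict would also work, since Python 3.7+ dicts preserve insertion order.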

The pull request is #84 and includes the changes from #80.

def _miraignorevar(filetype: str) -> list[str] | None:
    """Returns the variables to ignore when concatenating METEK MIRA-35 cloud radar files.

    This function returns the netCDF variable names that should be ignored when
    concatenating several files. This is required now that a path or list of
    files can be passed to mira2nc; at the moment (08.2023) it is only relevant
    for znc.

    Args:
        filetype: Either znc or mmclx

    Returns:
        List of variables to ignore for the file type, or None if there are none

    Raises:
        TypeError: Not a valid filetype given, must be string.
        ValueError: Not a known filetype given, must be znc or mmclx

    Examples:
        not meant to be called directly by user

    """
    known_filetypes = ['znc', 'mmclx']
    if not isinstance(filetype, str):
        raise TypeError('Filetype must be string')

    if filetype.lower() not in known_filetypes:
        raise ValueError(f'Filetype must be one of {known_filetypes}')

    ignored_variables = {'znc': ['DropSize'],
                         'mmclx': None}

    return ignored_variables[filetype.lower()]

def _mirakeymap(filetype: str) -> dict:
    """Returns the keymap (netCDF variables to cloudnetpy variables) for the METEK MIRA-35 cloud radar.

    This function returns the appropriate keymap (even for the STSR polarimetric
    configuration) so that cloudnetpy takes the appropriate variables from the
    netCDF file, whether mmclx (old format) or znc (new format).

    Args:
        filetype: Either znc or mmclx

    Returns:
        Appropriate keymap for the file type

    Raises:
        TypeError: Not a valid filetype given, must be string.
        ValueError: Not a known filetype given, must be znc or mmclx

    Examples:
        not meant to be called directly by user

    """
    known_filetypes = ['znc', 'mmclx']
    if not isinstance(filetype, str):
        raise TypeError('Filetype must be string')

    if filetype.lower() not in known_filetypes:
        raise ValueError(f'Filetype must be one of {known_filetypes}')

    from collections import OrderedDict
    # An OrderedDict keeps the insertion order, so Zh2l is applied after Zg
    # (likewise for the other h2l variables) and takes precedence for Zh,
    # which is relevant for the new znc files of an STSR radar
    keymaps = {'znc': OrderedDict([
        ("Zg", "Zh"),
        ("Zh2l", "Zh"),
        ("VELg", "v"),
        ("VELh2l", "v"),
        ("RMSg", "width"),
        ("RMSh2l", "width"),
        ("LDRg", "ldr"),
        ("LDRh2l", "ldr"),
        ("SNRg", "SNR"),
        ("SNRh2l", "SNR"),
        ("elv", "elevation"),
        ("azi", "azimuth_angle"),
        ("aziv", "azimuth_velocity"),
        ("nfft", "nfft"),
        ("nave", "nave"),
        ("prf", "prf"),
        ("rg0", "rg0"),
    ]),
        'mmclx': {
        "Zg": "Zh",
        "VELg": "v",
        "RMSg": "width",
        "LDRg": "ldr",
        "SNRg": "SNR",
        "elv": "elevation",
        "azi": "azimuth_angle",
        "aziv": "azimuth_velocity",
        "nfft": "nfft",
        "nave": "nave",
        "prf": "prf",
        "rg0": "rg0",
    }
    }

    return keymaps[filetype.lower()]

def mira2nc(
    raw_mira: str | list[str],
    output_file: str,
    site_meta: dict,
    uuid: str | None = None,
    date: str | None = None,
) -> str:
    """Converts METEK MIRA-35 cloud radar data into Cloudnet Level 1b netCDF file.

    This function converts raw MIRA file(s) into a much smaller file that
    contains only the relevant data and can be used in further processing
    steps.

    Args:
        raw_mira: Filename of a daily MIRA .mmclx or .znc file. Can also be a
            folder containing several non-concatenated .mmclx or .znc files from
            one day, or a list of such files. If a folder contains both types,
            the .znc files take precedence because they are the newer file type.
        output_file: Output filename.
        site_meta: Dictionary containing information about the site. Required key
            value pair is `name`.
        uuid: Set specific UUID for the file.
        date: Expected date as YYYY-MM-DD of all profiles in the file.

    Returns:
        UUID of the generated file.

    Raises:
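        ValidTimeStampError: No valid timestamps found.
        FileNotFoundError: No .znc or .mmclx files found in the given input.
        TypeError: The given files mix .znc and .mmclx.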

    Examples:
          >>> from cloudnetpy.instruments import mira2nc
          >>> site_meta = {'name': 'Vehmasmaki'}
          >>> mira2nc('raw_radar.mmclx', 'radar.nc', site_meta)
          >>> mira2nc('raw_radar.znc', 'radar.nc', site_meta)
          >>> mira2nc('/one/day/of/mira/mmclx/files/', 'radar.nc', site_meta)
          >>> mira2nc('/one/day/of/mira/znc/files/', 'radar.nc', site_meta)

    """

    with TemporaryDirectory() as temp_dir:
        if isinstance(raw_mira, list) or os.path.isdir(raw_mira):
            # better naming would be concat_filename but to be directly comp-
            # atible with the opening of the output we stick to input_filename
            input_filename = f"{temp_dir}/tmp.mmclx"
            # passed in is a list of files
            if isinstance(raw_mira, list):
                valid_files = sorted(raw_mira)
            else:
                # passed in is a path with potentially files
                valid_files = utils.get_sorted_filenames(raw_mira, ".znc")
                if not valid_files:
                    logging.warning(f"No znc files found in {raw_mira}, "
                                    "looking for mmclx")
                    valid_files = utils.get_sorted_filenames(raw_mira, ".mmclx")

            if not valid_files:
                logging.error("Neither znc nor mmclx files found in "
                              f"{raw_mira}. Please check your input.")
                raise FileNotFoundError("Neither znc nor mmclx files found in "
                                        f"{raw_mira}. Please check your input.")

            valid_files = utils.get_files_with_common_range(valid_files)

            # get unique filetypes
            filetypes = sorted({f.split('.')[-1].lower() for f in valid_files})

            if len(filetypes) > 1:
                raise TypeError('mira2nc only supports a single file type as '
                                'input, either mmclx or znc')

            keymap = _mirakeymap(filetypes[0])

            variables = list(keymap.keys())
            concat_lib.concatenate_files(
                valid_files,
                input_filename,
                variables=variables,
                ignore=_miraignorevar(filetypes[0]),
                allow_difference=["nave", "ovl"],
            )
        else:
            input_filename = raw_mira
            keymap = _mirakeymap(input_filename.split('.')[-1])

        with Mira(input_filename, site_meta) as mira:
            mira.init_data(keymap)
            if date is not None:
                mira.screen_by_date(date)
                mira.date = date.split("-")
            mira.sort_timestamps()
            mira.remove_duplicate_timestamps()
            mira.linear_to_db(("Zh", "ldr", "SNR"))
            mira.screen_by_snr()
            mira.mask_invalid_data()
            mira.add_time_and_range()
            mira.add_site_geolocation()
            mira.add_radar_specific_variables()
            valid_indices = mira.add_zenith_and_azimuth_angles()
            mira.screen_time_indices(valid_indices)
            mira.add_height()
        attributes = output.add_time_attribute(ATTRIBUTES, mira.date)
        output.update_attributes(mira.data, attributes)
        uuid = output.save_level1b(mira, output_file, uuid)
        return uuid

The git diff is attached as a text file: mirapy_diff.txt
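The folder handling in mira2nc above (prefer znc, fall back to mmclx, reject mixed file types) can be exercised on its own. This is a minimal sketch using only the standard library in place of cloudnetpy's utils.get_sorted_filenames; find_mira_files and single_filetype are hypothetical helpers, the suffix match is assumed to be case-insensitive, and unlike the posted code the mixed-type check raises ValueError rather than TypeError.

```python
import logging
import tempfile
from pathlib import Path


def find_mira_files(folder: str) -> tuple[list[str], str]:
    """Hypothetical helper: list MIRA files in a folder, preferring znc over mmclx.

    Returns the sorted file paths and the file type that was found.
    """
    for filetype in ("znc", "mmclx"):
        files = sorted(
            str(p) for p in Path(folder).iterdir()
            if p.is_file() and p.suffix.lower() == f".{filetype}"
        )
        if files:
            return files, filetype
        logging.warning("No %s files found in %s", filetype, folder)
    raise FileNotFoundError(f"Neither znc nor mmclx files found in {folder}")


def single_filetype(files: list[str]) -> str:
    """Hypothetical helper: return the common file type or reject mixed input."""
    filetypes = {Path(f).suffix.lstrip(".").lower() for f in files}
    if len(filetypes) != 1:
        raise ValueError(f"Expected one file type (znc or mmclx), got {sorted(filetypes)}")
    return filetypes.pop()


# Usage: a folder with both types yields only the znc files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.mmclx", "a.znc", "c.znc"):
        Path(tmp, name).touch()
    files, filetype = find_mira_files(tmp)
    print(filetype, [Path(f).name for f in files])  # znc ['a.znc', 'c.znc']
    print(single_filetype(files))                   # znc
```

Returning the detected type together with the files lets the caller pick the keymap without re-parsing extensions, while the separate mixed-type check still covers the case where a list of files is passed in directly.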

tukiains commented 1 year ago

Hi, thanks for the input! It would be good to get support for znc files. Will you open a pull request for this, or should we just implement your suggestion above? We can refactor the code later if needed. I also think it makes sense to have znc as the first option if both formats are available. Can you send me one znc file so that I can test with it (attach it here or email it to actris-cloudnet@fmi.fi)?

spirrobe commented 1 year ago

Of course. You can find some example files (raw MIRA) at https://iacweb.ethz.ch/staff/rspirig/cloudlab/quicklooks/cloudnet/example/ (GitHub has a file size limit of 25 MB).