aburrell / ocbpy

Convert between magnetic and adaptive, polar boundary coordinates
BSD 3-Clause "New" or "Revised" License
7 stars 7 forks source link

Improve pysat custom function #114

Closed aburrell closed 1 year ago

aburrell commented 1 year ago

Describe the bug The pysat custom function uses for loops instead of pandas/xarray functionality and this makes processing some datasets (like the Madrigal VTEC) too slow to be useful.

To Reproduce Steps to reproduce the behavior:

import datetime as dt
import numpy as np
import os

import aacgmv2
import ocbpy
import pysat
import pysatMadrigal as pymad

stime = dt.datetime(2001, 1, 21)
etime = dt.datetime(2002, 1, 1)
inst_id = 'bounds'
hemisphere = 1
username = "Your Name"
password = "your@email"

def add_mag_coords(inst, lat='gdlat', lon='glon', alt='gdalt'):
    """Add AACGMV2 magnetic coordinates.

    Parameters
    ----------
    inst : pysat.Instrument
        Data object
    lat : str
        Geodetic latitude key (default='gdlat')
    lon : str
        Geographic longitude key (default='glon')
    alt : str
        Geodetic altitude key (default='gdalt')
    """
    # Initalize the data arrays
    mlat = np.full(shape=tuple(inst.data.dims[kk]
                               for kk in ['time', lat, lon]),
                   fill_value=np.nan)
    mlt = np.full(shape=mlat.shape, fill_value=np.nan)

    # Cycle through all times, calculating magnetic locations
    for i, dtime in enumerate(inst.index):
        for j, gdlat in enumerate(inst[lat].values):
            if alt in inst.variables:
                height = inst[alt][i, j].values
            else:
                height = 350.0  # Default altitude is 350 km

            if not np.isnan(height).all():
                mlat[i, j], mlon, r = aacgmv2.convert_latlon_arr(
                    gdlat, inst[lon].values, height, dtime)
                mlt[i, j] = aacgmv2.convert_mlt(mlon, dtime)

    # Assign the magnetic data to the input Instrument
    inst.data = inst.data.assign({"mlat": (("time", lat, lon), mlat),
                                  "mlt": (("time", lat, lon), mlt)})
    print("Added magnetic coordinates")
    return

def add_ocb_coords(inst, ocb, ocb_suffix='ocb', mlat_name='mlat',
                   mlt_name='mlt', max_sdiff=60):
    """Add OCB coordinates with a custom suffix to a pysat Instrument.

    Parameters
    ----------
    inst : pysat.Instrument
        Pysat Instrument object
    ocb : ocbpy.OCBoundary, ocbpy.EABoundary, or ocbpy.DualBoundary object
        ocbpy Boundary object
    ocb_suffix : str
        Desired suffix for OCB output parameters (default='ocb')
    mlat_name : str
        Variable name for magnetic latitude (default='mlat')
    max_sdiff : int
        Maximum time difference in seconds (default=60)

    """
    # Add the OCB coordinates with standard labels
    ocbpy.instruments.pysat_instruments.add_ocb_to_data(inst, ocb=ocb,
                                                        mlat_name=mlat_name,
                                                        mlt_name=mlt_name,
                                                        max_sdiff=max_sdiff)

    # Rename the output data
    if ocb_suffix != 'ocb':
        ocb_vars = [var for var in inst.variables if var.find('ocb') > 0]
        new_vars = {var: var.replace('ocb', ocb_suffix) for var in ocb_vars}
        inst.rename(new_vars)
    print("Added {:} coordinates".format(ocb_suffix))

    return

tec = pysat.Instrument(inst_module=pymad.instruments.gnss_tec, tag='vtec',
                           user=username, password=password)
dual = ocbpy.DualBoundary(stime=stime, etime=etime, hemisphere=hemisphere)

tec.custom_attach(add_mag_coords)
tec.custom_attach(add_ocb_coords, kwargs={'ocb': dual, 'ocb_suffix': 'dual'})

dual.rec_ind = 0
tec.download(start=stime)
tec.load(date=stime)  # This will take a VERY long time

Expected behavior The function should work on this data set in a way that is useful

Desktop (please complete the following information):

Additional context The problem is the for-loops in ocbpy.instruments.pysat_instrument.add_ocb_to_data. The function is working correctly, it's just too slow to be used.

aburrell commented 1 year ago

@jklenzing, this is the project I was talking about with you. Would appreciate your input!

aburrell commented 1 year ago

Looked into this and found a step that reduced test run time from about 4 hours to about 40 minutes. Explored more xarray/pandas methods and they didn't seem appropriate, so this may close with #115