AI-multimodal / aimmdb

BSD 3-Clause "New" or "Revised" License

Replace the "normalization" scheme with Larch #43

Open matthewcarbone opened 1 year ago

matthewcarbone commented 1 year ago

At Eli's request, we will be replacing the postprocessing operators we use for normalization with the normalization scheme used in Larch. For example:

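import numpy as np
from larch import Group as xafsgroup
from larch.xafs import pre_edge, autobk, mback, xftf
from larch import Interpreter

_larch = Interpreter()  # larch session passed to pre_edge

# energy and mu are assumed to be 1-D array-likes holding the measured spectrum
a = xafsgroup()
a.mu = np.array(mu)
a.energy = np.array(energy)
# pre_edge stores e0, norm, pre_edge, post_edge and edge_step on the group
pre_edge(a, group=a, _larch=_larch)

def flatten(group):
    # index of the first energy point above the edge energy e0
    step_index = int(np.argwhere(group.energy > group.e0)[0, 0])
    # step function: 0 below the edge, 1 from step_index onward
    zeros = np.zeros(step_index)
    ones = np.ones(group.energy.shape[0] - step_index)
    step = np.concatenate((zeros, ones), axis=0)
    # post-edge line relative to the pre-edge line, in units of the edge step
    diffline = (group.post_edge - group.pre_edge) / group.edge_step
    group.flat = group.norm + step * (1 - diffline)

flatten(a)
a.energy  # x-axis
a.flat  # y-axis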
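The `flatten` step only uses arrays that `pre_edge` has already stored on the group (`e0`, `norm`, `pre_edge`, `post_edge`, `edge_step`). As a hedged, larch-free illustration, the sketch below applies the same arithmetic to plain NumPy arrays. The function name `flatten_arrays` and the synthetic linear pre-/post-edge lines are assumptions standing in for real `pre_edge` output. It builds the step as `energy > e0`, which matches the `np.concatenate` version only when `energy` is sorted in increasing order.

```python
import numpy as np


def flatten_arrays(energy, e0, norm, pre_edge, post_edge, edge_step):
    """Hypothetical array-only version of `flatten` (same arithmetic).

    Below e0 the normalized spectrum is left unchanged. Above e0,
    flat = 1 + (mu - post_edge) / edge_step, so the post-edge trend is
    removed and the spectrum sits around 1.
    """
    # 0.0 below the edge, 1.0 above it (assumes energy is sorted ascending)
    step = (np.asarray(energy) > e0).astype(float)
    diffline = (post_edge - pre_edge) / edge_step
    return norm + step * (1.0 - diffline)


# Synthetic spectrum: flat pre-edge at 0.1, sloped post-edge, unit edge step.
energy = np.linspace(8900.0, 9100.0, 201)
e0 = 8979.5
pre_line = np.full_like(energy, 0.1)
post_line = 1.1 + 0.002 * (energy - e0)
edge_step = 1.0
mu = np.where(energy > e0, post_line, pre_line)
norm = (mu - pre_line) / edge_step

flat = flatten_arrays(energy, e0, norm, pre_line, post_line, edge_step)
print(flat[energy < e0].max(), flat[energy > e0].min())
```

For this idealized input, `norm` still carries the post-edge slope above e0, while the flattened curve is 0 below the edge and 1 above it.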

@zleung9 FYI. I'll be working on this a bit with one of Eli's students.

CharlesC30 commented 1 year ago

@x94carbone I added the following class to the operations module. Is there a good way for me to test that it works?

import numpy as np
from larch import Group as xafsgroup
from larch.xafs import pre_edge


# UnaryOperator is the existing base class in the operations module
class NormalizeXAS(UnaryOperator):
    """Return XAS spectrum normalized using larch.

    Parameters
    ----------
    x_column : str, optional
        References a single column in the DataFrameClient (this is the
        "x-axis").
    y_columns : list, optional
        References a list of columns in the DataFrameClient (these are the
        "y-axes"). Defaults to ["mu"].
    """

    def __init__(self, *, x_column="energy", y_columns=None):
        self.x_column = x_column
        # avoid a shared mutable default argument
        self.y_columns = ["mu"] if y_columns is None else list(y_columns)

    @staticmethod
    def flatten(group):
        # requires e0, norm, pre_edge, post_edge and edge_step from pre_edge()
        step_index = int(np.argwhere(group.energy > group.e0)[0, 0])
        zeros = np.zeros(step_index)
        ones = np.ones(group.energy.shape[0] - step_index)
        step = np.concatenate((zeros, ones), axis=0)
        diffline = (group.post_edge - group.pre_edge) / group.edge_step
        group.flat = group.norm + step * (1 - diffline)

    def _process_data(self, df):
        # new DataFrame holding the x column; df[[...]] keeps it 2-D
        new_data = df[[self.x_column]].copy()
        for column in self.y_columns:
            larch_group = xafsgroup()
            larch_group.energy = np.array(df[self.x_column], dtype=float)
            # use the column's values, not its name
            larch_group.mu = np.array(df[column], dtype=float)
            pre_edge(larch_group, group=larch_group)
            self.flatten(larch_group)
            new_data[column] = larch_group.flat

        return new_data

matthewcarbone commented 1 year ago

@CharlesC30 Yep, you can test it using the __call__ method, which takes the actual DataFrame and its metadata as input and should return the new data and metadata. There are some notebooks in the notebooks directory on the dev-aimm-postprocessing branch that show how to do this!
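To make the __call__ contract concrete, here is a minimal, hypothetical sketch of how a unary operator might route __call__ to `_process_data`. The classes `SketchUnaryOperator` and `ScaleColumns`, the `_process_metadata` hook, and the `"postprocessing"` metadata key are all assumptions for illustration, not the actual aimmdb base class. The point is the testing pattern: build a small DataFrame by hand, call the operator, and check both returned objects.

```python
import numpy as np
import pandas as pd


class SketchUnaryOperator:
    """Hypothetical stand-in for a unary operator base class (not aimmdb's)."""

    def _process_data(self, df):
        raise NotImplementedError

    def _process_metadata(self, metadata):
        # Record which operator touched the data; the key name is assumed.
        new_metadata = dict(metadata)
        new_metadata["postprocessing"] = type(self).__name__
        return new_metadata

    def __call__(self, df, metadata):
        # Take the data frame and its metadata, return new data and metadata.
        return self._process_data(df), self._process_metadata(metadata)


class ScaleColumns(SketchUnaryOperator):
    """Toy operator: divide each y column by its maximum."""

    def __init__(self, *, x_column="energy", y_columns=("mu",)):
        self.x_column = x_column
        self.y_columns = list(y_columns)

    def _process_data(self, df):
        # Copy the x column into a new frame so the input is not modified.
        new_data = df[[self.x_column]].copy()
        for column in self.y_columns:
            values = df[column].to_numpy(dtype=float)
            new_data[column] = values / values.max()
        return new_data


df = pd.DataFrame({"energy": np.linspace(8900.0, 9100.0, 5),
                   "mu": [0.1, 0.2, 1.0, 1.5, 2.0]})
new_df, new_meta = ScaleColumns()(df, {"sample": "synthetic"})
print(new_df)
print(new_meta)
```

The same pattern should carry over to NormalizeXAS once the real base class and a synthetic spectrum with an absorption edge are substituted in.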

CharlesC30 commented 1 year ago

@x94carbone Thanks! I just tested the class and made some changes to get it working. I also had to modify the notebook to get it running on my end. For example, here I was getting a KeyError on the first line because the uid was not present, and later on the last line because the DataFrame columns contained the currents (i0, itrans, etc.) but not mu.

node = CLIENT["uid"]["THBo5gy9cN8"]
df = node.read()
energy = df["energy"]
mutrans = df["mutrans"]

so I changed it to

import numpy as np

node = CLIENT["uid"]["Bt5hUbgkfzR"]
df = node.read()

# transmission absorption from the ion-chamber currents: mu*t = -ln(I_t / I_0)
df["mutrans"] = -np.log(df["itrans"] / df["i0"])

energy = df["energy"]
mutrans = df["mutrans"]
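The added line computes the transmission absorption from the ion-chamber currents, mu*t = -ln(I_t / I_0) = ln(I_0 / I_t). A hedged sketch of a guard that does this only when the column is missing, so the same notebook cell works on nodes with or without `mutrans`. The helper name `ensure_mutrans` is hypothetical; the column names `i0`, `itrans` and `mutrans` are the ones used above.

```python
import numpy as np
import pandas as pd


def ensure_mutrans(df, i0="i0", itrans="itrans", out="mutrans"):
    """Hypothetical helper: add a transmission mu column if it is missing.

    Uses mu * t = -ln(I_t / I_0). Returns `df` itself if the column already
    exists, otherwise a copy with the new column (the input is not modified).
    """
    if out in df.columns:
        return df
    missing = {i0, itrans} - set(df.columns)
    if missing:
        raise KeyError(f"cannot compute {out!r}; missing columns {sorted(missing)}")
    new_df = df.copy()
    new_df[out] = -np.log(new_df[itrans] / new_df[i0])
    return new_df


df = pd.DataFrame({"energy": [8900.0, 9000.0], "i0": [1.0, 2.0], "itrans": [0.5, 0.5]})
df = ensure_mutrans(df)
print(df["mutrans"].tolist())
```

Raising a KeyError that names the missing currents makes the failure on a node without i0/itrans easier to diagnose than the bare column lookup.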

Since the class appears to be working, I will submit a PR soon; then we can decide whether to keep any of the notebook changes.

matthewcarbone commented 1 year ago

@CharlesC30 perfect, thank you! That sounds great. Open a PR when ready!