AI-multimodal / aimmdb

Build the ingestion pipeline #42

Open matthewcarbone opened 1 year ago

Building the ingestion pipeline

We are working with Eli to develop a pipeline for uploading his XAS beamline data into aimmdb. In particular, this issue aims to accomplish the following:

Summary

import numpy as np
import pandas as pd

MEASUREMENT_INSTRUCTIONS = {
    "transmission": {
        "name": "transmission",
        "numerator": "it",
        "denominator": "i0",
        "log": True,
        "invert": True,
        "col_name": "mu_trans",
    },
    "fluorescence": {
        "name": "fluorescence",
        "numerator": "iff",
        "denominator": "i0",
        "log": False,
        "invert": False,
        "col_name": "mu_fluo",
    },
}

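def extract_mu(path, measurement_kind):
    """Read a beamline data file and compute mu for one measurement kind.

    measurement_kind must be a key of MEASUREMENT_INSTRUCTIONS.
    """
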
    df = pd.read_csv(path)

    measurement_description = MEASUREMENT_INSTRUCTIONS[measurement_kind]

    energy = df["energy"]

    # Ratio of the measured signal to the incident intensity
    mu = (
        df[measurement_description["numerator"]]
        / df[measurement_description["denominator"]]
    )

    if measurement_description["log"]:
        mu = np.log10(mu)

    if measurement_description["invert"]:
        mu = -mu

    # Also read the metadata from the file, including all commented lines.
    # In particular, we need to pick out the databroker unique id.
    metadata = ...

    # process data frame...

    return df, metadata
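
The outline above leaves the metadata parsing and the data frame processing open. As a minimal sketch of how those two pieces might fit together, the example below assumes that header lines start with `#` and use a `key: value` layout, and that the data columns are named `energy`, `i0`, `it` and `iff`. The helper names `read_commented_metadata` and `compute_mu`, the `uid` key and the sample values are all hypothetical and only serve as illustration; the real beamline files may be laid out differently.

```python
import io

import numpy as np
import pandas as pd

# Trimmed copy of the instruction table from the issue.
INSTRUCTIONS = {
    "transmission": {
        "numerator": "it", "denominator": "i0",
        "log": True, "invert": True, "col_name": "mu_trans",
    },
    "fluorescence": {
        "numerator": "iff", "denominator": "i0",
        "log": False, "invert": False, "col_name": "mu_fluo",
    },
}


def read_commented_metadata(text, comment="#"):
    """Collect '# key: value' header lines into a dict (assumed layout)."""
    metadata = {}
    for line in text.splitlines():
        if not line.startswith(comment):
            continue
        key, sep, value = line.lstrip(comment).partition(":")
        if sep:  # skip commented lines that carry no 'key: value' pair
            metadata[key.strip()] = value.strip()
    return metadata


def compute_mu(df, kind):
    """Return a two-column frame: energy and mu for the requested kind."""
    spec = INSTRUCTIONS[kind]
    ratio = df[spec["numerator"]] / df[spec["denominator"]]
    mu = np.log10(ratio) if spec["log"] else ratio
    if spec["invert"]:
        mu = -mu
    return pd.DataFrame({"energy": df["energy"], spec["col_name"]: mu})


# Hypothetical file contents, made up for this example.
SAMPLE = """# uid: abc123
# element: Ni
energy,i0,it,iff
8300.0,100.0,10.0,5.0
8301.0,100.0,1.0,20.0
"""

metadata = read_commented_metadata(SAMPLE)
# comment="#" makes pandas skip the commented header lines.
df = pd.read_csv(io.StringIO(SAMPLE), comment="#")
mu_trans = compute_mu(df, "transmission")
mu_fluo = compute_mu(df, "fluorescence")
```

Keeping the metadata pass separate from the `pd.read_csv` call means the commented lines are read once as text and once skipped as comments, which is simple but reads the input twice. Whether that matters depends on the file sizes involved.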

Specific steps