DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
https://hamilton.dagworks.io/en/latest/
BSD 3-Clause Clear License

Mutate (and likely steps too) code isn't shown in Hamilton UI #1176

Open: skrawcz opened this issue 3 hours ago

skrawcz commented 3 hours ago

Current behavior

Running a dataflow that uses the mutate decorator (and likely step/pipe as well) with the HamiltonTracker shows the correct DAG structure in the Hamilton UI, but no source code is attached to the affected nodes.

Screenshots

[Screenshot: Screen Shot 2024-10-11 at 4:18:55 PM]

Steps to replicate behavior

```python
# mutate_example.py
from hamilton.function_modifiers import mutate
import pandas as pd


def transformed_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    return ...  # do your regular stuff here


# min-max normalizes every column of transformed_data to the [0, 1] range
@mutate(transformed_data)
def _normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    for column in df.columns:
        df[column] = (df[column] - df[column].min()) / (df[column].max() - df[column].min())
    return df


# masks values >= outlier_threshold (indexing with a boolean frame sets them to NaN; rows are kept)
@mutate(transformed_data, outlier_threshold=10)
def _remove_outliers(df: pd.DataFrame, outlier_threshold: float) -> pd.DataFrame:
    return df[df < outlier_threshold]
```

Driver:

```python
from hamilton import driver
from hamilton_sdk import adapters

import mutate_example  # the module above, saved as mutate_example.py

tracker = adapters.HamiltonTracker(
    project_id=...,  # modify this as needed
    username="...",
    dag_name="mutate_example",
    tags={"environment": "DEV", "team": "MY_TEAM", "version": "mutate"},
)
dr = (
    driver.Builder()
    .with_config({})
    .with_modules(mutate_example)
    .with_adapters(tracker)
    .build()
)
```

Library & System Information

Latest versions of Python, the Hamilton SDK, and the Hamilton UI.

Expected behavior

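The source code of the mutating functions (_normalize_columns, _remove_outliers) shows up in the Hamilton UI alongside the nodes they modify.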

Additional context

Guess: we're either not attaching the source code, or not referencing it the way the UI expects, so the UI has nothing to show.

elijahbenizzy commented 2 hours ago

The problem here is with the originating functions: the function decorated with @mutate isn't attached to the node it modifies, so we never collect a reference to it. We may want to:

  1. Add the mutating functions to the node's originating functions
  2. Create an auxiliary_functions variable to store the attached ones
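The two options can be compared with a small sketch. NodeInfo and functions_to_show are hypothetical (this is not Hamilton's Node class, and the auxiliary_functions field only follows the name proposed above); the helper bodies are placeholders. Both options give the UI the same set of functions to display; option 2 additionally keeps "where the node is defined" separate from "what modifies it", at the cost of a new field every consumer has to read.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical node metadata for illustration; not Hamilton's Node class."""
    name: str
    originating_functions: Tuple[Callable, ...]
    # Option 2: helpers attached by @mutate are kept in a separate field.
    auxiliary_functions: Tuple[Callable, ...] = ()


def functions_to_show(node: NodeInfo) -> List[str]:
    # Originating functions first, then auxiliary ones; dict.fromkeys drops
    # duplicates (by identity) while preserving order.
    fns = dict.fromkeys(node.originating_functions + node.auxiliary_functions)
    return [fn.__name__ for fn in fns]


# Placeholder bodies; only the function objects matter here.
def transformed_data(raw_data):
    return raw_data


def _normalize_columns(df):
    return df


def _remove_outliers(df, outlier_threshold):
    return df


# Option 1: append the mutating helpers to the originating functions.
option_1 = NodeInfo("transformed_data", (transformed_data, _normalize_columns, _remove_outliers))
# Option 2: keep the originating functions as-is and store the helpers separately.
option_2 = NodeInfo("transformed_data", (transformed_data,), (_normalize_columns, _remove_outliers))

print(functions_to_show(option_1))  # ['transformed_data', '_normalize_columns', '_remove_outliers']
print(functions_to_show(option_2))  # ['transformed_data', '_normalize_columns', '_remove_outliers']
```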