amakelov / mandala

A simple & elegant experiment tracking framework that integrates persistence logic & best practices directly into Python
Apache License 2.0
506 stars 15 forks source link

Content hashing and library versions #2

Closed amakelov closed 1 year ago

amakelov commented 2 years ago

A key property of content hashes is that they are deterministic. This allows you to handle Python objects and automatically arrive at the right UID (hence, storage location) behind the scenes, without having to think or make decisions about names or storage. E.g., a simple example would be

@op()
def f(x) -> int:
    return x + 1

# on Monday...
with run(storage):
    f(23)

# on Tuesday...
with run(storage):
    f(23)

Since the hashing is deterministic, this will correctly figure out that f was already executed with this input.

However a problem can happen in a few ways:

This can be very bad since you could trigger a completely new computation of a pipeline.

This issue's goal is to figure out what our constraints for this are and design a solution. Some very rough possibilities:

amakelov commented 1 year ago

Closing this as out of scope for now