Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
This PR includes a few changes to the MetadataStore base class and sets a clearer contract for .get_run() and .get_run_ids().
.get_run_ids() should return a list of run IDs sorted by start time, i.e., the time at which .initialize() was called. This order may differ from the order of the first recorded node execution. An empty list is returned if no runs were initialized.
.get_run() should return a list of dictionaries, where each dict is associated with a particular node execution. Each dict must contain at least the data_version and the cache_key (which can be decoded to retrieve the node_name, code_version, and dependencies_data_versions), but may include more information. An IndexError is raised if run_id doesn't exist. An empty list is returned if the run was initialized but recorded no node execution.
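
To make the contract concrete, here is a minimal in-memory sketch. It is not the actual MetadataStore code: the class name InMemoryMetadataStore, the record() helper, and the string values used for run IDs, cache keys, and data versions are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List


class InMemoryMetadataStore:
    """Hypothetical toy store illustrating the contract; not Hamilton's implementation."""

    def __init__(self) -> None:
        # Python dicts preserve insertion order, so the keys stay sorted
        # by the time .initialize() was called for each run.
        self._runs: Dict[str, List[Dict[str, Any]]] = {}

    def initialize(self, run_id: str) -> None:
        # Register the run at start time, before any node executes.
        self._runs.setdefault(run_id, [])

    def record(self, run_id: str, cache_key: str, data_version: str) -> None:
        # Hypothetical helper: store one node execution for an initialized run.
        self._runs[run_id].append(
            {"cache_key": cache_key, "data_version": data_version}
        )

    def get_run_ids(self) -> List[str]:
        # Empty list if no runs were initialized.
        return list(self._runs)

    def get_run(self, run_id: str) -> List[Dict[str, Any]]:
        # Unknown run: IndexError. Known run without executions: empty list.
        if run_id not in self._runs:
            raise IndexError(f"Run not found: {run_id}")
        return list(self._runs[run_id])


store = InMemoryMetadataStore()
store.initialize("run-a")  # starts first, records nothing
store.initialize("run-b")  # starts second, records the first node execution
store.record("run-b", cache_key="key-1", data_version="v1")

print(store.get_run_ids())     # ['run-a', 'run-b']
print(store.get_run("run-a"))  # []
```

Here run-a is listed first even though run-b recorded the first node execution, which is the ordering distinction described above.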
Tests were added to catch the bug reported in the original issue.
This PR follows issue #1204.