Closed maxmynter closed 7 months ago
Attention: 1 lines
in your changes are missing coverage. Please review.
Comparison is base (
bcb7008
) 87.36% compared to head (eb5b29c
) 88.04%. Report is 2 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
src/lakefs_spec/client_helpers.py | 87.50% | 0 Missing and 1 partial :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
W.r.t. to tests, I am unsure on how to test the list_branches
client helper as we basically only return the lakeFS-API response. Plus, if it breaks, it would also break the delete_branch
test.
Thoughts on this?
W.r.t. to tests, I am unsure on how to test the
list_branches
client helper as we basically only return the lakeFS-API response. Plus, if it breaks, it would also break thedelete_branch
test.Thoughts on this?
If you want to make sure we're calling the right lakeFS API with the expected params, you could mock the API client, other than that, not much we can test there.
@nicholasjng I am not sure that I understand your Review comment above, so correct my understanding when it is wrong.
But isn't reproducibility given by specifying commit SHAs? And tags are a human understandable aide. E.g. lakeFS gives the example of dev:jane-before-v2.3-merge
.
Then it would be the individual data scientists responsibility to choose a unique and suitable tag. Once a tag is assigned, it should not be implicitly reassigned as a side effect to other code execution and only happen deliberately.
But isn't reproducibility given by specifying commit SHAs? And tags are a human understandable aide. E.g. lakeFS gives the example of
dev:jane-before-v2.3-merge
.
Technically yes, but that is "post-hoc reproducibility": I was wondering why, with our current notebook setup, new commit SHAs are even needed at all. (I get the technical reasoning)
Is there an easy setup using branches that does not create a new commit anywhere? Probably branching off of main again for the second training after 2020 data has been pushed to main, right?
Then it would be the individual data scientists responsibility to choose a unique and suitable tag. Once a tag is assigned, it should not be implicitly reassigned as a side effect to other code execution and only happen deliberately.
Fair.
Is there an easy setup using branches that does not create a new commit anywhere? Probably branching off of main again for the second training after 2020 data has been pushed to main, right?
Yes that would work, I think.
What I thought about when writing the initial demo storyline, was to create a scenario where the tags has its most value.
That is, imo, that we can tag a commit that is not the most recent on a branch. For the most recent, just the Ref
to the branch suffices, whereas you need the commit id for older commits. This is easier done with a tag.
The only way to tag something that is unchanged is to have a single commit on a branch. This way, pushing files does not create a new commit, the tag refers to the old one and we get idempotent notebook execution without a hidden cleanup cell.
Quick aside: I just saw that create_tag
takes str | Commit
. It could take any Ref
, also to a current branch tip. Should we extend this?
Quick aside: I just saw that
create_tag
takesstr | Commit
. It could take anyRef
, also to a current branch tip. Should we extend this?
It's probably low priority, since I assume that the Ref
would almost always be a branch API result (or would you want to tag a tag?), and then you could always use the name directly.
Closes https://github.com/aai-institute/lakefs-spec/issues/185 Closes https://github.com/aai-institute/lakefs-spec/issues/191
Replaces https://github.com/aai-institute/lakefs-spec/pull/186 to build on
Jupytext
integration