airbnb / knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Apache License 2.0
5.48k stars 688 forks source link

images are not fully committed with repos using LFS #260

Open bmabey opened 7 years ago

bmabey commented 7 years ago

Auto-reviewers: @NiharikaRay arikaRay @matthewwardrop @earthmancash @danfrankj

When I add a post that has git-lfs enabled for images the images added by the knowlege_repo script are not fully committed to the repo. This example will illustrate what I mean and should enable you to reproduce the issue:

First setup a new repo and turn LFS on for PNG images:

knowledge_repo --repo repo_with_lfs init
cd repo_with_lfs/
echo "*.png filter=lfs diff=lfs merge=lfs -text" > .gitattributes
git add .gitattributes
git commit -m 'turns on LFS for pngs'

Then add a post that has an image in it (where KNOWLEDGE_REPO is the tooling repo):

knowledge_repo --repo . add KNOWLEDGE_REPO/tests/test_posts/one_plus_one.ipynb -p examples/one_plus_one -m "example post with image"

This should fully add the post but if you go to the repo and check the status you see:

repo_with_lfs (master *)$ gst
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   examples/one_plus_one.kp/images/output_4_0.png

repo_with_lfs (master *)$ git diff
diff --git a/examples/one_plus_one.kp/images/output_4_0.png b/examples/one_plus_one.kp/images/output_4_0.png
index 1fc9c1e..b59ed35 100644
Binary files a/examples/one_plus_one.kp/images/output_4_0.png and b/examples/one_plus_one.kp/images/output_4_0.png differ

Running git commit -a --am --no-edit fixes the state of the repo but this interrupts the workflow of working with knowledge_repo and I know this isn't the intended behavior.

Is my setup of LFS incorrect or is this a problem with the knowledge-repo?

matthewwardrop commented 7 years ago

Hi @bmabey ,

Thanks for reporting this! Hitherto, we have done no work with LFS repositories, and so it isn't very surprising that there are issues like this. LFS support makes a lot of sense though, especially now the process is a lot more streamlined than once it was.

I'll look into this when I can. I suspect there may be some complexity in supporting this due to limitations in gitpython, but I'll do what I can to resolve or further comment on this by the end of June. If this is a blocker for you, I can try to expedite a fix.

bmabey commented 7 years ago

It isn't a blocker but it may effect my team's opinions of KR during our evaluation period so I'd ideally like to address it in some fashion. Can you think of a workaround, if even a quick hack, that I could try in my fork during our evaluation? I essentially need to run git commit -a --am --no-edit after the current commit logic but I didn't know how to do that with gitpython.

matthewwardrop commented 7 years ago

Sure thing. I'll look into it today. :)

bmabey commented 7 years ago

Any more thoughts on this @matthewwardrop? I'm thinking of just using a simple shell script wrapper to address this problem for now unless you can think of a way of addressing the problem using pygit.