ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

[Task] Add dummy LFS object in template repo #536

Closed GemmaTuron closed 1 year ago

GemmaTuron commented 1 year ago

We found out that Git-LFS does not work in GitHub templates, so we need a workaround that adds the Git-LFS object afterwards. The goal is to have a dummy Git-LFS object in the eos-template so that contributors can fork the repository and push LFS objects (this requires the repo to ALREADY have an LFS object).

D1M1TR10S commented 1 year ago

Templates can't have Git LFS objects. Tack this onto the end of the approve workflow.

Git-LFS is a paid feature, but there's a super discount applied. Won't be charged in the foreseeable future.

They need to have the model approved in order to get to this step – for new model submissions.

Problem: If a repo that has Git-LFS enabled is forked, the fork still uses the parent's Git-LFS, which could burn down the Git-LFS entitlements. => If the workflow adds Git-LFS to a repo, the org is charged. Anyone can fork the repo and use it for whatever they want, and Ersilia will be billed for it. Ask the Git-LFS team whether there's a workaround.

If someone forks and merges, can we delete that repo? => No. The person forking owns it.

Each model repo has a folder called "model" which holds the checkpoints for the model. These can sometimes be a couple of gigabytes of model parameters. We could store them separately in a package, but to keep everything centralized we decided to use Git-LFS. If we offer Git-LFS in public repos, it can be abused.

Next steps:

GemmaTuron commented 1 year ago

We realized the eos-template was failing the test because it needed a .joblib file that was tracked with Git-LFS (and the file was not there due to the issue described above). We have temporarily removed the joblib requirement from the template (https://github.com/ersilia-os/eos-template/commit/1dbd0a6db2371698424dba39d29f6e2b84891c06), and now when a new repo is created the test does not fail (https://github.com/ersilia-os/eos1nxr/actions/runs/3910251818/jobs/6682232637).
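A test that depends on an LFS-tracked file can distinguish three cases: the file is absent, the file is only a Git LFS pointer (the small text stub left in place when the real object was not fetched), or the real content is present. The following is a minimal sketch of such a check, not the check used in the eos-template tests; the function name `lfs_file_status` and the file names in the usage note are hypothetical. It relies only on the fact that LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the pointer file spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def lfs_file_status(path):
    """Classify a file that is expected to be tracked with Git LFS.

    Returns "missing" if the file does not exist, "pointer" if it is an
    unfetched LFS pointer stub, and "ok" if it holds other content.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    if head == LFS_POINTER_PREFIX:
        return "pointer"
    return "ok"
```

A test could call, for example, `lfs_file_status("model/checkpoints/model.joblib")` (a hypothetical path) and skip or fail with a clear message when the result is not `"ok"`.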

Next actions:

@GrantBirki and @megamanics let me know how this sounds

vtbassmatt commented 1 year ago

> Problem: If a repo that has Git-LFS enabled is forked, the fork still uses the parent's Git-LFS, which could burn down the Git-LFS entitlements. => If the workflow adds Git-LFS to a repo, the org is charged. Anyone can fork the repo and use it for whatever they want, and Ersilia will be billed for it. Ask the Git-LFS team whether there's a workaround.
>
> If someone forks and merges, can we delete that repo? => No. The person forking owns it.

If someone starts abusing your LFS entitlement, Support has tools which can help. The bad fork can be forcibly detached and the orphan objects in your network can be pruned.

D1M1TR10S commented 1 year ago

Adding objects tracked in Git LFS

Can we make an Actions workflow that's triggered when a repo is created from the template? => Yes. It should be triggered by on: push to main.

Add a job to the /approve workflow in the eos-template (here)

GemmaTuron commented 1 year ago

Add an action in: https://github.com/ersilia-os/ersilia/blob/master/.github/workflows/approve-dispatch.yml after line 66

D1M1TR10S commented 1 year ago

Is it possible to detach the fork from the parent organization so that forks aren't consuming Ersilia's Git LFS entitlements whenever someone forks a repo? By default there's 50 GB free, and no model should be that big.

vtbassmatt commented 1 year ago

GitHub Support can help you detach a repo if needed.

Also, heads up: the free entitlement is 1GB, not 50GB.
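To make the size concern concrete, here is a small sketch that compares a set of LFS object sizes against an entitlement. The 1 GB default comes from the comment above; treating 1 GB as 1024**3 bytes is an assumption, and the function name `lfs_usage` is hypothetical.

```python
GB = 1024 ** 3  # assumption: 1 GB counted as 2**30 bytes


def lfs_usage(object_sizes_bytes, entitlement_bytes=1 * GB):
    """Sum LFS object sizes and compare them with the storage entitlement."""
    used = sum(object_sizes_bytes)
    return {
        "used_bytes": used,
        "remaining_bytes": entitlement_bytes - used,
        "over_quota": used > entitlement_bytes,
    }
```

For instance, `lfs_usage([2 * GB])` reports `over_quota` as `True`: a single two-gigabyte checkpoint, the size mentioned earlier in this thread, already exceeds a 1 GB entitlement.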

GemmaTuron commented 1 year ago

Thanks @vtbassmatt !

@GrantBirki we have tried to add the dummy LFS file in the approve-dispatch workflow, as discussed, but we are unable to commit to the repo because we don't have permissions. We are trying to run it as ersilia-bot with https://github.com/actions-js/push, but it seems we are unable to reset the author to our bot: remote: Permission to ersilia-os/eos5iod.git denied to github-actions[bot].

Do you have any suggestions?
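For context on the error: `github-actions[bot]` is the identity behind the default `GITHUB_TOKEN`, which is scoped to the repository where the workflow runs, so a push to a different repository generally needs a separate token with write access to that target. The sketch below only shows how a token-authenticated HTTPS remote can be assembled; it is an illustration under that assumption, not a description of how this workflow was eventually fixed. The function name `authenticated_remote` and the idea of storing a bot token as a secret are hypothetical.

```python
def authenticated_remote(owner, repo, token):
    """Build an HTTPS remote URL that authenticates with a token.

    `token` is assumed to be a token with write access to the target
    repository (for example, one stored as a workflow secret).
    """
    return f"https://x-access-token:{token}@github.com/{owner}/{repo}.git"
```

A workflow step could pass the result to `git remote set-url origin ...` before pushing, taking care never to echo the URL to the logs.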

miquelduranfrigola commented 1 year ago

Hi all!

Update: we have managed to make it work. The workflow is probably not very clean (we are just learning...) but it now does what we expect. In brief, it tracks a mock.csv file with LFS and then commits the changes to the newly created repo.

As an example you can see this repository: https://github.com/ersilia-os/eos5ixa

We now need to check that this behaves as expected when users fork the repo and track new files with Git LFS. We will keep you posted!
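The step described above (tracking mock.csv with LFS and committing it) roughly corresponds to the Git command sequence below. This is a sketch under assumptions, not the actual workflow: the helper names `lfs_seed_commands` and `run`, the commit message, and the `main` branch name are illustrative, and the commands are only printed unless `dry_run=False` is passed.

```python
import subprocess


def lfs_seed_commands(filename="mock.csv", branch="main"):
    """Return the Git commands that add one LFS-tracked file to a repo."""
    return [
        ["git", "lfs", "install", "--local"],   # enable LFS hooks in this repo
        ["git", "lfs", "track", filename],      # writes a rule to .gitattributes
        ["git", "add", ".gitattributes", filename],
        ["git", "commit", "-m", f"Add {filename} tracked with Git LFS"],
        ["git", "push", "origin", branch],
    ]


def run(commands, cwd=".", dry_run=True):
    """Print the commands, or execute them in `cwd` when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run(lfs_seed_commands())` prints the five commands without touching any repository. After `git lfs track mock.csv`, the `.gitattributes` file contains the line `mock.csv filter=lfs diff=lfs merge=lfs -text`.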

miquelduranfrigola commented 1 year ago

When I fork to my username: https://github.com/miquelduranfrigola/eos5ixa I am indeed able to add new Git LFS objects. So it looks very promising! We will ask someone who is not an ersilia-os owner to try it.