SwissDataScienceCenter / renku

Renku provides a platform and tools for reproducible and collaborative data analysis.
https://renkulab.io
Apache License 2.0

repository broken after adding data from server #1218

Closed: gitanjalIthakur closed this issue 4 years ago

gitanjalIthakur commented 4 years ago

Hi, please find the link to my repository here: https://renkulab.io/projects/gitanjali.thakur/copernicus_lst_testing

I added data from an external server using renku run with wget:

renku run wget -r --ftp-user=gitanjali22 --ftp-password='***' ftp://ftp.copernicus.vgt.vito.be//M0070737/

Once the download was over, I got a list of untracked files. I was not sure whether the downloaded data was under LFS, so I tried to remove it. Now, when I connect to the environments, I cannot see any files in my repository. I would appreciate your help. Many thanks.

schymans commented 4 years ago

I tried to create environments based on different commits of this repo, up to two weeks old, but they are all based on empty folders. Do you remember the last commit from which you were able to create a working environment?

gitanjalIthakur commented 4 years ago

Yes, this morning, after downloading the data, I was able to create the environment and it was working fine. The repo broke once I removed the directory containing the downloaded data.

rcnijzink commented 4 years ago

So I managed to clone with:

git clone https://renkulab.io/gitlab/gitanjali.thakur/copernicus_lst_testing.git --depth 1

Later I also managed with a depth of 50. I think the history was still big:

$ git lfs migrate info --include-ref=master
migrate: Sorting commits: ..., done
migrate: Examining commits: 100% (39/39), done
*.nc            6.3 GB  1526/1526 files(s)      100%
*.xml           53 MB   1523/1523 files(s)      100%
*.ipynb         15 MB       11/11 files(s)      100%
*.yml           59 KB       12/12 files(s)      100%
*.gitignore     6.0 KB        1/1 files(s)      100%

So I added the nc-files to git lfs:

git lfs migrate import --include="*.nc" --include-ref=master

And now it should work again ;)
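
For anyone repeating this with more than one file type, here is a minimal sketch of how the import command could be assembled. The helper name lfs_migrate_command is hypothetical and not part of git-lfs or renku. It only prints the command instead of running it, because git lfs migrate import rewrites the history of the branch and is worth reviewing first.

```shell
# Hypothetical helper (not part of git-lfs or renku): print, but do not run,
# a `git lfs migrate import` command for one branch and one or more patterns.
lfs_migrate_command() {
    if [ "$#" -lt 2 ]; then
        echo "usage: lfs_migrate_command BRANCH PATTERN [PATTERN...]" >&2
        return 1
    fi
    branch=$1
    shift
    patterns=
    for pattern in "$@"; do
        # --include accepts a comma-separated list of patterns
        patterns="${patterns:+$patterns,}$pattern"
    done
    printf 'git lfs migrate import --include="%s" --include-ref=%s\n' "$patterns" "$branch"
}

# Example call (quote the patterns so the shell does not expand them):
#   lfs_migrate_command master '*.nc' '*.xml'
```

Since the migration rewrites existing commits, the rewritten branch usually has to be force-pushed, and other clones of the repository will need to be refreshed afterwards.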

schymans commented 4 years ago

So this is just a matter of the repo size? It would be great if such problems could be picked up in the course of building an environment and if the user could get a warning. Thanks for showing us how to solve it, @rcnijzink

rokroskar commented 4 years ago

It's a matter of committing large (or many) binary files to the git repository.

We have implemented some hooks that should prevent you from ever committing large files to the git repo. Please make sure you use renku-python 0.10.4. If you use renku clone, the git hooks are installed automatically; if the repo was cloned with git clone, you can run renku githooks install to install them. If you are concerned about a repo, you can run renku doctor, which will alert you to these common problems.

You can find the git-lfs configuration options described here.
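
To illustrate the kind of check such a hook performs, here is a rough sketch. It is not the actual renku hook: the function name find_large_files and the 1 MiB default threshold are assumptions made for this example. It lists files in the working tree that exceed a size limit, so they can be put under LFS before they are committed.

```shell
# Hypothetical check, NOT renku's actual hook: list regular files under DIR
# that are larger than LIMIT bytes, skipping the .git directory.
find_large_files() {
    dir=${1:-.}
    limit=${2:-1048576}   # assumed default threshold: 1 MiB
    # -size +Nc matches files strictly larger than N bytes
    find "$dir" -path "$dir/.git" -prune -o -type f -size +"${limit}c" -print | sort
}

# Example call: list everything above 5 MiB in the current repository
#   find_large_files . 5242880
```

Any path this prints is a candidate for git lfs track before the next commit.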

rokroskar commented 4 years ago

@gitanjalIthakur @schymans @rcnijzink has this been resolved? Can we close the issue?

rcnijzink commented 4 years ago

This can be closed yes! Thanks!