DagsHub / fds

Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc
http://fastds.io
MIT License

fds add takes a long time in a large repo #66

Closed by guysmoilov 3 years ago

guysmoilov commented 3 years ago

In this repo, running fds add . took 1 minute 40 seconds: https://dagshub.com/nirbarazida/Pneumonia-Classification/src/image-processing

To reproduce:

  1. git clone the above repo
  2. Switch to the image-processing branch
  3. dvc pull
  4. rm -rf .dvc
  5. fds init
  6. fds add .

I suspect the problem is that we keep recalculating the folder sizes during the interactive fds add wizard. We should probably calculate each folder's size once and cache it.
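
To illustrate the caching idea, here is a minimal sketch, not the actual fds code. The function name dir_size is hypothetical, and the sketch assumes folder contents do not change while the wizard is running. It memoizes the size per path with functools.lru_cache, so each folder is walked at most once, and sizes of subfolders are reused when the wizard descends into them.

```python
import os
from functools import lru_cache


# Hypothetical helper for illustration only; not part of fds.
@lru_cache(maxsize=None)
def dir_size(path: str) -> int:
    """Return the total size in bytes of regular files under `path`.

    Results are cached per path, so repeated calls for the same folder
    (or for a subfolder already visited) do not touch the disk again.
    """
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Recursive call goes through the cache as well.
                total += dir_size(entry.path)
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

The trade-off is staleness: if files are added or removed between prompts, the cached value will be out of date until dir_size.cache_clear() is called, so clearing the cache at the start of each fds add run would be a reasonable precaution.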

Reported by @nirbarazida