Skaffold dev using 100% sustained CPU after merging #620

d11wtq commented 6 years ago

My projects are nodejs projects and have large (> 1GB) directory trees that are added to the Docker images.

Since #620 got merged, I'm seeing CPU usage immediately hit 100% in skaffold dev. I think the mtime polling is costly on large projects?

Note that previously I had set: SKAFFOLD_FILE_WATCHER=fsnotify explicitly.

Note also that these large node_modules trees are under .dockerignore, though they would be in the path referenced by the COPY command in the Dockerfile.

Expected behavior

Skaffold dev should not use that much CPU.

Actual behavior

Skaffold dev is using 100% CPU.

Information

Skaffold version: Git commit 2d1d68df21da44382f9a57fc5419c37037734372 (I've now switched back to 2d0ee1e which doesn't have this issue)
Operating system: MacOS 10.12.16

Steps to reproduce the behavior

Start skaffold dev with a large directory referenced in a COPY command in a Dockerfile.
Observe immediate 100% CPU usage.

d11wtq commented 6 years ago

I will have to revert back to an earlier commit for now as this renders my MacBook largely unusable due to the fact I run multiple skaffold dev processes to facilitate development of a microservice-based architecture.

d11wtq commented 6 years ago

With regards to the dislike of fsnotify, I don't understand how other projects are able to use it extremely efficiently (e.g. nodemon, babel, webpack, relay). My understanding is that it can watch an entire directory without requiring watching each individual file in the directory?

dgageot commented 6 years ago

@d11wtq sorry to hear that. I'm going to take a look today. I see a couple of things I might do:

Detect or specify that those large folders shouldn't be watched
Increase/make configurable the polling delay
See if something can be improved on the mtime loop

You are right that fsnotify can be made to work but it also has some restrictions

d11wtq commented 6 years ago

Just reading up on fsnotify and I'm wrong about those other projects. They use fsevents on MacOS which does allow recursive directory watches efficiently, but fsnotify doesn't currently use that. There is a (somewhat unmaintained) package for that, however.

I wonder if we could perhaps abstract the watcher implementation as there are completely standalone tools that can do arbitrary things based on watching the filesystem for changes. For example, it would be possibly to signal the running skaffold process to trigger a re-deploy based on some external event.

EDIT | I could likely hack that together with a combination of skaffold run, stern and fswatch (changes invoke skaffold run && stern <some flag>).

bhack commented 6 years ago

It is an old thread. Check https://github.com/fsnotify/fsnotify/issues/18

bhack commented 6 years ago

Check also https://github.com/dc0d/dirwatch

bhack commented 6 years ago

A similar directory tree monitoring in kubernetes https://github.com/kubernetes/kubernetes/pull/64660

d11wtq commented 6 years ago

Any progress on this? For now I've switched to a custom solution using fswatch as described above, but I'm keen to avoid the need for reinventing the wheel if I can. Would you consider optionally using https://github.com/fsnotify/fsevents where the platform supports it? Not sure about https://github.com/tywkeene/go-fsevents - it might be a Linux only thing since it seems to be based on inotify.

dgageot commented 6 years ago

@d11wtq I think there's much room to improve of poll based solution. That's what I'm trying to do with https://github.com/GoogleContainerTools/skaffold/pull/833 and https://github.com/GoogleContainerTools/skaffold/pull/837

If I understand your issue correctly, the current solution is watching too many files. By watching only the files that are needed by docker build, it should be much better.

d11wtq commented 6 years ago

@dgageot nice. I'm keen to give that a spin.

bsuh commented 6 years ago

@dgageot I took #833 for a spin and it's still using 100% CPU :/

dgageot commented 6 years ago

@bsuh not cool. Can you explain the layout of you project?

bsuh commented 6 years ago

ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help                                                                                               
--- /Users/bsuh/kubernetes ----------------------------------------------------------------------------------------------------------------
    1.4 GiB [##########] /student                                                                                                                          
  314.6 MiB [##        ] /.git
  124.3 MiB [          ] /shell-services-core
   61.2 MiB [          ] /medialibrary-core
   53.6 MiB [          ] /portal
  388.0 KiB [          ] /testmonitor-services-core
  144.0 KiB [          ]  Kubernetes Pipeline.png
  104.0 KiB [          ] /transformed
   64.0 KiB [          ] /resources
   36.0 KiB [          ] /kustomize
    4.0 KiB [          ]  .gitlab-ci.yml
    4.0 KiB [          ]  README.md
    4.0 KiB [          ]  insert-vars.sh
    4.0 KiB [          ]  skaffold.yaml
    4.0 KiB [          ]  .gitmodules
    4.0 KiB [          ]  rollout-status.sh
    4.0 KiB [          ]  skaffold.yaml~
    4.0 KiB [          ]  .gitignore
e   0.0   B [          ] /.vagrant

 Total disk usage:   1.9 GiB  Apparent size:   1.6 GiB  Items: 114132

It follows the microservices example. 4 backend projects and 1 frontend project as submodules. Each project has .dockerignore for the obvious suspects like node_modules and bin/ obj/ directories.

94543 of those files come from node_modules so the projects themselves aren't unreasonably large.

bsuh commented 6 years ago

@dgageot False alarm. I actually didn't realize that calling make install from your fork was still importing packages from the github.com/GoogleContainerTools/skaffold in my go workspace. I symlinked the fork to the go workspace and rebuilt and now it's taking 15% CPU which is much better. Sorry about that.

dgageot commented 6 years ago

That’s very good news! Thanks for the update. Let’s try to make that even better in the future

bsuh commented 6 years ago

@dgageot What do you think of moving filtering & directory walking concerns from getting artifact dependencies to watch? It seems like those are in the getting artifact dependencies code to accommodate the polling watcher. If getting artifact dependencies instead return a list of directories and filter patterns and leaves it up to the watcher, we can provide a second watcher making use of recursive directory monitoring on operating systems that provide the functionality using https://godoc.org/github.com/rjeczalik/notify

dgageot commented 6 years ago

I'm going to close this one. Feel free to open another ticket for additional improvements

GoogleContainerTools / skaffold