Closed earlephilhower closed 5 years ago
Just did some proper, multi-run testing, and the PR is now properly filtering out .git from its scanning. Runtimes of 1:45 (on a 16-core Xeon server!) and memory usage of 4GB+ are now almost instantaneous and much, much lower RAM usage.
Thanks alot @earlephilhower
Sounds like a good change, but I have a few questions and remarks:
src
subdirectory? Do you have .git
directories in there? I'm just trying to understand the problem here.FindAllFilesInFolder
skip hidden files might be a bit surprising for users of the function (it's not really returning all files anymore). Perhaps passing a list of files or patterns to ignore explicitly might be better?.
, but IIRC there's already a list of files to ignore somewhere in the builder code (IIRC it's used to not warn about these files when they show up in a library or something). Perhaps that list could be reused instead?Good notes. I think I can clear up most of them in reverse order of difficulty. :)
Where is this code actually used? I would expect in recursive scanning of scketches and libraries, but doesn't that only happen inside the src subdirectory? Do you have .git directories in there? I'm just trying to understand the problem here.
@d-a-v and I are maintainers of https://github.com/esp8266/Arduino/ , so we use ~/Arduino/hardware/esp8266com/esp8266
as the Arduino code and git repository we work from (branches, checking PRs, etc.) So there is a .git
dir in the local hardware tree in our cases. It has something like 3000 commits and dozens of branches, so the .git
dir has many, many files.
For normal users, though, your point is valid and they shouldn't have a .git
in boards-manager installed tree. This change, for them, should be a no-op.
The actual code change itself is called from the wipeout_build_path...
stage where arduino-builder
is only trying to see if platform.txt
has been changed (or other files with .txt
extensions). Without this change, the findAllFilesInFolder
does a massive amount of recursion and basically stats the entire GIT tree of 100K files or more, and stores that in memory only to be thrown away on return by the simple file.Ext()==".txt"
iteration.
You're now ignoring all files starting with ., but IIRC there's already a list of files to ignore somewhere in the builder code (IIRC it's used to not warn about these files when they show up in a library or something). Perhaps that list could be reused instead?
The problem is any ignore list is only processed after the file list is in memory. It's not that there are some txt
files in the git tree which are causing false rebuilds, it's that the builder is taking multiple minutes and GB scanning the GIT tree to build a really massive list of objects only to throw them out. We're trying to short-circuit that.
Making a low-level function like
FindAllFilesInFolder
skip hidden files might be a bit surprising for users of the function (it's not really returning all files anymore). Perhaps passing a list of files or patterns to ignore explicitly might be better?
You have a point, but I'm not sure it's a valid use case. FindAllFilesInFolder
in the arduino-builder
context is really looking for inputs to the build process (at multiple stages). I think it could be argued that if your build source files are contained in a hidden directory that you've got some issues, or at least the next guy to manage your code will have them! :)
I think it might be better to have a Bool
flag for ignoring hidden directories (as opposed to an ignore list since there are also code coverage tools and others which make hidden dirs to store their artifacts), but the impact of such a change would be significantly larger in terms of code changes (but 0 logic changes, of course) since wipe...
actually has 2 levels of indirection before this is called.
Alternatively, wipe...
could be rewritten to be more intelligent about the parsing. Now it's doing the equivalent of select * from core-tree;
and then dropping everything but .txt
files. There should be a way to express select * from core-tree where extension='.txt';
. I'm not sure my Go skills are up to that, but if that's what you'd like I could give it a shot.
Looks like this patch was implemented in the replacement arduino-cli:
https://github.com/arduino/arduino-cli/blob/master/arduino/builder/sketch.go#L91
Closing since it's no longer relevant.
When checking for updated core files, don't scan inside UNIX hidden directories like .git. Scanning inside a GIT tree for a large project with many branches could take many minutes of runtime and multiple GB of RAM. This was seen in the Arduino core for the ESP8266, but is probably also an issue for other core development teams.
Fixes #327
Signed-off-by: Earle F. Philhower, III earlephilhower@yahoo.com