get-woke / woke

Detect non-inclusive language in your source code.
https://docs.getwoke.tech
MIT License

Performance issues when compiling ignores and .wokeignore not being applied #239

Open pgtruong opened 2 years ago

pgtruong commented 2 years ago

Thank you for creating the issue!

Please include the following information:

Version of woke

```console
$ woke --version
# woke version 0.19.0
```
Config file

No config file.
Go environment

Some characters omitted with *.

```console
T:\****\Tools\Woke>go version && go env
go version go1.19.1 windows/amd64
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\****\AppData\Local\go-build
set GOENV=C:\Users\****\AppData\Roaming\go\env
set GOEXE=.exe
set GOEXPERIMENT=
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=C:\Users\****\go\pkg\mod
set GONOPROXY=*.****.com
set GONOSUMDB=*.****.com
set GOOS=windows
set GOPATH=C:\Users\****\go
set GOPRIVATE=*.****.com
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Program Files\Go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.19.1
set GCCGO=gccgo
set GOAMD64=v1
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=NUL
set GOWORK=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=C:\Users\****\AppData\Local\Temp\go-build4222856944=/tmp/go-build -gno-record-gcc-switches
```

When running woke at the root of a large repository, it takes substantially longer to run due to searching for ignore files.

Running woke at the root of my repository (some text omitted with *):

```console
T:\****>Tools\Woke\windows\woke.exe T:\****\Tools\Woke\windows\test.txt --debug
2022-10-18T13:12:39-07:00 DBG woke version 0.19.0 built from e588a3e on 2022-07-28T22:46:26Z
2022-10-18T13:12:39-07:00 DBG Adding custom ruleset from filename="T:\\****\\.woke.yaml"
2022-10-18T13:12:39-07:00 DBG loaded config file config="T:\\****\\.woke.yaml"
2022-10-18T13:12:39-07:00 DBG config rules rules=["blacklist","dummy","fluffer","grandfather","male/female connector","man-hours","master","master-slave","slave","whitelist"]
2022-10-18T13:12:39-07:00 DBG default rules rules=["whitelist","blacklist","master-slave","slave","grandfathered","man-hours","sanity","dummy","guys","whitebox","blackbox"]
2022-10-18T13:12:39-07:00 DBG all enabled rules rules=["blacklist","dummy","fluffer","grandfather","male/female connector","man-hours","master","master-slave","slave","whitelist","grandfathered","sanity","guys","whitebox","blackbox"]
2022-10-18T13:12:39-07:00 DBG Could Not Find Root Git Folder
2022-10-18T13:13:10-07:00 DBG finished compiling ignores durationMS=30881.2372
2022-10-18T13:13:10-07:00 DBG created new printer printer=text
2022-10-18T13:13:10-07:00 DBG process files path="T:\\****\\Tools\\Woke\\windows\\test.txt" type=parallel
2022-10-18T13:13:10-07:00 DBG finished processing findings durationMS=0.5496 file=T:/****/Tools/Woke/windows/test.txt
T:/****/Tools/Woke/windows/test.txt:1:0-5: `slave` may be insensitive, use `follower`, `replica`, `standby`, `secondary`, `worker`, `passive`, `child`, `agent`, `node`, `helper`, `responder`, `subscriber` instead (error)
slave
^
T:/****/Tools/Woke/windows/test.txt:2:0-6: `master` may be insensitive, use `primary`, `main`, `parent`, `leader`, `central`, `active` instead (error)
master
^
2022-10-18T13:13:10-07:00 DBG woke completed durationMS=30896.5485
```

vs. running from inside a nested folder:

```console
T:\****\Tools\Woke>windows\woke.exe T:\****\Tools\Woke\windows\test.txt --debug
2022-10-18T13:15:17-07:00 DBG woke version 0.19.0 built from e588a3e on 2022-07-28T22:46:26Z
2022-10-18T13:15:17-07:00 DBG Adding custom ruleset from filename="T:\\****\\Tools\\Woke\\.woke.yaml"
2022-10-18T13:15:17-07:00 DBG loaded config file config="T:\\****\\Tools\\Woke\\.woke.yaml"
2022-10-18T13:15:17-07:00 DBG config rules rules=["blacklist","dummy","fluffer","grandfather","male/female connector","man-hours","master","master-slave","slave","whitelist"]
2022-10-18T13:15:17-07:00 DBG default rules rules=["whitelist","blacklist","master-slave","slave","grandfathered","man-hours","sanity","dummy","guys","whitebox","blackbox"]
2022-10-18T13:15:17-07:00 DBG all enabled rules rules=["blacklist","dummy","fluffer","grandfather","male/female connector","man-hours","master","master-slave","slave","whitelist","grandfathered","sanity","guys","whitebox","blackbox"]
2022-10-18T13:15:17-07:00 DBG Could Not Find Root Git Folder
2022-10-18T13:15:17-07:00 DBG finished compiling ignores durationMS=0.5214
2022-10-18T13:15:17-07:00 DBG created new printer printer=text
2022-10-18T13:15:17-07:00 DBG process files path="T:\\****\\Tools\\Woke\\windows\\test.txt" type=parallel
2022-10-18T13:15:17-07:00 DBG finished processing findings durationMS=0.5231 file=T:/****/Tools/Woke/windows/test.txt
T:/****/Tools/Woke/windows/test.txt:1:0-5: `slave` may be insensitive, use `follower`, `replica`, `standby`, `secondary`, `worker`, `passive`, `child`, `agent`, `node`, `helper`, `responder`, `subscriber` instead (error)
slave
^
T:/****/Tools/Woke/windows/test.txt:2:0-6: `master` may be insensitive, use `primary`, `main`, `parent`, `leader`, `central`, `active` instead (error)
master
^
2022-10-18T13:15:17-07:00 DBG woke completed durationMS=13.8008
```

As you can see, compiling the ignores takes about 30 seconds every time woke runs from the repository root (durationMS=30881.2372), compared with about 0.5 ms from the nested folder (durationMS=0.5214). The repository is quite large (about 400 GB), so this is an unusual use case. I'd also like to note that woke 0.17.1 does not have this issue.

This could be a separate issue, but I'm also having trouble with .wokeignore not properly ignoring some paths in 0.19.0, which doesn't happen in 0.17.1. Specifically, directories aren't ignored when you pass a full path on the command line, but are ignored when you pass a relative path. For example:

Not properly ignored and will find my test.txt with non-inclusive language:

```console
T:\****>Tools\Woke\windows\woke.exe T:\****\Woke\windows
```

Properly ignored by .wokeignore:

```console
T:\****>Tools\Woke\windows\woke.exe Tools\Woke\windows
```

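One plausible explanation for the difference between the two commands above (not confirmed in this thread) is that ignore patterns are matched against paths relative to the repository root, so an absolute argument never matches unless it is first made relative. The Go sketch below shows that normalization step. `relToRoot` and `isIgnored` are hypothetical helpers, relative arguments are assumed to be relative to the root (as when running from `T:\****`), and patterns are plain directory prefixes rather than full gitignore syntax.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// relToRoot turns a command-line path (absolute or relative to root) into a
// slash-separated path relative to root, so it can be compared with ignore
// patterns written relative to the root .wokeignore.
// Hypothetical helper; woke's real matching logic may differ.
func relToRoot(root, p string) (string, error) {
	if !filepath.IsAbs(p) {
		p = filepath.Join(root, p)
	}
	rel, err := filepath.Rel(root, p)
	if err != nil {
		return "", err
	}
	return filepath.ToSlash(rel), nil
}

// isIgnored reports whether rel is an ignored directory or lies inside one.
// Patterns are plain directory prefixes, a simplification of gitignore rules.
func isIgnored(rel string, ignoredDirs []string) bool {
	for _, dir := range ignoredDirs {
		dir = strings.TrimSuffix(dir, "/")
		if rel == dir || strings.HasPrefix(rel, dir+"/") {
			return true
		}
	}
	return false
}

func main() {
	// Unix-style demo paths; on Windows use drive-letter paths such as T:\repo.
	root := "/repo"
	ignored := []string{"Tools/Woke/windows/"}
	for _, arg := range []string{"/repo/Tools/Woke/windows", "Tools/Woke/windows"} {
		rel, err := relToRoot(root, arg)
		if err != nil {
			fmt.Println(arg, "error:", err)
			continue
		}
		fmt.Printf("%s -> %s ignored=%v\n", arg, rel, isIgnored(rel, ignored))
	}
}
```

With this normalization both forms resolve to `Tools/Woke/windows` and are treated the same way; matching the raw absolute string against a root-relative pattern would not.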
github-actions[bot] commented 2 years ago

👋 Thanks for submitting your first issue!

Please be sure to read and follow our Code of Conduct and Contributing guide.

⭐️ Is your org or open source project using woke? If so, we'd love for you to be included in the 'Who uses woke' list at https://github.com/get-woke/woke/blob/main/docs/about.md#who-uses-woke.

pgtruong commented 2 years ago

I think the performance issues are caused by https://github.com/get-woke/woke/pull/117. One possible alternative is to make nested ignore files optional behind a flag, since we don't currently use them.

caitlinelfring commented 2 years ago

Lots of bugs reported because of this feature 😞. Yeah, I like the idea of opt-in (I suggested it too: https://github.com/get-woke/woke/pull/117#pullrequestreview-798398373). Since it's already a feature, opt-out might be better, to avoid breaking existing usages until improvements can be made to deal with these issues.

caitlinelfring commented 2 years ago

@armanrahman22 @KSLHacks @jeremydelacruz since you were all involved in #117, I wonder if you would be interested in taking a stab at addressing these performance issues?

jeremydelacruz commented 1 year ago

@caitlinelfring sorry for the delay! Happy to come back to this and take a stab at it 🙂