bradfitz / go-issue-mirror

[old] precursor to golang.org/x/build/maintner/godata

Hint go tool not to look inside issues folder for Go packages. #13

Closed dmitshur closed 8 years ago

dmitshur commented 8 years ago

(I'm reporting this issue, but no PR because it's not feasible to make a PR to resolve this. Only you, @bradfitz, the owner of this repo, can resolve this. It's easy.)

Problem

Having this repository inside my GOPATH workspace makes operations such as go list all or go list .../foo/bar much, much slower, needlessly.

I have a moderately large number of Go packages in my GOPATH (around 2000), and I have an SSD. With the go-issue-mirror repository as is, go list all takes 8-10 seconds:

$ time go list all
real    0m9.400s
$ time go list all
real    0m7.993s
$ time go list all
real    0m10.523s

With the simple change I propose in the "Solution" section below, that time can be restored to a much more reasonable ~2.5 seconds.

go list all is just an example. It also affects queries such as .../foo/bar, which I use very often to navigate to specific Go packages.

Cause

The cmd/go command scans the GOPATH workspace(s) each time to find Go packages. It goes over all folders, and checks if they have .go files (i.e. it checks if build.ImportDir successfully finds a package).

The issues directory in this repository contains a single Go package, but it also has 1000 directories, each with many more directories and files.

We as humans know it's a waste of time looking for Go packages inside, but the cmd/go tool doesn't. It needs a hint.

Solution

Luckily, it supports such hints:

Directory and file names that begin with "." or "_" are ignored by the go tool, as are directories named "testdata".

Using "." is not a good idea because it'll make the folder hidden. "testdata" isn't a great fit since this is just real "data", not "data for tests". But you can prefix the folder name with an underscore, and cmd/go will know to skip it and not waste time scanning all those directories/files.
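The rule quoted above is simple enough to sketch in a few lines. The following is an illustration of the documented behavior, not the go tool's own code; ignoredDir and visitedDirs are hypothetical names. The important part is returning filepath.SkipDir, which prunes the entire subtree so none of its children are ever read.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// ignoredDir mirrors the documented rule: names beginning with "." or "_"
// are ignored, as are directories named "testdata".
func ignoredDir(name string) bool {
	return strings.HasPrefix(name, ".") ||
		strings.HasPrefix(name, "_") ||
		name == "testdata"
}

// visitedDirs returns the directories under root that a scanner following
// the rule would actually look at.
func visitedDirs(root string) ([]string, error) {
	var out []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if !d.IsDir() {
			return nil
		}
		// Never skip the root itself (it may be named ".").
		if path != root && ignoredDir(d.Name()) {
			return filepath.SkipDir // prune the whole subtree
		}
		out = append(out, path)
		return nil
	})
	return out, err
}

func main() {
	for _, name := range []string{"issues", "_data", ".git", "testdata"} {
		fmt.Printf("%-10s ignored=%v\n", name, ignoredDir(name))
	}
	dirs, err := visitedDirs(".")
	if err != nil {
		fmt.Println("walk error:", err)
		return
	}
	fmt.Println("directories visited:", len(dirs))
}
```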

I usually name such folders _data (it's like testdata, except not for tests, and it starts with _).

So, if I move all of the issues subfolders into a directory named _data, then go list all takes about 2.5 seconds:

$ time go list all
real    0m2.646s
$ time go list all
real    0m2.736s
$ time go list all
real    0m2.591s

I know you use the github.com/bradfitz/go-issue-mirror/issues import path to find the issues "data". You can move all that issues data into a _data subfolder:

.
├── README.md
├── cmd
│   └── servegoissues
│       └── servegoissues.go
└── issues
    ├── issues.go
    └── _data
        ├── 000
        │   ├── 1000.comments
        │   │   ├── comment-66052206.json
        │   │   ├── comment-66052207.json
        │   │   ├── comment-66052208.json
        │   │   ├── comment-66052209.json
        │   │   ├── comment-66052210.json
        │   │   └── comment-66052211.json
        │   ├── 1000.json
        │   ├── 10000.comments
        │   │   ├── comment-76099228.json
        │   │   ├── comment-76835959.json
        │   │   ├── comment-82634364.json
        │   │   └── comment-82636153.json
        │   ├── 10000.json
        │   ├── 11000.comments
        │   │   ├── comment-107078804.json

And keep issues.go inside github.com/bradfitz/go-issue-mirror/issues as is; just modify issues.go#L18 to point to the new subdirectory.

Anyway, how exactly you resolve this is up to you, but please fix it... I am sad when Go is unnecessarily slow. :(

bradfitz commented 8 years ago

SGTM.

This is especially bad on OS X with its shitty VFS.

I get:

bradfitz@laptop ~$ time go list all
....
real    1m19.141s
user    0m6.622s
sys     0m6.329s

And on the second run:

real    0m20.448s
user    0m5.820s
sys     0m4.794s

(Where Linux is 5 seconds on the second run.)

But 5 seconds and 22 seconds are both slow.

bradfitz commented 8 years ago

Done. Thanks.