golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
121.14k stars 17.37k forks source link

x/tools/imports: support sessions in Process and FixImports for performance #66802

Open philip-peterson opened 2 months ago

philip-peterson commented 2 months ago

Proposal Details

Certain toolchains such as Ent make extensive use of imports.Process(...) within a loop to format multiple files in a directory. However, because each invocation of Process itself executes a loop via parseOtherFiles on multiple files in the directory as well, the complexity may reach O(N^2) in this use case.

As a result, this proposal is to introduce support for a session object passed to Process:

func Process(s *Session, filename string, src []byte, opt *Options) (formatted []byte, err error) {
   ...
}

A session would a thin wrapper around an LRU cache, which can use the existing LRU functionality in gopls/internal/util/lru (which could easily be relocated to internal/lru):

type Session struct {
    Cache *lru.Cache
    Fset  *token.FileSet
}

Usage would look like:

s := NewSession(10) // 10 MB cache
return Process(s, file, contents, opts)

The LRU key would be something like (file path, file contents) and value would be the parsed AST. This would help prevent repeated calls to parse from incurring significant performance cost.

Session would be allowed to be nil to support the current use-case.

ianlancetaylor commented 2 months ago

x/tools/internal is not visible API, so this does not need to go through the proposal process.

ianlancetaylor commented 2 months ago

CC @golang/tools-team

philip-peterson commented 2 months ago

@ianlancetaylor My mistake, it would also affect the API for x/tools/imports (non-internal). x/tools/imports has a function called Process that invokes another function called Process in x/tools/internal/imports.

philip-peterson commented 2 months ago

Amendment: The cache, instead of being passed as first positional argument to Process, may be passed through the existing opt *Options argument as a field in Options. This will allow existing code to remain unchanged, since the value will default to nil.

adonovan commented 2 months ago

cc @findleyr, who has been working in imports recently. We have been discussing a possible imports API that allows the read/parse/infer/fix/format operations to be expressed as composable functions with no inherent dependency on external state. Currently the step I am calling "infer" runs the module resolver against GOMODCACHE to produce and rank candidates; many clients (including gopls) want to be able to hook this step into their own data structure, for more deterministic or customizable behavior. (GOMODCACHE is an accumulation of the user's previous habits.)

findleyr commented 2 months ago

@pjweinb is actually doing some work on goimports at the moment.

Indeed, this sounds like it may align with the needs of gopls.

I worked on goimports performance in #59216, and while I made some improvements for gopls, I also concluded that goimports scanning and resolution should really be re-thought from scratch. Inside gopls, we re-use an internal/imports.ProcessEnv, but I don't think the fix is to expose that type from x/tools/imports--we should instead come up with a new API that is less complex.

One of the requirements of this new API would be that gopls can itself provide package information as input into the goimports algorithm. Right now, package resolution is hidden behind a (quite complex) Resolver type, which has its own heuristics for scanning the module cache, and which may not agree with gopls about the best packages or modules to import. See also #36077.

philip-peterson commented 2 months ago

@adonovan @findleyr It's exciting that is happening. It sounds like a lot of scope though, and I wonder if it's a separate task from optimizing the current flow? (This ticket would simply be for memoizing the existing logic.)