edgedb / edgedb-js

The official TypeScript/JS client library and query builder for EdgeDB
https://edgedb.com
Apache License 2.0

Excessive build times and memory consumption transpiling edgeql-js #1046

Open rtlayzell opened 3 months ago

rtlayzell commented 3 months ago

I'm using the edgedb-js query builder alongside a NestJS project and ran into out-of-memory issues when building or starting the app. I've tracked the issue down to transpiling the generated edgeql-js: during transpilation, build times seem excessive and memory consumption reaches 4GB+.

I've created a sample project that minimally reproduces the issue.

I could probably pre-build the generated edgeql-js somehow but it's not ideal, and while the issue persists the query builder is pretty much a non-starter for any real projects.

Versions:

scotttrinh commented 3 months ago

@rtlayzell

TypeScript 5.3 introduced a pretty big performance regression in our types, specifically the e.op overloads we had (especially if you have some nested scopes and nested operators). We're actively working on mitigating this for our next @edgedb/generate release, but balancing powerful type inference/safety and type checking time/complexity will be an ongoing effort.

Stay tuned for the next release out in the next week or so, and I'll leave this issue open if you want to check back in after that release and we can explore any specific issues you still have.

scotttrinh commented 3 months ago

@rtlayzell

Just saw your minimal repro, and here's what I get (257M memory usage and 483k Instantiations):

$ npx tsc --extendedDiagnostics
Files:                         144
Lines of Library:            39995
Lines of Definitions:         2059
Lines of TypeScript:         21751
Lines of JavaScript:             0
Lines of JSON:                   0
Lines of Other:                  0
Identifiers:                103772
Symbols:                    132062
Types:                      173574
Instantiations:             483544
Memory used:               257066K
Assignability cache size:   131846
Identity cache size:           267
Subtype cache size:            383
Strict subtype cache size:    2458
I/O Read time:               0.02s
Parse time:                  0.20s
ResolveModule time:          0.01s
ResolveTypeReference time:   0.00s
ResolveLibrary time:         0.01s
Program time:                0.27s
Bind time:                   0.11s
Check time:                  1.92s
transformTime time:          0.03s
commentTime time:            0.03s
I/O Write time:              0.01s
printTime time:              0.20s
Emit time:                   0.20s
Total time:                  2.49s

I know your example is very small, but this isn't a particularly egregious example. I'll send an update with more. One of the main issues here is that you pay for all of the overhead of the complex inference engine even if you only use small parts of it. Granted, it gets a lot worse if you do use more complicated parts of it, but this is to be expected. Here's a similarly minimal case for drizzle-orm with better-sqlite3:

$ npx tsc --extendedDiagnostics
Files:                         335
Lines of Library:            40017
Lines of Definitions:        60446
Lines of TypeScript:            20
Lines of JavaScript:             0
Lines of JSON:                   0
Lines of Other:                  0
Identifiers:                100247
Symbols:                    250431
Types:                      155906
Instantiations:            1535626
Memory used:               375289K
Assignability cache size:    43601
Identity cache size:           216
Subtype cache size:              0
Strict subtype cache size:      24
I/O Read time:               0.05s
Parse time:                  0.21s
ResolveModule time:          0.04s
ResolveTypeReference time:   0.00s
ResolveLibrary time:         0.01s
Program time:                0.35s
Bind time:                   0.10s
Check time:                  1.90s
printTime time:              0.00s
Emit time:                   0.00s
Total time:                  2.35s

Now granted that example includes checking libraries, but that's a little closer to a one-to-one comparison with how our query builder codegen works. With skipLibCheck: true, memory usage is 114M, and Instantiations is 4945. You can get similar performance by treating the query builder as a library using project references in TypeScript when the query builder project is already built (80M and 3300 instantiations):

$ npx tsc -b src/tsconfig.json --extendedDiagnostics
Files:                        142
Lines of Library:           39995
Lines of Definitions:        7771
Lines of TypeScript:            6
Lines of JavaScript:            0
Lines of JSON:                  0
Lines of Other:                 0
Identifiers:                85254
Symbols:                    40348
Types:                       1294
Instantiations:              3300
Memory used:               84792K
Assignability cache size:     217
Identity cache size:            0
Subtype cache size:             2
Strict subtype cache size:      0
I/O Read time:              0.01s
Parse time:                 0.14s
ResolveModule time:         0.01s
ResolveTypeReference time:  0.00s
ResolveLibrary time:        0.01s
Program time:               0.20s
Bind time:                  0.07s
Check time:                 0.06s
transformTime time:         0.00s
commentTime time:           0.00s
printTime time:             0.01s
Emit time:                  0.01s
I/O Write time:             0.00s
Total time:                 0.35s
Projects in scope:                        2
Projects built:                           1
Aggregate Files:                        142
Aggregate Lines of Library:           39995
Aggregate Lines of Definitions:        7771
Aggregate Lines of TypeScript:            6
Aggregate Lines of JavaScript:            0
Aggregate Lines of JSON:                  0
Aggregate Lines of Other:                 0
Aggregate Identifiers:                85254
Aggregate Symbols:                    40348
Aggregate Types:                       1294
Aggregate Instantiations:              3300
Aggregate Memory used:               84792K
Aggregate Assignability cache size:     217
Aggregate Identity cache size:            0
Aggregate Subtype cache size:             2
Aggregate Strict subtype cache size:      0
Aggregate I/O Read time:              0.01s
Aggregate Parse time:                 0.14s
Aggregate ResolveModule time:         0.01s
Aggregate ResolveTypeReference time:  0.00s
Aggregate ResolveLibrary time:        0.01s
Aggregate Program time:               0.20s
Aggregate Bind time:                  0.07s
Aggregate Check time:                 0.06s
Aggregate transformTime time:         0.00s
Aggregate commentTime time:           0.00s
Aggregate printTime time:             0.01s
Aggregate Emit time:                  0.01s
Aggregate I/O Write time:             0.00s
Config file parsing time:             0.01s
Up-to-date check time:                0.01s
Build time:                           0.42s
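
A project-references setup like the one measured above could look roughly like the following. This is a minimal sketch written as TypeScript objects that mirror two tsconfig.json files. The directory names (dbschema/edgeql-js for the generated query builder, a sibling src folder for the app), the outDir values, and the TsConfigSketch and render names are assumptions for illustration, not taken from the repro. The composite and references options are the standard TypeScript project-references mechanism.

```typescript
// Sketch only: paths and outDir values are assumptions, not from the repro.
interface ProjectReference {
  path: string;
}

interface TsConfigSketch {
  compilerOptions: Record<string, string | boolean>;
  include: string[];
  references?: ProjectReference[];
}

// tsconfig.json for the generated query builder, built as its own project.
const queryBuilderProject: TsConfigSketch = {
  compilerOptions: {
    composite: true, // required for a project that other projects reference
    declaration: true, // consumers read the emitted .d.ts files
    outDir: "./dist",
  },
  include: ["./**/*.ts"],
};

// src/tsconfig.json for the application, which references the project above.
const appProject: TsConfigSketch = {
  compilerOptions: { strict: true, outDir: "../dist" },
  include: ["./**/*.ts"],
  references: [{ path: "../dbschema/edgeql-js" }],
};

// Serialize a config object into the JSON text of a tsconfig.json file.
function render(config: TsConfigSketch): string {
  return JSON.stringify(config, null, 2);
}

console.log(render(queryBuilderProject));
console.log(render(appProject));
```

With each object written out as its own tsconfig.json, running npx tsc -b src/tsconfig.json lets tsc build the query builder project when it is out of date and then check the app against its emitted declaration files rather than its sources.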

I'm toying with the idea of having the query builder built as more of a library to capitalize on these kinds of gains, but haven't spent enough time exploring it just yet. I know Prisma does something similar where it generates into your node_modules which is a pretty clever hack 😅


At any rate, regardless of whatever kinds of project-level mitigations we can suggest for users, my plan is definitely to continue to measure and improve inference performance and DX.

rtlayzell commented 3 months ago

I'm surprised that you got those results running npx tsc --extendedDiagnostics, although I was able to get the same results running that command. However, if you include .\index.ts in the command (i.e. npx tsc .\index.ts --extendedDiagnostics), the results are very different:

Files:                           88
Lines of Library:             33963
Lines of Definitions:          2059
Lines of TypeScript:          21751
Lines of JavaScript:              0
Lines of JSON:                    0
Lines of Other:                   0
Identifiers:                  98855
Symbols:                    6338767
Types:                      2008424
Instantiations:            21586909
Memory used:               3793121K
Assignability cache size:    826553
Identity cache size:            156
Subtype cache size:              83
Strict subtype cache size:     2394
I/O Read time:                0.02s
Parse time:                   0.32s
ResolveModule time:           0.03s
ResolveTypeReference time:    0.00s
ResolveLibrary time:          0.00s
Program time:                 0.39s
Bind time:                    0.16s
Check time:                  70.74s
transformTime time:           0.19s
commentTime time:             0.10s
I/O Write time:               0.06s
printTime time:               0.99s
Emit time:                    0.99s
Total time:                  72.29s

scotttrinh commented 3 months ago

@rtlayzell

Did you see the PR I made to show off the project references setup? The tsconfig.json for the src "project" includes the index.ts already.

scotttrinh commented 3 months ago

Oh, I see you already merged that! I think the main issue with your results is that you're not running in "build mode", which is the way you're supposed to use project references (see https://www.typescriptlang.org/docs/handbook/project-references.html for more details). If you don't run it in build mode, you are correct, it's even worse!
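
To put the gap in numbers, the counters from the build-mode run and the single-file run quoted earlier in this thread can be compared directly. The sketch below is illustrative only: parseDiagnostics and ratio are hypothetical helper names, and just three lines from each diagnostics dump are copied in.

```typescript
// Parse the numeric counters from `tsc --extendedDiagnostics` output.
// Unit suffixes (K for kilobytes, s for seconds) are dropped.
function parseDiagnostics(output: string): Map<string, number> {
  const metrics = new Map<string, number>();
  for (const line of output.split("\n")) {
    const match = /^\s*(.+?):\s+([\d.]+)(K|s)?\s*$/.exec(line);
    if (match) {
      metrics.set(match[1], Number(match[2]));
    }
  }
  return metrics;
}

// Figures copied from the build-mode run (npx tsc -b src/tsconfig.json).
const buildMode = parseDiagnostics(`
Instantiations:              3300
Memory used:               84792K
Check time:                 0.06s
`);

// Figures copied from the single-file run (npx tsc .\\index.ts).
const singleFile = parseDiagnostics(`
Instantiations:            21586909
Memory used:               3793121K
Check time:                  70.74s
`);

// How many times larger a counter is in the single-file run.
function ratio(name: string): number {
  const slow = singleFile.get(name);
  const fast = buildMode.get(name);
  if (slow === undefined || fast === undefined) {
    throw new Error(`missing metric: ${name}`);
  }
  return slow / fast;
}

for (const name of ["Instantiations", "Memory used"]) {
  console.log(`${name}: ${ratio(name).toFixed(1)}x`);
}
```

With the figures quoted in this thread, the single-file run performs roughly 6,500 times as many instantiations and uses about 45 times as much memory as the build-mode run.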

BS-MauruschatM commented 3 weeks ago

Same issue for me with NestJS. I can't run my project anymore in dev or build mode, because the compilation always results in "Reached heap limit Allocation failed - JavaScript heap out of memory".

The compilation of edgeql-js wants 4GB+. Also, my remote connection with edgedb wants 2.8GB+.

Very frustrating.

BS-MauruschatM commented 3 weeks ago

@rtlayzell Do you have any solutions for that? Could this also be a problem with NestJS?

scotttrinh commented 3 weeks ago

@BS-MauruschatM

Have you tried setting up Project References for the query builder yet?