Open ironcev opened 2 months ago
- A GC-friendly `TypeEngine` should never garbage-collect shareable types. The number of those types is orders of magnitude smaller than the overall number of types inside of the `TypeEngine`, and they can be heavily reused across all the projects and modules. Shareable types should "live forever" and never be GCed, and thus, never be assigned to any particular `source_id`. Examples of shareable types are all built-in types like, e.g., `u64`, but also types like, e.g., `Option<u64>` or `MyStruct` or `MyGenericStruct<bool>`.
The idea of shareable types is interesting and not something I really considered before outside the builtin core types.
Could you maybe expand on this concept of a shareable type? Would they be `std` or just `core` types maybe?
And would the main difference be they don't contain a source id? I am wondering if that could impact some diagnostic messages which might be pointing to some core types.
Also I get a bit unsure when you say that they should never be GC'd. First, let's say we consider all `std` types to be shareable and thus not considered for GC. But what would happen if, let's say, we change something in the std and recompile a project that has been cached (and is thus referencing the old `std` cached type ids)? In that case, we would need to GC the old std types out of the type engine, right?
3. A GC-friendly `TypeEngine` should, ideally, offer an `O(1)` removal of files during the GC.
A potential idea that @JoshuaBatty and I discussed previously would be to have separate per-module engines, where an id would have a reserved space to contain the engine id. This would allow an `O(1)` removal of the engine data when the module is collected.
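To make the idea a bit more concrete, here is a minimal Rust sketch of what per-module engines with an engine id embedded in the type id could look like. Everything in it is hypothetical and for illustration only: the names `TypeEngines`, `ModuleEngine`, `EngineId`, the 32/32 bit split inside `TypeId`, and the use of `String` as a stand-in for the real type data are assumptions, not the actual compiler code.

```rust
use std::collections::HashMap;

/// Hypothetical key of a per-module engine.
type EngineId = u32;

/// Hypothetical type id: the high 32 bits are reserved for the engine id,
/// the low 32 bits hold the index inside that engine.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct TypeId(u64);

impl TypeId {
    fn new(engine: EngineId, index: u32) -> Self {
        TypeId(((engine as u64) << 32) | index as u64)
    }

    fn engine(self) -> EngineId {
        (self.0 >> 32) as EngineId
    }

    fn index(self) -> usize {
        (self.0 & 0xFFFF_FFFF) as usize
    }
}

/// Storage for the types of a single module. `String` stands in for the real type data.
#[derive(Default)]
struct ModuleEngine {
    types: Vec<String>,
}

/// The collection of all per-module engines.
#[derive(Default)]
struct TypeEngines {
    engines: HashMap<EngineId, ModuleEngine>,
}

impl TypeEngines {
    /// Stores `ty` in the engine of the given module and returns an id that encodes both.
    fn insert(&mut self, engine: EngineId, ty: &str) -> TypeId {
        let module = self.engines.entry(engine).or_default();
        module.types.push(ty.to_string());
        TypeId::new(engine, (module.types.len() - 1) as u32)
    }

    /// Resolves an id; returns `None` if the owning engine was already collected.
    fn get(&self, id: TypeId) -> Option<&str> {
        self.engines
            .get(&id.engine())?
            .types
            .get(id.index())
            .map(String::as_str)
    }

    /// Collects a whole module by dropping its engine. No other engine is traversed.
    fn collect_module(&mut self, engine: EngineId) -> bool {
        self.engines.remove(&engine).is_some()
    }
}
```

In this sketch, collecting a module is a single hash map removal, so the types of all other modules are never visited. Freeing the removed engine's own data still costs time proportional to its size, and any ids that other modules hold into a collected engine simply stop resolving, which a real design would need to handle.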
Thanks for the early feedback and excellent questions @tritao! This issue description indeed requires more elaboration, including proper definitions and more concrete examples of measurements that justify the proposed design. This first issue description was primarily created so that I can link to it in several TODOs left in the code that call for future changes in the `TypeEngine`. I'll elaborate more on all points, including examples, and explain the concepts in detail. After your question, I see that the description in the first point is pretty misleading. Also, it is important to add how it all works in sync with the `DeclEngine` during GC. And for the third point, what a solution could look like without having separate engines. Thanks a lot for the feedback and questions and stay tuned :smile:
NOTE: This is the draft version of the proposal. It lists the three major points that a "garbage-collection-friendly" `TypeEngine` should support, but still not in enough detail. The concrete architectural proposal is also missing, as well as the results of measurements from a real-world project that support the reasoning behind the proposal.
The numbers given in the description below come from the compilation of the Spark Orderbook workspace, a realistic real-world project.
Shareable `TypeInfo`s and their lifetime
We use the term shareable type as defined in #6613.
Examples of shareable types are all built-in types like, e.g., `u64`, but also types like, e.g., `Option<u64>` or `MyStruct` or `MyGenericStruct<bool>`.
The number of those types is orders of magnitude smaller than the overall number of types inside of the `TypeEngine`, and they can be heavily reused across all the projects and modules, which is currently not the case. (TODO: Add data from measurements.)
Currently, the `TypeEngine` optimized in #6613 reuses the `TypeSourceInfo`s of shareable `TypeInfo`s and not the `TypeInfo`s themselves. This is sub-optimal because shareable types that cannot be changed in between garbage collections are still assigned a `source_id` and are being garbage collected. This table shows the content of the `shareable_types` hash map. It is evident that types like `!`, `()`, `bool`, etc. are stored unnecessarily many times.
Those types should "live forever" and never be garbage collected.
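As a rough illustration of the "live forever" idea, the following Rust sketch interns shareable types once, without any `source_id`, so that a per-file GC pass can never remove them. It is a simplified model under stated assumptions and not the actual `TypeEngine`: the reduced `TypeInfo` enum, the `is_shareable` rule that treats only built-ins as shareable (the real definition is the one from #6613), and the `insert`, `gc`, and `live_count` helpers are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real source id.
type SourceId = u32;

/// Heavily reduced, hypothetical version of `TypeInfo`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum TypeInfo {
    Never,
    Unit,
    Bool,
    /// Stands in for any non-shareable type, identified by name.
    Custom(String),
}

impl TypeInfo {
    /// Assumption: only the built-ins of this sketch count as shareable.
    fn is_shareable(&self) -> bool {
        !matches!(self, TypeInfo::Custom(_))
    }
}

#[derive(Default)]
struct TypeEngine {
    /// Slab-like storage; a collected slot becomes `None` so that ids stay stable.
    slab: Vec<Option<(TypeInfo, Option<SourceId>)>>,
    /// Maps each shareable type to its single slot in the slab.
    shareable: HashMap<TypeInfo, usize>,
}

impl TypeEngine {
    /// Inserts a type and returns its id (the slab index).
    /// Shareable types are stored once and get no `source_id`.
    fn insert(&mut self, ty: TypeInfo, source_id: SourceId) -> usize {
        if ty.is_shareable() {
            if let Some(&id) = self.shareable.get(&ty) {
                return id;
            }
            self.slab.push(Some((ty.clone(), None)));
            let id = self.slab.len() - 1;
            self.shareable.insert(ty, id);
            id
        } else {
            self.slab.push(Some((ty, Some(source_id))));
            self.slab.len() - 1
        }
    }

    /// Removes all types that belong to `source_id`. This still walks the whole slab.
    fn gc(&mut self, source_id: SourceId) {
        for slot in self.slab.iter_mut() {
            let owned = matches!(slot.as_ref(), Some((_, Some(s))) if *s == source_id);
            if owned {
                *slot = None;
            }
        }
    }

    /// Number of types currently stored.
    fn live_count(&self) -> usize {
        self.slab.iter().flatten().count()
    }
}
```

With this layout, inserting `bool` from two different files yields the same id, and a GC pass for either file leaves the single shared entry untouched. The sketch deliberately leaves out shareable types such as `Option<u64>`, whose shared instance would have to be invalidated when the underlying declaration changes.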
Also, the shareable types like `Option<u64>` should be shareable across modules, meaning that for different `source_id`s we should still point to a single shared `Option<u64>` instance. Note that this instance should be GCed if the definition of `Option` gets changed.
Distribution of `TypeInfo`s across `source_id`s
`TypeInfo`s that should get garbage collected should be distributed towards the leaves of the project and module dependency tree. This means that the types should be assigned, whenever possible, the `source_id`s of the files that are likely to be changed, and those are always the files in the projects that developers actually work on (the leaves), and not the files in the dependencies, especially the standard library dependencies.
E.g., currently, if we have `fn my_function<A>() -> Option<A>` in the code we are editing (a leaf!), the non-shareable `Option<A>` type will get assigned the `source_id` of the original `Option` declaration, the one from the standard library. This means it will never be GCed, although it should be GCed whenever we GC the current module the programmer is changing.
E.g., currently, all of the `TypeInfo::Unknown` types do not have a `source_id` assigned, and not all of them get `replace()`ed in the engine. Out of ~41,000 inserted `Unknown`s, some ~4,500 remain unreplaced, and, not having a `source_id` assigned, can never be GCed. This produces a constant small "memory leak" within the `TypeEngine`.
Time-performant garbage collection
`TypeEngine` should, ideally, offer an `O(1)` removal of files during the GC.
E.g., before the optimization done in #6613, the `ConcurrentSlab` used to hold ~510,000 elements. All those elements needed to be traversed during GC, to remove just a handful of types. After the optimization, the `slab` contains ~230,000 elements, which speeds up the traversal, but it still has an `O(n)` complexity.
Architectural proposal
TODO: Write proposal.