Open rwy7 opened 6 years ago
After talking with @youngar I've removed the VisitObject functor, and replaced it with a simpler ObjectSlotWalker::traverse
.
@rwy0717 I'd like to understand with greater depth the issues that you are encountering while fitting Javascript onto OMR. From the points you raise in Motivation, above, I still don;t see a need to extend or replace the existing object scanning model, at least insofar as they are used in gc.
Walking can be modal. Objects could be composed of discontiguous regions data. Those regions may be inconsistently encoded...
Discontiguous and dynamic object models are fully supported by the existing object scanning framework, which assumes only that language developers can enumerate a series of (map-base-address, map-slot-bits) pairs that cover the entire object (and all of its dependent parts, if discontiguous). Each map-base-address is fomrobject_t aligned and points the the first fomrobject_t slot in an array of 64 slots. Each bit in the map-slot-bits bitmap indicates that the corresponding fomrobject_t slot relative to the map-base-address is a reference slot (bit is set) or not (bit not set).
Object layouts change dynamically. Keeping up to date slot maps is infeasible.
Can you give me an example of where object scanning is required outside of stop-the-world contexts (eg, gc)? If not, then object dynamism should not be an issue because object shape cannot mutate within stop-the-world contexts. Current object scanners do not assume that the object slot maps are stable and can easily accommodate shape changes since the slot maps must be recomputed for each object scanner instance. This is a no-brainer for OpenJ9 Java because object shapes are static and the slot maps are fixed and stored in corresponding object class. But computing these bit maps for dynamic languages eg Javascript should not be overly complex or incur an inordinate amount of overhead. For example, if Javascript encodes object slots by setting the low-order bit in the slot, the object scanner implementation need only walk its slots to determine the location of reference slots and construct slot maps accordingly. And it must presumably walk all of these slots in any case, for the purpose of identifying reference slots. The additional overhead incurred by subsequent scanning is relatively light, since only the reference slots found and recorded in the slot maps are visited.
I would like to help you to get your scanning issues ironed out. Have a look at the above and let's discuss any concerns that remain.
Something to bear in mind is that polymorphic object scanner types will have a strong influence on generational gc and marking performance. Existing object scanners present a simple and consistent interface to gc componentry, with the result that gc components are not involved with issues like object shape changes and idiomatic (language-specific) concerns regarding object representation. I would rather push any complexity involved in enumeratng object reference slots into the consuming language. Doing otherwise will have an impact on all languages that run on OMR.
Can you give me an example of where object scanning is required outside of stop-the-world contexts (eg, gc)
What about during concurrent marking?
let's discuss any concerns that remain.
OK, there's still the (big big big) issue with app-defined slot encoding. IMHO, this is one thing we did really wrong. The GC requires applications to encode GC references as either bare pointers or bare compressed pointers (and aligned to boot!). This means that it's not even possible to tag pointers. Realistically, applications need to be able to define their own data encoding. This means that the GC must read and write slots through a glue interface.
It's worthwhile to consider crazier object layouts too. What about relocatable pointers (SRP's) on the heap, or bitmap-based 1:n pointers?
I would rater push any complexity involved in enumeration object reference slots into the consuming language. Doing otherwise will have an impact on all languages that run on OMR.
This is what this change is doing. A language can perform their own idiomatic object walking, without working around the GC_ObjectIterator
interface.
Think about the example code. Here is everything the GC needs to walk an object:
class ObjectSlotWalker {
template <typename VisitorT>
traverse(Object* object, VisitorT visitor) {
for (Slot* slot = object->begin(); slot != object->end(); ++slot) {
visitor.edge(object, OMR::GC::BasicSlotHandle(slot));
}
};
Pretty simple!
Even Java would benefit, ObjectSlotWalker::traverse
would be a straightforward iteration of the slot map. This will allow us to remove the ObjectScannerState, and remove the virtual getNextSlotMap function call.
These new GC visitors could also be reused by the languages to implement root marking or scavenging in simpler terms.
And OK, yes, a JS engine can build slot maps for the object scanner. To build a slot map, you must lock the object, and walk the slots. I do not see a clear way to implement locking semantics from inside the iterator. And while the overhead is presumably light, it's totally unnecessary. It seems needlessly complicated to scan an object twice--why not just mark slots immediately?
I think the bottom line is, the ObjectScanner interface is too low level for the glue, and results in a rigid yet complicated interface. These new APIs sit at a higher level, and are going to be a net win for the flexibility, understandability, and performance of the GC and applications.
Ok. Thanks for reminding me about concurrentGC. But even in that context, it is reasonable to expect that the visitor can determine the shape and enumerate reference slots exhaustively at time of marking. It has complete control over the sequencing and alignment of map-base-address (it can even present a series of same with just the 1 bit set in each associated map-slot-bits mask). Duplicates (reiteration of previously scanned slots) are permitted but not advised. The gc assumes only that the reference is represented as a 32-bit (compressedrefs or 32-biit platform) or 64-bit (uncompressed on 64-bit platform) unsigned integer. Modifications to language-specific encodings of object references (these are possible) can be handled in extensions of GC_SlotObject.
It seems to me that your main complaints relate to object alignment, but the object scanning model I think can cope with the issues that you raise. Bear in mind that changes in elemental factors like object reference format will have far-reaching impacts on the code base as vm, jit, gc, even port library embed them in their source code.
That said, It think that your Visitor/ObjectSlotWalker model is very elegant. Consider using it in a Javascript-specific delegate (scavenger still in CLI at present) to implement object scanners as required to support marking and copy forward schemes.
I discussed this with @amicic and he suggested that refactoring GC_SlotObject
to accommodate language-specific extension and implementation would be a starting point for resolving reference slot encoding concerns. He pointed out that fomrobject_t
form factor for object reference containment is assumed throughout other OMR components.
I think that GC_SlotObject parametization (via template) would help a lot, eg GC_SlotObject<ReferenceSlotT>
where ReferenceSlotT
denotes a class with language-specific method implementations for reading/writing reference pointer values to fit into OMRs 32/64-bit fomrobject_t
slots. This would allow language-specific encodings/tagging of reference slots, while providing efficient inline methods for encoding/decoding contents of object reference slots from/to OMR heap pointers.
If reference fields are not fomrobject_t
aligned, they can be mapped by (map-base-address, map-slot-bits) pairs that have map-base-address aligned as required to overlay the mapped segment with fomrobject_t[64]
and read 32/64-bit reference slot values from the slot mapped by the bitmap.
The GC_ObjectScanner model assumes only the an object is a collection of 1 or more disjoint object-aligned address ranges residing in the runtime heap. Each range is overlaid with fomrobject_t[N]
for least N that covers the range, in the default view. Ranges with N>64 must be partitioned into 1+ N/64 (map-base-address, map-slot-bits), as required to cover the default view. So languages implementing object scanners need to know how to locate each part of any object, and the rest is straightforward.
If any of the subranges within any part of an object contain reference slots that are not fomrobject_t aligned, the subrange can be presented using >1 (map-base-address, map-slot-bits) as described in previous comment, the bitmaps selecting disjoint collections of reference slots.
And with all of the above said, GC_ObjectScanner<ObjectSlotWalkerT>
, where ObjectWalkerT
just provides inline getNextSlot()
, could eliminate all of the map/bitmap clutter entirely. Along with GC_SlotObject<ReferenceSlotT>
...
There have historically been a lot of issues with using templates in the OMR codebase. I am not opposed to investigating them as a solution to the object iteration problem at hand.
@rwy0717 has a very elegant proposal here that I think would work for many runtimes. The current solution is feasible for runtimes other than Java but definitely not ideal. In SOM++ it already uses a visitor pattern to walk roots and object slots so the current bitmap solution is very difficult to use.
I think there may be some performance issues with the proposed solution here but I would like to see some examples of it's use in the current OMR GC example and maybe with some of the OMR downstream projects like OpenJ9.
The idea of the OMR GC being consumed by a new runtime in a single day is something that I will always strive for. In the initial days of OMR I was able to use the GC in SOM++ with 2 days of effort which included learning how its makefiles/build system worked, linking OMR into the runtime and providing all of the required OMR GC glue. In its current state I got side tracked after 4 days of not having a working solution so I have not been able to move up to the latest code. I would love to be able to use the OMR scavenger technology in SOM++ so resolving these difficulties to using OMR will be a huge improvement.
Motivation
Object traversal can be quite complex, especially in dynamic languages.
These are all real issues I am running up against while implementing javascript.
Slot Encoding
To solve the issue with slot encoding, I want to allow applications to implement their own
SlotHandle
type.SlotHandle
is a handle to a data slot, which holds a reference.SlotHandle
provides a small API:readReference
: read reference from slotwriteReference
: write reference to slotatomicWriteReference
clearReference
Implementing this class in the glue gives languages the power to implement their own slot encoding. The
SlotHandle
glue replaces theGC_SlotObject
type in OMR. An application's SlotHandle types could even implement complex behaviour, such as updating a hashtable.OMR can provide a
BasicSlotHandle
, which applications would be free to use when implementing their own object model. Here is how it could be implemented:Issue: Compressed references
One of the big contentions with this design is, this makes compressed references a concern of the application's object model, and is no longer provided "for free" by the GC. Unfortunately, I believe that crefs can't work without support from the object model, and the added cost of implementing crefs is necessary for the sake of flexibility.
To make implementing basic crefs easy on application writers, I want to provide a
CompressedSlotHandle
in OMR. Applications like Java are free to use this type when implementing their object model on 64bit systems.What I don't know is, how can we make it easy for applications to configure the heap for compressed references. Currently, I am imagining some kind of helper for building a heap configuration and choosing a shift value, based on a user-specified max address. Ultimately, I would like it to be feasible for applications to implement an ObjectModel that combines both crefs and tagging.
Walking Objects
To deal with complex layouts, I want to apply the visitor pattern to slot traversal--let the app drive visitation. The app would provide a
traverse
API which callsvisitor.visitEdge
on each slot in an object, providing the visitor with aSlotHandle
.The Visitor is able to use the application's
SlotHandleT
to scavenge, mark slots, or fix up during compaction. The GC implements a suite of visitors for each GC operation.ObjectSlotWalker::traverse
is templated on the type of visitor, which gives us zero overhead dispatch. The app should be able to write oneVisitObject
, and immediately have support for every collector.Here is an example implementation:
GC Visitor: MarkingVisitor
Visitors are templated on the slot handle type. This allows the application to mix and match slot handles with zero runtime overhead, and helps reduce circular dependencies between OMR and the glue. The edge callback are typically very small. Templating on the SlotHandle type could allow applications to use the same visitors to implement root marking and object scanning. This is the MarkingVisitor:
The API leaves room for introducing new types of edges later. For example, we could add callbacks for nullableEdges that require null checks, or weakEdges that can be cleared by the GC during an aggressive collection.
Rooting and Visitors
By templating the visitors on the slot handle type, this enables applications to use the visitors to easily implement root scanning.
Issue: How to support split scanning and work units
The APIs described above take a per-object approach to doing GC work. To better support incremental scanning of objects at the language layer, this new glue code could instead be built around scanning work units. Instead of
VisitObject
, we add aVisitSlots(WorkUnit work)
functor. We addWorkUnit
andWorkUnitProducer
to the glue, which allows the language to transform unscanned objects into chunks of work.WorkUnit
: A span of an object for scanning. Basic object models scan entire objects at a time--work unit is just an object pointer.VisitSlotsInWorkUnit
: instead of visitObject. Can push new work units on to work stack?WorkUnitProducer
: Takes unscanned objects and transforms them into work units.