Motivation

Object traversal can be quite complex, especially in dynamic languages.

Walking can be modal. Objects could be composed of discontiguous regions data. Those regions may be inconsistently encoded. For example, an object might have header slots, then normal/inline slots, and then side-allocated overflow slots. This does not map well to our existing iterators, which assume a heterogenous and contiguous layout.
Object layouts change dynamically. Keeping up to date slot maps is infeasible.
Reference slots are often tagged, sometimes in nontrivial ways. The encodings can even be mixed across slots.

These are all real issues I am running up against while implementing javascript.

Slot Encoding

To solve the issue with slot encoding, I want to allow applications to implement their own SlotHandle type. SlotHandle is a handle to a data slot, which holds a reference. SlotHandle provides a small API:

readReference: read reference from slot
writeReference: write reference to slot
atomicWriteReference
possibly later: clearReference

Implementing this class in the glue gives languages the power to implement their own slot encoding. The SlotHandle glue replaces the GC_SlotObject type in OMR. An application's SlotHandle types could even implement complex behaviour, such as updating a hashtable.

OMR can provide a BasicSlotHandle, which applications would be free to use when implementing their own object model. Here is how it could be implemented:

namespace OMR {
namespace GC {

/// Utility class for providing the GC with read/write to slot capabilities.
/// Looks a lot like GC_SlotObject.
class BasicSlotHandle {

  explicit BasicSlotHandle(omrobjectptr_t* slot) : slot(slot) {}

  omrobjectptr_t readReference() const { return *slot; }

  void writeReference(omrobjectptr_t reference) const { *slot = reference; }

private:
    omrobjectptr_t* slot;
};

}  // namespace GC
}  // namespace OMR

Issue: Compressed references

One of the big contentions with this design is, this makes compressed references a concern of the application's object model, and is no longer provided "for free" by the GC. Unfortunately, I believe that crefs can't work without support from the object model, and the added cost of implementing crefs is necessary for the sake of flexibility.

To make implementing basic crefs easy on application writers, I want to provide a CompressedSlotHandle in OMR. Applications like Java are free to use this type when implementing their object model on 64bit systems.

What I don't know is, how can we make it easy for applications to configure the heap for compressed references. Currently, I am imagining some kind of helper for building a heap configuration and choosing a shift value, based on a user-specified max address. Ultimately, I would like it to be feasible for applications to implement an ObjectModel that combines both crefs and tagging.

Walking Objects

To deal with complex layouts, I want to apply the visitor pattern to slot traversal--let the app drive visitation. The app would provide a traverse API which calls visitor.visitEdge on each slot in an object, providing the visitor with a SlotHandle.

The Visitor is able to use the application's SlotHandleT to scavenge, mark slots, or fix up during compaction. The GC implements a suite of visitors for each GC operation.

ObjectSlotWalker::traverse is templated on the type of visitor, which gives us zero overhead dispatch. The app should be able to write one VisitObject, and immediately have support for every collector.

Here is an example implementation:

namespace OMRApp {

/// A really simple object model. Storage for slots is allocated off the end. 
struct Object {
  std::size_t size;
  Object* slots[0];
};

// Visit each edge of the object. New glue Code.
class ObjectSlotWalker {
  // Call visitor.edge() on each slot in object
  template <typename VisitorT>
  void traverse(Object* object, VisitorT visitor) {
    for (std::size_t i = 0; i < object->size; i++) {
      visitor.visitEdge(object, OMR::GC::BasicSlotHandle(&object->slots[i]));
    }
  }
};

} // namespace OMRApp

GC Visitor: MarkingVisitor

Visitors are templated on the slot handle type. This allows the application to mix and match slot handles with zero runtime overhead, and helps reduce circular dependencies between OMR and the glue. The edge callback are typically very small. Templating on the SlotHandle type could allow applications to use the same visitors to implement root marking and object scanning. This is the MarkingVisitor:

namespace OMR {
namespace  GC {

struct MarkingVisitor {
    MarkingVisitor(MM_EnvironmentBase* env, MM_MarkingScheme* markingScheme) :
        env(env), markingScheme(markingScheme) {}

    MM_EnvironmentBase* env;
    MM_MarkingScheme* markingScheme;

    template <class SlotHandleT>
    void visitEdge(omrobjectptr_t obj, SlotHandleT slot) {
        markingScheme->inlineMarkObjectNoCheck(env, slot.readReference());
    }
};

void MM_MarkingScheme::scanObject(MM_EnvironmentBase* env, omrobjectptr_t object) {
  _objectSlotWalker.traverse(object, MarkingSlotVisitor(env, this));
};

}  // namespace GC
}  // namespace OMR

The API leaves room for introducing new types of edges later. For example, we could add callbacks for nullableEdges that require null checks, or weakEdges that can be cleared by the GC during an aggressive collection.

Rooting and Visitors

By templating the visitors on the slot handle type, this enables applications to use the visitors to easily implement root scanning.

Issue: How to support split scanning and work units

The APIs described above take a per-object approach to doing GC work. To better support incremental scanning of objects at the language layer, this new glue code could instead be built around scanning work units. Instead of VisitObject, we add a VisitSlots(WorkUnit work) functor. We add WorkUnit and WorkUnitProducer to the glue, which allows the language to transform unscanned objects into chunks of work.

WorkUnit: A span of an object for scanning. Basic object models scan entire objects at a time--work unit is just an object pointer.
VisitSlotsInWorkUnit: instead of visitObject. Can push new work units on to work stack?
WorkUnitProducer: Takes unscanned objects and transforms them into work units.

After talking with @youngar I've removed the VisitObject functor, and replaced it with a simpler ObjectSlotWalker::traverse.

@rwy0717 I'd like to understand with greater depth the issues that you are encountering while fitting Javascript onto OMR. From the points you raise in Motivation, above, I still don;t see a need to extend or replace the existing object scanning model, at least insofar as they are used in gc.

Walking can be modal. Objects could be composed of discontiguous regions data. Those regions may be inconsistently encoded...

Discontiguous and dynamic object models are fully supported by the existing object scanning framework, which assumes only that language developers can enumerate a series of (map-base-address, map-slot-bits) pairs that cover the entire object (and all of its dependent parts, if discontiguous). Each map-base-address is fomrobject_t aligned and points the the first fomrobject_t slot in an array of 64 slots. Each bit in the map-slot-bits bitmap indicates that the corresponding fomrobject_t slot relative to the map-base-address is a reference slot (bit is set) or not (bit not set).

Object layouts change dynamically. Keeping up to date slot maps is infeasible.

Can you give me an example of where object scanning is required outside of stop-the-world contexts (eg, gc)? If not, then object dynamism should not be an issue because object shape cannot mutate within stop-the-world contexts. Current object scanners do not assume that the object slot maps are stable and can easily accommodate shape changes since the slot maps must be recomputed for each object scanner instance. This is a no-brainer for OpenJ9 Java because object shapes are static and the slot maps are fixed and stored in corresponding object class. But computing these bit maps for dynamic languages eg Javascript should not be overly complex or incur an inordinate amount of overhead. For example, if Javascript encodes object slots by setting the low-order bit in the slot, the object scanner implementation need only walk its slots to determine the location of reference slots and construct slot maps accordingly. And it must presumably walk all of these slots in any case, for the purpose of identifying reference slots. The additional overhead incurred by subsequent scanning is relatively light, since only the reference slots found and recorded in the slot maps are visited.

I would like to help you to get your scanning issues ironed out. Have a look at the above and let's discuss any concerns that remain.

Something to bear in mind is that polymorphic object scanner types will have a strong influence on generational gc and marking performance. Existing object scanners present a simple and consistent interface to gc componentry, with the result that gc components are not involved with issues like object shape changes and idiomatic (language-specific) concerns regarding object representation. I would rather push any complexity involved in enumeratng object reference slots into the consuming language. Doing otherwise will have an impact on all languages that run on OMR.

Can you give me an example of where object scanning is required outside of stop-the-world contexts (eg, gc)

What about during concurrent marking?

let's discuss any concerns that remain.

OK, there's still the (big big big) issue with app-defined slot encoding. IMHO, this is one thing we did really wrong. The GC requires applications to encode GC references as either bare pointers or bare compressed pointers (and aligned to boot!). This means that it's not even possible to tag pointers. Realistically, applications need to be able to define their own data encoding. This means that the GC must read and write slots through a glue interface.

It's worthwhile to consider crazier object layouts too. What about relocatable pointers (SRP's) on the heap, or bitmap-based 1:n pointers?

I would rater push any complexity involved in enumeration object reference slots into the consuming language. Doing otherwise will have an impact on all languages that run on OMR.

This is what this change is doing. A language can perform their own idiomatic object walking, without working around the GC_ObjectIterator interface.

Think about the example code. Here is everything the GC needs to walk an object:

class ObjectSlotWalker {
template <typename VisitorT>
traverse(Object* object, VisitorT visitor) {
  for (Slot* slot = object->begin(); slot != object->end(); ++slot) {
    visitor.edge(object, OMR::GC::BasicSlotHandle(slot));
  }
};

Pretty simple!

Even Java would benefit, ObjectSlotWalker::traverse would be a straightforward iteration of the slot map. This will allow us to remove the ObjectScannerState, and remove the virtual getNextSlotMap function call.

These new GC visitors could also be reused by the languages to implement root marking or scavenging in simpler terms.

And OK, yes, a JS engine can build slot maps for the object scanner. To build a slot map, you must lock the object, and walk the slots. I do not see a clear way to implement locking semantics from inside the iterator. And while the overhead is presumably light, it's totally unnecessary. It seems needlessly complicated to scan an object twice--why not just mark slots immediately?

I think the bottom line is, the ObjectScanner interface is too low level for the glue, and results in a rigid yet complicated interface. These new APIs sit at a higher level, and are going to be a net win for the flexibility, understandability, and performance of the GC and applications.

Ok. Thanks for reminding me about concurrentGC. But even in that context, it is reasonable to expect that the visitor can determine the shape and enumerate reference slots exhaustively at time of marking. It has complete control over the sequencing and alignment of map-base-address (it can even present a series of same with just the 1 bit set in each associated map-slot-bits mask). Duplicates (reiteration of previously scanned slots) are permitted but not advised. The gc assumes only that the reference is represented as a 32-bit (compressedrefs or 32-biit platform) or 64-bit (uncompressed on 64-bit platform) unsigned integer. Modifications to language-specific encodings of object references (these are possible) can be handled in extensions of GC_SlotObject.

It seems to me that your main complaints relate to object alignment, but the object scanning model I think can cope with the issues that you raise. Bear in mind that changes in elemental factors like object reference format will have far-reaching impacts on the code base as vm, jit, gc, even port library embed them in their source code.

That said, It think that your Visitor/ObjectSlotWalker model is very elegant. Consider using it in a Javascript-specific delegate (scavenger still in CLI at present) to implement object scanners as required to support marking and copy forward schemes.

I discussed this with @amicic and he suggested that refactoring GC_SlotObject to accommodate language-specific extension and implementation would be a starting point for resolving reference slot encoding concerns. He pointed out that fomrobject_t form factor for object reference containment is assumed throughout other OMR components.

I think that GC_SlotObject parametization (via template) would help a lot, eg GC_SlotObject<ReferenceSlotT> where ReferenceSlotT denotes a class with language-specific method implementations for reading/writing reference pointer values to fit into OMRs 32/64-bit fomrobject_t slots. This would allow language-specific encodings/tagging of reference slots, while providing efficient inline methods for encoding/decoding contents of object reference slots from/to OMR heap pointers.

If reference fields are not fomrobject_t aligned, they can be mapped by (map-base-address, map-slot-bits) pairs that have map-base-address aligned as required to overlay the mapped segment with fomrobject_t[64] and read 32/64-bit reference slot values from the slot mapped by the bitmap.

The GC_ObjectScanner model assumes only the an object is a collection of 1 or more disjoint object-aligned address ranges residing in the runtime heap. Each range is overlaid with fomrobject_t[N] for least N that covers the range, in the default view. Ranges with N>64 must be partitioned into 1+ N/64 (map-base-address, map-slot-bits), as required to cover the default view. So languages implementing object scanners need to know how to locate each part of any object, and the rest is straightforward.

If any of the subranges within any part of an object contain reference slots that are not fomrobject_t aligned, the subrange can be presented using >1 (map-base-address, map-slot-bits) as described in previous comment, the bitmaps selecting disjoint collections of reference slots.

And with all of the above said, GC_ObjectScanner<ObjectSlotWalkerT>, where ObjectWalkerT just provides inline getNextSlot(), could eliminate all of the map/bitmap clutter entirely. Along with GC_SlotObject<ReferenceSlotT> ...

There have historically been a lot of issues with using templates in the OMR codebase. I am not opposed to investigating them as a solution to the object iteration problem at hand.

@rwy0717 has a very elegant proposal here that I think would work for many runtimes. The current solution is feasible for runtimes other than Java but definitely not ideal. In SOM++ it already uses a visitor pattern to walk roots and object slots so the current bitmap solution is very difficult to use.

I think there may be some performance issues with the proposed solution here but I would like to see some examples of it's use in the current OMR GC example and maybe with some of the OMR downstream projects like OpenJ9.

The idea of the OMR GC being consumed by a new runtime in a single day is something that I will always strive for. In the initial days of OMR I was able to use the GC in SOM++ with 2 days of effort which included learning how its makefiles/build system worked, linking OMR into the runtime and providing all of the required OMR GC glue. In its current state I got side tracked after 4 days of not having a working solution so I have not been able to move up to the latest code. I would love to be able to use the OMR scavenger technology in SOM++ so resolving these difficulties to using OMR will be a huge improvement.

eclipse-omr / omr

GC: Rework object slot visitation in OMR #2603