eclipse / omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes
http://www.eclipse.org/omr
Other
945 stars 396 forks source link

Proposal: fomrobjectptr_t #5027

Open gacholio opened 4 years ago

gacholio commented 4 years ago

Background

fomrobject_t represents the contents of an object-to-object reference slot. In the current single-mode builds, it is typedeffed to either uint32_t (compressed pointers) or uintptr_t (full pointers).

Prior to the runtime compressed refs work, it was legal to have pointers to these slots, perform arithmetic on those pointers, and load from or store to the pointer. It is now no longer legal to perform these operations directly - GC_SlotObject provides an interface to abstract these operations.

Proposal

Replace fomrobject_t* with fomrobjectptr_t, which will (eventually) be a pointer to an opaque type, which will disallow pointer arithmetic and dereferencing without an explicit cast to the correctly-sized type.

This will prevent any new errors being introduced that will only show up in mixed mode.

Step 1

typedef fomrobject_t *fomrobjectptr_t;

At this point, the types are equivalent, so downstream projects will compile with or without making any changes.

Step 2

Replace all instances of fomrobject_t * with fomrobjectptr_t in OMR and downstream projects.

Step 3

struct omr_opqaue_fomrobject_t;
typedef omr_opqaue_fomrobject_t *fomrobjectptr_t;

At this point, any downstream project that has not made the transition to fomrobjectptr_t will no longer compile.

gacholio commented 4 years ago

@dmitripivkine @amicic @mstoodle @youngar @rwy0717

youngar commented 4 years ago

I think this change is important to avoid accidental pointer arithmetic, a mistake I have made in the past.

I personally am not a big fan of defining pointer types like fomrobjectptr_t, when you can just use fomrobject_t *. I'm not hung up on this, and we already define types like omrobjectptr_t.

Is there anything stopping us to jumping right to step 3? If we fix everything in OMR, OpenJ9 can do:

-#if defined(OMR_GC_FULL_POINTERS)
-typedef UDATA fomrobject_t;
-typedef UDATA fomrarray_t;
-#else /* OMR_GC_FULL_POINTERS */
-typedef U_32 fomrobject_t;
-typedef U_32 fomrarray_t;
-#endif /* OMR_GC_FULL_POINTERS */
+struct fomrobject_t;
+struct fomrarray_t;
gacholio commented 4 years ago

I believe there are still places that declare and use fomrobject_t directly, so the type has to remain defined as an integer type - it certainly can't be opaque.

youngar commented 4 years ago

Yeah, I'm seeing that now :(. I'm going to take a quick look to see how hard (or impossible) it is to fix these places in OMR and OpenJ9.

gacholio commented 4 years ago

It would mean replacing the non-pointer fomrobject_t with uintptr_t I think (which would be fine with me).

youngar commented 4 years ago

That is my conclusion as well, it seems like a clean solution. I think I like this better.

rwy7 commented 4 years ago

This makes sense to me, and seems like a good change.

Personally, I've always thought the name fomrobject_t was confusing. Adding a type called fomrobjectptr_t just adds to the confusion. It's too similarly named to omrobjectptr_t. In my head, they both sound like pointers to objects. I understand it's beyond the scope of what you're doing, but, it would be nice if we could choose clearer names.

I'll just throw out a suggestion:

Long term, does it make sense that we would stop using fomrobject_t directly? I don't see much point in holding on to an encoded field; you would always want to load+decode simultaneously, right? Maybe omr_ptr_field could be used internally by GC_SlotObject, but the majority of the GC would work with omrobjectptr_t or omr_field_addr's instead.

youngar commented 4 years ago

@rwy0717 There might be a couple places where you don't have to decode a field, for example checking if two pointers are equivalent (OpenJ9's ObjectAccessBarrier.cpp).

Edit: We can still just cast the value in a field to uintptr_t. Maybe we could add a type for documentation:

typedef uintptr_t omrobjecttoken;
gacholio commented 4 years ago

Are there in fact any downstream GC implementations other than OpenJ9?

rwy7 commented 4 years ago

I don't know for sure, perhaps there is a version of SOMpp that is still being maintained, but I'm not aware of any active users.

EDIT: I've been reminded by others (thanks!): there is also t4, and pypy integration in development.

youngar commented 4 years ago

I started to remove instances of fomrobject_t from OMR, here is what I have so far: https://github.com/eclipse/omr/pull/5033

In order to get this to work with runtime compressed references, I will have to replace sizeof(Slot) and sizeof(ObjectHeader) with something runtime. This will probably be a larger refactoring.

gacholio commented 4 years ago

Yeah, I haven't updated the example GC for mixed yet.

amicic commented 4 years ago

I have not put much thought on this, but so far i'm 1) in favor of the original proposal 2) not in favor of replacing specific types like fomrobject_t with generic types like uintptr_t

amicic commented 4 years ago

3) somewhat in favor of renaming fomrobject_t (numerous times I stopped for a few seconds asking myself what 'f' stands for)

rwy7 commented 4 years ago

How about this:

  1. introduce a new type: MM_SlotPtr. This will be a typedef to void *. You cannot dereference or do arithmetic on void * values.
  2. replace all usages of fomrobject_t * with MM_SlotPtr.
  3. Introduce a new type: MM_PtrToken. This will be a typedef to one of:
    • uint32_t (compressed mode)
    • uintptr_t (full mode)
    • uint64_t (mixed mode)
  4. delete the type fomrobject_t. Local fomrobject_t values will be replaced with MM_PtrToken values.
gacholio commented 4 years ago
You cannot dereference or do arithmetic on `void *` values.

I was under the impression that gcc allowed arithmetic on void*, so this may result in errors slipping in since we almost always use gcc for PR testing.

Why not leave it as opaque*?

   * `uint64_t` (mixed mode)

uintptr_t will do for mixed as well.

rwy7 commented 4 years ago

I was under the impression that gcc allowed arithmetic on void*, so this may result in errors slipping in since we almost always use gcc for PR testing.

Oh, you’re right, it’s only a warning on GCC 5 +.

Why not leave it as opaque*?

Mostly, because void* is a simpler solution. I’m more than happy to introduce an opaque type, since it will help catch more errors.

uintptr_t will do for mixed as well.

Yes, uintptr_t is the same type as uint64_t in mixed mode. I thought it might be more obvious to explicitly map each mode to a type. If you wrote the definition like this...

#if defined(OMR_FULL_POINTERS)
typedef uintptr_t MM_PtrToken;
#else
typedef uint32_t MM_PtrToken;
#endif

...I’d have to think about it for 1/4 of a second. It's fine, though.

So, with that in mind, new plan:

  1. Introduce a new type: MM_Slot. This will be an opaque type.
  2. Introduce a new type: MM_SlotPtr. This will be typedef'd to MM_Slot *.
  3. replace all usages of fomrobject_t * with MM_SlotPtr.
  4. Introduce a new type: MM_PtrToken. This will be a typedef to one of:
    • uint32_t (compressed mode)
    • uintptr_t (full and mixed mode)
  5. delete the type fomrobject_t. Local fomrobject_t values will be replaced with MM_PtrToken values.

I'm not sure we really need MM_SlotPtr, since we can just use MM_Slot *, which is obviously a slot ptr.

amicic commented 4 years ago

MM_ prefix is normally for Memory Managment (GC) code local types. These types would be visible across VM code.

gacholio commented 4 years ago

I'm not sure we really need MM_SlotPtr, since we can just use MM_Slot *, which is obviously a slot ptr.

Agreed.

rwy7 commented 4 years ago

OK, maybe omrobjectslot_t and omrobjecttoken_t, to match omrobjectptr_t?

amicic commented 4 years ago

I personally like omrobjectslot_t over fomrobject_t. 'f' (what stands for 'field'?) at the front moved toward the end and replaced with 'slot'.

I'm not quite following what omrobjecttoken_t would be: is it omrobjectslot_t *, but we want to replace it with a dedicated type?

Then, I prefer omrobjectslotptr_t, effectively what @gacholio suggested, (but like above, 'f' at the front moved toward the end and got replaced with 'slot').

This potential renaming is optional and orthogonal to @gacholio changes. He should proceed with his and then we may or may not decide to rename to (or somehow differently): fomrobject_t -> omrobjectslot_t fomrobjectptr_t -> omrobjectslotptr_t

rwy7 commented 4 years ago

This potential renaming is optional and orthogonal to @gacholio changes.

Yes I agree.

I'm not quite following what omrobjecttoken_t would be: is it omrobjectslot_t *, but we want to replace it with a dedicated type?

It would be a type thats wide enough to hold an object pointer that has been loaded, but not yet decoded. In places where we stack-allocate a j9object_t, we would instead use this type. After looking at the barrier code in J9, I very much doubt it would be useful.

For now, is it enough to simply change the example's definition of fomrobject_t to an incomplete struct?

youngar commented 4 years ago

So far I am using omrobjecttoken_t for one place in the GC, and I could probably get around the issue. There are probably more places that might be appropriate, but we tend to use uintptr_t for this.

omrobjecttoken_t fixupSlot = GC_SlotObject::convertTokenFromPointer(omrVM, reverseForwardedHeader.getReverseForwardedPointer());
forwardedHeader->restoreDestroyedOverlap((uint32_t)fixupSlot);

https://github.com/eclipse/omr/pull/5033/files#diff-6316fd673edb008a72e33e15e6b18065R211

gacholio commented 4 years ago

Was there any discussion of this at the tagged meeting?

youngar commented 4 years ago

No discussion really. Filip mentioned that he thinks there are (many?) instances of fj9object_t in the Java compiler.

gacholio commented 4 years ago

If there's an interest in downstream projects working in mixed mode, I think this (or something) is essential - far better to find problems at compile time than debug the runtime for days.

As the current proposal is a (relatively) simple search and replace, I can't see why having lots of instances would matter.

rwy7 commented 4 years ago

Would omr_opqaue_fomrobject_t and fomrobjectptr_t be glue types (defined by the application), like fomrobject_t and omrobjectptr_t currently are? Maybe fomrobjectptr_t should be defined by the OMR GC.

gacholio commented 4 years ago

I think OMR should define them, not the application.

gacholio commented 4 years ago

And please tell me you're not considering those horrid names - I already hate the _t convention - don't make me type 3 _ in a type name...