cesanta / v7

Embedded JavaScript engine for C/C++
Other
1.43k stars 177 forks source link

v7 tagged pointers vs Obj-C tagged pointers #533

Closed swisspol closed 8 years ago

swisspol commented 8 years ago

On 64 bits platforms, Obj-C uses tagged pointers to store numbers for instance - see https://www.mikeash.com/pyblog/friday-qa-2012-07-27-lets-build-tagged-pointers.html.

v7 also uses tag pointers.

If using v7_create_foreign() to wrap an Obj-C object that happens to be a tagged pointer, could there be any issue? Did you guys test this case?

swisspol commented 8 years ago

Obj-C uses the 4 bottom bits of a 64 bits object pointer to mark tag pointers because all object pointers are aligned on 16 bytes boundaries. If the object is a tag pointer, then the remaining 60 bits can be used to store arbitrary data.

I guess the question is when you call v7_create_foreign(), how many bits of these upper 60 bits are guaranteed to be preserved?

swisspol commented 8 years ago

Looking at v7.c it appears the top 16 bits will be overwritten... Not sure there's anything v7 could do about this, but why not have an assert in v7_create_foreign() to check that the upper 16 bits are clear?

mkmik commented 8 years ago

An assert wouldn't be a bad idea indeed.

swisspol commented 8 years ago

Any chance you could drop this in the next rev? It's a trivial change with no performance impact and since I'm writing some bindings from v7 to various languages, I'm concerned about truncation bugs on 64 bits platforms for foreign values.

Worse come to worse, if it turns out to be a real problem for various people in real life, you could always have a new type of foreign object, one that would store a true unmodified void*. The same way Lua has user data and light user data, the latter being just a void*. Or it could happen magically i.e. internally in v7 if passed a pointer with the upper 16 bits not zero.

Client-side, I guess I could always work around it today with some ugly wrapping like:

void* ptr = malloc(sizeof(void*));
*(void**)ptr = ...
v7_create_foreign(ptr);

Actually, I'd rather see a new v7 API called int v7_is_valid_foreign(void* ptr);, which would just check the upper 16 bits on 64 bit platforms. This way, I don't have to hardcode the way tag pointers work in v7 in my client code.

mkmik commented 8 years ago

BTW, did you read this: https://www.cesanta.com/developer/v7#_nan_packing ?

swisspol commented 8 years ago

Yep!

mkmik commented 8 years ago

did you you find any real world pointer that is not sign-extendable from a 48-bit value?

swisspol commented 8 years ago

Yes, in ObjC 64 bit, since it uses tagged pointers for some classes (numbers and strings whenever possible):

id temp = [NSNumber numberWithInt:0];  <--- Will most likely be a tagged pointer with its lower 4 bits non-zero and upper 60 bits arbitrary
v7_create_foreign((void*)temp);
mkmik commented 8 years ago

ah, sorry I only skim read the linked article only and concluded it was only about the classic low order bit tagging.

I located the place that made me interpret the article that way (emphasis his):

There are variations on this. For example, a tagged pointer could have any of the bottom four bits set. You can have even more tag bits by taking advantage of additional architecture peculiarities. For example, x86_64 actually only uses 48 bits of a pointer, leaving you with 16 bits free on top of the 4 you get due to alignment.

With the code I'm about to publish the last snippet you posted will cause an assertion check to fail if the objc runtime actually uses the high order bits of the pointer.

So I rephrase: do you know of any existing code that actually uses the upper 16 bits as the tag?

As far as I can see standard tags fit in the 4 low order bits http://www.opensource.apple.com/source/objc4/objc4-646/runtime/objc-internal.h

    OBJC_TAG_NSAtom            = 0, 
    OBJC_TAG_1                 = 1, 
    OBJC_TAG_NSString          = 2, 
    OBJC_TAG_NSNumber          = 3, 
    OBJC_TAG_NSIndexPath       = 4, 
    OBJC_TAG_NSManagedObjectID = 5, 
    OBJC_TAG_NSDate            = 6, 
    OBJC_TAG_7                 = 7

Is using the upper 16 bits even feasible without changing the objc runtime/compiler. i.e. is the excerpt I quoted just a theoretical remark or anybody can create and register a custom tag right now that uses the upper 16 bits?

swisspol commented 8 years ago

do you know of any existing code that actually uses the upper 16 bits as the tag?

I do not.

anybody can create and register a custom tag right now that uses the upper 16 bits?

You would need seriously patch and recompile the ObjC runtime, so that wouldn't make sense IMO.

I'm not sure I'm following you though: what's your concern about wether or not some language uses the upper 16 bits for tagging?

Isn't it as simple as either the 64 bit pointer has its upper 16 bits clean, or it doesn't, in which case you can't assume anything and must preserve all bits?

mkmik commented 8 years ago

I'm concerned about it in order to prioritise this.

It seems to me that currently a feature such as foreign pointers that allows any valid C pointer to be wrapped in a lightweight JS value is a useful thing, as long as we can ensure it works as expected for any actual valid pointer.

If objc id can also be treated as pointers to all practical effects that's even better.

If the user needs something like lua userdata (i.e. a managed malloced block) and store arbitrary bytes in it, we can discuss what's the best way to achieve that. But I don't think a storage area for arbitrary 8 bytes should be handled by "foreign pointers". Thus I don't really see the need for a v7_is_valid_foreign.

As for storing arbitrary data, this area is still open for improvement. We don't even provide a public API for GC "finalizers" but I guess we should have something like that, because it's not only about freeing the memory block, sometimes there are other things to be done as well.

swisspol commented 8 years ago

as long as we can ensure it works as expected for any actual valid pointer. But I don't think a storage area for arbitrary 8 bytes should be handled by "foreign pointers". Thus I don't really see the need for a v7_is_valid_foreign.

So I understand you correctly, you're saying that on x86_64, since the architecture specs say that only 48 bits can be used (?), then any pointer with its upper 16 bits not cleared can be considered invalid and therefore unsupported by v7 (and consequently there's no need for a v7_is_valid_foreign() API)?

What about arm64 BTW?

mkmik commented 8 years ago

Yes, we also checked the ARM64 specs. It seems that x86_64 and ARM64 are going to be the only two major architecture in the mid term future; if things change V7 will adapt.

This design is at the core of our object and value representations, so if it doesn't hold, foreign values are the least of our problems. BTW, afaik also spidermonkey (firefox js vm) uses (or at least used at some point) a similar approach, so we're not completely insane (nor visionary :-) ).

Minor nitpick: all zeros or all ones: the 48 bit quantity is sign extended.

swisspol commented 8 years ago

Oh I think your design is really good. Just gotta be careful at the boundaries with other languages which also attempt to be smart with pointers. In the meantime, just adding the assert() and leaving it around for some time to see who's impacted seems quite sufficient.

For instance, I'm not impacted AFAIK right now with my Obj-C bridge because I happen to natively bridge to v7 values the most used Obj-C classes that happen to use tagged pointers i.e. NSNumber and NSString. I don't care about NSIndexPath and NSManagedObjectID (CoreData), so that leaves NSDate and I'll cross that bridge when I get there (it'd be easy to natively bridge to the Date class anyway).

mkmik commented 8 years ago

ok, assert and doc about foreign pointers added in 096feb3c26f9dde97dea3df8d8ba4d6e97860de5