Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

exchange value of $a,$b without $temporary variable #5788

Closed p5pRT closed 10 years ago

p5pRT commented 22 years ago

Migrated from rt.perl.org#15667 (status was 'resolved')

Searchable as RT15667$

p5pRT commented 22 years ago

From nospam-abuse@ilyaz.org

[A complimentary Cc of this posting was sent to Martien Verbruggen <mgjv@tradingpost.com.au>], who wrote in article <slrnak1205.n5.mgjv@verbruggen.comdyn.com.au>:

> But why would you consider using anything but the ($a, $b) = ($b, $a) expression? It's clearly the clearest, and will work for anything.

IIRC, this construction (well, a similar one) has subtle problems with aliasing present. Found this the hard way with File::Find...

    perl5.6.0 -wle 'BEGIN{ *a=\$a::a; *b = \$b::b } \
      ($a, $b) = (2,3); print "$a $b"; \
      ($a::a, $b::b) = ($b, $a); print "$a $b"'
    2 3
    3 3

I think this may be considered as a bug, the parser has enough info to detect this...

Hope this helps,
Ilya

p5pRT commented 22 years ago

From @rgarcia

Ilya Zakharevich (via RT) wrote:

> IIRC, this construction (well, a similar one) has subtle problems with aliasing present. Found this the hard way with File::Find...
>
>     perl5.6.0 -wle 'BEGIN{ *a=\$a::a; *b = \$b::b } \
>       ($a, $b) = (2,3); print "$a $b"; \
>       ($a::a, $b::b) = ($b, $a); print "$a $b"'
>     2 3
>     3 3
>
> I think this may be considered as a bug, the parser has enough info to detect this...

The bug is that the 2nd op aassign is not flagged as OPpASSIGN_COMMON as it should be.

I've a patch for this, but it works only in the case where the aliasing is done via glob assignment "*a=\*a::a; *b = \*b::b;". The solution is to look at the effective GV (EGV) of the gv ops on each side of the assignment, instead of only checking the plain GV as is currently the case.

In other words (this patch is for illustrative purposes only):

Inline Patch

```diff
--- op.c.orig   Wed Jul 10 01:36:04 2002
+++ op.c        Thu Aug  1 16:20:45 2002
@@ -3659,7 +3659,7 @@
        for (curop = LINKLIST(o); curop != o; curop = LINKLIST(curop)) {
            if (PL_opargs[curop->op_type] & OA_DANGEROUS) {
                if (curop->op_type == OP_GV) {
-                   GV *gv = cGVOPx_gv(curop);
+                   GV *gv = GvEGV(cGVOPx_gv(curop));
                    if (gv == PL_defgv || (int)SvCUR(gv) == PL_generation)
                        break;
                    SvCUR(gv) = PL_generation;
```

End of example.

I don't know how to solve this in the general case, as I have little knowledge about the mechanism of ref to glob assignment. Any hints?

p5pRT commented 22 years ago

From @gbarr

On Thu, Aug 01, 2002 at 04:39:22PM +0200, Rafael Garcia-Suarez wrote:

> Ilya Zakharevich (via RT) wrote:
>
>> IIRC, this construction (well, a similar one) has subtle problems with aliasing present. Found this the hard way with File::Find...
>>
>>     perl5.6.0 -wle 'BEGIN{ *a=\$a::a; *b = \$b::b } \
>>       ($a, $b) = (2,3); print "$a $b"; \
>>       ($a::a, $b::b) = ($b, $a); print "$a $b"'
>>     2 3
>>     3 3
>>
>> I think this may be considered as a bug, the parser has enough info to detect this...
>
> The bug is that the 2nd op aassign is not flagged as OPpASSIGN_COMMON as it should be.

Right.

> I've a patch for this, but it works only in the case where the aliasing is done via glob assignment "*a=\*a::a; *b = \*b::b;".

And only if that glob assignment is actually performed before the assignment op is compiled (i.e. in BEGIN). op.c is compile-time.

This is difficult to solve in the general case. It would need runtime checks to be added into pp_aassign, which would be costly in terms of performance.

Graham.


p5pRT commented 22 years ago

From @rgarcia

Graham Barr wrote:

> And only if that glob assignment is actually performed before the assignment op is compiled (ie BEGIN). op.c is compile-time
>
> This is difficult to solve in the general case. It would need runtime checks to be added into pp_aassign, which would be costly in terms of performance.

Indeed. Ilya pointed this out in the original c.l.p.misc discussion. I don't claim to solve this once and for all; in most code, aliasing is done at compile-time (variable exports). And usually it's not done by glob-to-glob assignment, but by ref-to-glob assignment; hence my question:

> I don't know how to solve this in the general case, as I have little knowledge about the mechanism of ref to glob assignment.

p5pRT commented 22 years ago

From @ysth

On Thu, 01 Aug 2002 17:59:24 +0200, raphel.garcia-suarez@hexaflux.com wrote:

> Graham Barr wrote:
>
>> And only if that glob assignment is actually performed before the assignment op is compiled (ie BEGIN). op.c is compile-time
>>
>> This is difficult to solve in the general case. It would need runtime checks to be added into pp_aassign, which would be costly in terms of performance.
>
> Indeed. Ilya pointed this out in the original c.l.p.misc discussion. I don't claim to solve this once and for all; in most code, aliasing is done at compile-time (variable exports). And usually it's not done by glob-to-glob assignment, but by ref-to-glob assignment; hence my question:
>
>> I don't know how to solve this in the general case, as I have little knowledge about the mechanism of ref to glob assignment.

ref to glob assignment is handled just after the 2nd GV_UNIQUE_CHECK in sv_setsv_flags.

p5pRT commented 14 years ago

@chorny - Status changed from 'open' to 'stalled'

p5pRT commented 13 years ago

From ambrus@math.bme.hu

Created by ambrus@math.bme.hu

After the statements '*x=*y; @x=8; @y=@x;', the array @x should still contain a single element whose value is 8. In all perl versions I tried, this leaves a single element in @x that is undefined, which is definitely an incorrect result. Further, some older perl versions give a semi-panic warning.

    $ perl5.12.2 -we '*x=*y; @x=8; @y=@x; warn @x;'
    semi-panic: attempt to dup freed string at -e line 1.
    Use of uninitialized value within @x in warn at -e line 1.
    Warning: something's wrong at -e line 1.
    $ perl5.13.11 -we '*x=*y; @x=8; @y=@x; warn @x;'
    Use of uninitialized value $x[0] in warn at -e line 1.
    Warning: something's wrong at -e line 1.

Perl 5.14.0RC1 still behaves the same as 5.13.11: $x[0] becomes undefined, but there's no semi-panic.

Thanks\,

Ambrus

Perl Info

```
Flags:
    category=core
    severity=low

Site configuration information for perl 5.12.3:

Configured by ambrus at Tue Jan 25 14:12:12 CET 2011.

Summary of my perl5 (revision 5 version 12 subversion 3) configuration:

  Platform:
    osname=linux, osvers=2.6.34.1, archname=x86_64-linux
    uname='linux king 2.6.34.1 #1 smp sat jul 10 18:21:56 cest 2010 x86_64 gnulinux '
    config_args='-Dinc_version_list=5.12.2/x86_64-linux 5.12.2 5.12.1/x86_64-linux 5.12.1 5.12.0/x86_64-linux 5.12.0 -d'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.5.1', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64 /usr/local/lib64
    libs=-lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.7.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.7'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

Locally applied patches:

@INC for perl 5.12.3:
    /usr/local/lib/perl5/site_perl/5.12.3/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.12.3
    /usr/local/lib/perl5/5.12.3/x86_64-linux
    /usr/local/lib/perl5/5.12.3
    /usr/local/lib/perl5/site_perl/5.12.2/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.12.2
    /usr/local/lib/perl5/site_perl/5.12.1/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.12.1
    /usr/local/lib/perl5/site_perl/5.12.0/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.12.0
    /usr/local/lib/perl5/site_perl
    .

Environment for perl 5.12.3:
    HOME=/home/ambrus
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=hu_HU
    LD_LIBRARY_PATH=/home/ambrus/local/lib/
    LOGDIR (unset)
    PATH=/home/ambrus/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/bash
```

p5pRT commented 13 years ago

From @cpansprout

On Sun May 01 14:05:56 2011, b_jonas wrote:

> After the statements '*x=*y; @x=8; @y=@x;', the array @x should still contain a single element whose value is 8. In all perl versions I tried, this leaves a single element in @x that is undefined, which is definitely an incorrect result. Further, some older perl versions give a semi-panic warning.
>
>     $ perl5.12.2 -we '*x=*y; @x=8; @y=@x; warn @x;'
>     semi-panic: attempt to dup freed string at -e line 1.
>     Use of uninitialized value within @x in warn at -e line 1.
>     Warning: something's wrong at -e line 1.

This is the no-common-vars optimisation (aka the absence of the common-vars pessimisation).

In ($a,$b) = ($b,$a) assignments, perl has to copy the RHS into temporary scalars before assigning to the LHS.

That can be *really* slow. So perl avoids doing it if the variables named on either side are all different.

But one can easily fool perl with glob assignments. (If we add support for lexical aliases, then the problem will be compounded.)

If the stack were refcounted, fixing this would be easy (just check the refcount in pp_aassign to see whether copying is necessary).
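
The hazard described above can be modelled outside Perl. The following is a minimal sketch in C, assuming a list assignment is just a loop that stores right-hand values into left-hand slots. The function names `assign_no_copy` and `assign_with_copy` are hypothetical and do not come from the Perl source; the sketch only illustrates why skipping the temporary copy goes wrong when both sides refer to the same storage.

```c
#include <stddef.h>

/* Hypothetical model: lhs[i] and rhs[i] point at the scalars named on each
 * side of a list assignment. With no temporary copy, a value stored early
 * can clobber a slot that is still to be read. */
void assign_no_copy(int **lhs, int **rhs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *lhs[i] = *rhs[i];
}

/* Same assignment, but every right-hand value is copied to a temporary
 * first. This sketch supports at most 16 elements and returns -1 beyond
 * that. */
int assign_with_copy(int **lhs, int **rhs, size_t n)
{
    int tmp[16];

    if (n > 16)
        return -1;
    for (size_t i = 0; i < n; i++)
        tmp[i] = *rhs[i];          /* snapshot the right-hand side */
    for (size_t i = 0; i < n; i++)
        *lhs[i] = tmp[i];          /* then store into the left-hand side */
    return 0;
}
```

With `a = 2, b = 3`, `lhs = {&a, &b}` and `rhs = {&b, &a}`, the first function leaves both variables at 3, which mirrors the "3 3" output in the original report; the second performs the swap.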

p5pRT commented 13 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 10 years ago

From @cpansprout

Twelve years after the report, this is finally fixed, in commit ff2a62e0c.

--

Father Chrysostomos

p5pRT commented 10 years ago

The RT System itself - Status changed from 'stalled' to 'open'

p5pRT commented 10 years ago

@cpansprout - Status changed from 'open' to 'resolved'

p5pRT commented 10 years ago

From @bulk88

On Thu Sep 18 20:07:16 2014, sprout wrote:

> Twelve years after the report, this is finally fixed, in commit ff2a62e0c.

I think this is a bad implementation. Instead of adding another 4 bytes to every GV, why not turn gp_line into a bitfield (C bitfield or bitwise op synthesized), with the top bit as the GPf_ALIASED_SV flag? Do we need to support over 2 billion lines of perl source code in 1 file?

-- bulk88 ~ bulk88 at hotmail.com
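
The "bitwise op synthesized" variant suggested above can be sketched in C as follows. This is only an illustration under stated assumptions: the macro and function names are hypothetical, and a plain `uint32_t` stands in for whatever type the real field uses. Only `gp_line` and `GPf_ALIASED_SV` are names taken from the thread.

```c
#include <stdint.h>

/* Hypothetical layout: top bit is the flag, low 31 bits are the line. */
#define SKETCH_ALIASED_FLAG UINT32_C(0x80000000)
#define SKETCH_LINE_MASK    UINT32_C(0x7fffffff)

/* Combine a line number and a flag into one 32-bit word. */
static inline uint32_t sketch_pack_line(uint32_t line, int aliased)
{
    return (line & SKETCH_LINE_MASK) | (aliased ? SKETCH_ALIASED_FLAG : 0);
}

/* Recover the line number, ignoring the flag bit. */
static inline uint32_t sketch_line_of(uint32_t packed)
{
    return packed & SKETCH_LINE_MASK;
}

/* Test the flag bit. */
static inline int sketch_is_aliased(uint32_t packed)
{
    return (packed & SKETCH_ALIASED_FLAG) != 0;
}
```

The cost of this layout is that line numbers above 2^31 - 1 are silently truncated, which is the trade-off the comment above asks about.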

p5pRT commented 10 years ago

From @cpansprout

On Thu Sep 18 22:06:40 2014, bulk88 wrote:

> On Thu Sep 18 20:07:16 2014, sprout wrote:
>
>> Twelve years after the report, this is finally fixed, in commit ff2a62e0c.
>
> I think this is a bad implementation. Instead of adding another 4 bytes to every GV,

On 64-bit systems, it fills in an alignment hole (assuming line_t is 32 bits).

On 32-bit systems, the size goes from 44 to 48 bytes. On Windows memory is allocated in 8-byte chunks;¹ on 32-bit darwin in 16-byte chunks. Are there any systems that mallocate 4-byte chunks any more? If not, then this makes no difference.

Even so, it was partly to offset any possible increase in memory usage from this bug’s fix that I stopped many CVs from having to have GVs to live in.

¹ This is based on a message that you posted when we were discussing COW techniques.

> why not turn gp_line into a bitfield (C bitfield or bitwise op synthesized), top bit is the GPf_ALIASED_SV flag. Do we need to support over 2 billion lines of perl source code in 1 file?

I’m not necessarily opposed, though I’m not sure I feel comfortable changing the way we use line_t, either.

If you can prove me wrong about malloc sizes, I’ll probably go ahead and follow your suggestion.

--

Father Chrysostomos

p5pRT commented 10 years ago

From @bulk88

On Thu Sep 18 22:26:26 2014, sprout wrote:

> On Thu Sep 18 22:06:40 2014, bulk88 wrote:
>
>> On Thu Sep 18 20:07:16 2014, sprout wrote:
>>
>>> Twelve years after the report, this is finally fixed, in commit ff2a62e0c.
>>
>> I think this is a bad implementation. Instead of adding another 4 bytes to every GV,
>
> On 64-bit systems, it fills in an alignment hole (assuming line_t is 32 bits).
>
> On 32-bit systems, the size goes from 44 to 48 bytes. On Windows memory is allocated in 8-byte chunks;¹ on 32-bit darwin in 16-byte chunks. Are there any systems that mallocate 4-byte chunks any more? If not, then this makes no difference.

If in the future GPs are allocated as arenas, 4 bytes on 32-bit OSes would definitely matter.

> Even so, it was partly to offset any possible increase in memory usage from this bug’s fix that I stopped many CVs from having to have GVs to live in.
>
> ¹ This is based on a message that you posted when we were discussing COW techniques.
>
>> why not turn gp_line into a bitfield (C bitfield or bitwise op synthesized), top bit is the GPf_ALIASED_SV flag. Do we need to support over 2 billion lines of perl source code in 1 file?
>
> I’m not necessarily opposed, though I’m not sure I feel comfortable changing the way we use line_t, either.
>
> If you can prove me wrong about malloc sizes, I’ll probably go ahead and follow your suggestion.

Here is the table of requested alloc size vs total size (including header and the unused but reserved spare bytes at the end): https://rt-archive.perl.org/perl5/Ticket/Display.html?id=114820#txn-1159502

If Perl DIDN'T add any extra headers on Win32, you would be correct that 44 (0x2C) and 48 (0x30) make no difference.

    ptr=3324C8 i=2C _msize=2C RevEngd req sz=2C Actual=38
    ptr=3324C8 i=30 _msize=30 RevEngd req sz=30 Actual=38
    ptr=332500 i=31 _msize=31 RevEngd req sz=31 Actual=40

At malloc(0x31/49), it goes into another bucket with more slack space.

But Perl on Win32 adds a header to each allocation.

Tracing/stepping the old GP code\,

Initial req in newGP, 0x2c in VMem::Malloc, 0xC is added, now at 0x38

enters OS's native allocator HeapAlloc (which is aliased to RtlAllocateHeap) as request size 0x38

Dump of C stack args on entry to RtlAllocateHeap

0x0012FAF8 77c2c3c9 00360000 00000000 00000038 00365f38 ÉÃÂw..6.....8...8_6.

(Return Address) (HANDLE hHeap) (DWORD dwFlags) (SIZE_T dwBytes) (not an arg, unknown C auto)

  ntdll.dll!_RtlAllocateHeap@12()
  msvcrt.dll!__nh_malloc() + 0x13
  msvcrt.dll!_malloc() + 0x27
  perl521.dll!VMem::Malloc(unsigned int size=44) Line 151 + 0xe C++
  perl521.dll!PerlMemCalloc(IPerlMem * piPerl=0x00365bec, unsigned int num=1, unsigned int size=44) Line 313 + 0x1b C++
  perl521.dll!Perl_safesyscalloc(unsigned int count=1, unsigned int size=44) Line 434 + 0x6 C
  perl521.dll!Perl_newGP(interpreter * my_perl=0x00364624, gv * const gv=0x0036813c) Line 184 C
  perl521.dll!Perl_gv_init_pvn(interpreter * my_perl=0x00364624, gv * gv=0x0036813c, hv * stash=0x0036811c, const char * name=0x280d38c0, unsigned int len=6, unsigned long flags=0) Line 413 C
  perl521.dll!Perl_gv_fetchpvn_flags(interpreter * my_perl=0x00364624, const char * nambeg=0x280d38c0, unsigned int full_len=671955142, long flags=6, const int sv_type=12) Line 2301 C
  perl521.dll!S_init_main_stash(interpreter * my_perl=0x00364624) Line 3609 + 0x1b C
  perl521.dll!S_parse_body(interpreter * my_perl=0x00364624, char * * env=0x00362a08, void (interpreter *)* xsinit=0x280c0070) Line 1830 C
  perl521.dll!perl_parse(interpreter * my_perl=0x00364624, void (interpreter *)* xsinit=0x280c0070, int argc=3, char * * argv=0x00362470, char * * env=0x00362a08) Line 1604 C
  perl521.dll!RunPerl(int argc=3, char * * argv=0x00362470, char * * env=0x01362d70) Line 251 + 0x12 C++
  perl.exe!main(int argc=3, char * * argv=0x00362470, char * * env=0x00362d70) Line 22 + 0x12 C
  perl.exe!_mainCRTStartup() + 0xe3
  kernel32.dll!_BaseProcessStart@4() + 0x23

A req of 0x38 takes a 0x40 slot.

ptr=332500 i=38 _msize=38 RevEngd req sz=38 Actual=40

But a req of 0x3c takes a 0x48 slot, a memory increase of 8 bytes.

ptr=332540 i=3C _msize=3C RevEngd req sz=3C Actual=48
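
The arithmetic behind these figures can be reproduced with a small model. This is a sketch fitted to the numbers quoted in this comment, not a description of how the Windows heap is actually implemented: the 8-byte overhead and 8-byte granule are assumptions chosen because they reproduce every request/actual pair listed, and the function names are hypothetical. The 0xC header is the value stated in the trace above.

```c
#include <stddef.h>

#define SKETCH_PERL_HEADER   0x0C  /* bytes added in VMem::Malloc, per the trace */
#define SKETCH_HEAP_OVERHEAD 8     /* assumption fitted to the quoted figures */
#define SKETCH_HEAP_GRANULE  8     /* assumption fitted to the quoted figures */

/* Estimated total footprint of a raw heap request of `req` bytes. */
size_t sketch_heap_footprint(size_t req)
{
    size_t total = req + SKETCH_HEAP_OVERHEAD;

    /* round up to the next multiple of the granule */
    return (total + SKETCH_HEAP_GRANULE - 1)
           / SKETCH_HEAP_GRANULE * SKETCH_HEAP_GRANULE;
}

/* Estimated footprint once the Perl-side header is added first. */
size_t sketch_perl_footprint(size_t req)
{
    return sketch_heap_footprint(req + SKETCH_PERL_HEADER);
}
```

Under this model a 44-byte GP costs 0x40 bytes and a 48-byte GP costs 0x48 bytes once the Perl header is included, while without the header both would land in the same 0x38 slot.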

If Win32 Perl's malloc code is rewritten (I might do it one day), then there won't be a perl-added header, but it's not on my immediate todo list.

In any case, unlike other interpreted programming language engines, Perl never steals always-0 bits from pointers or other fields; the web browser you are reading this with does steal bits from pointers.

-- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 10 years ago

From perl5-porters@perl.org

Daniel Dragan wrote:

> But a req of 0x3c, it takes a 0x48 slot, a memory increase of 8 bytes.

I stand corrected.

> In any case, unlike other interpreted programing language engines, Perl never steals always 0 bits from pointers or other fields, the webbrowser you are reading this with, does steal bits from pointers.

Is there any way to grep for that in the lynx source?

p5pRT commented 10 years ago

From @cpansprout

On Thu Sep 18 22:06:40 2014, bulk88 wrote:

> On Thu Sep 18 20:07:16 2014, sprout wrote:
>
>> Twelve years after the report, this is finally fixed, in commit ff2a62e0c.
>
> I think this is a bad implementation. Instead of adding another 4 bytes to every GV, why not turn gp_line into a bitfield (C bitfield or bitwise op synthesized), top bit is the GPf_ALIASED_SV flag. Do we need to support over 2 billion lines of perl source code in 1 file?

Can I assume that line_t is 32 bits everywhere?

Or should I use U32 for the bitfield?

--

Father Chrysostomos

p5pRT commented 10 years ago

From @bulk88

On Fri Sep 19 22:21:27 2014, sprout wrote:

> On Thu Sep 18 22:06:40 2014, bulk88 wrote:
>
>> On Thu Sep 18 20:07:16 2014, sprout wrote:
>>
>>> Twelve years after the report, this is finally fixed, in commit ff2a62e0c.
>>
>> I think this is a bad implementation. Instead of adding another 4 bytes to every GV, why not turn gp_line into a bitfield (C bitfield or bitwise op synthesized), top bit is the GPf_ALIASED_SV flag. Do we need to support over 2 billion lines of perl source code in 1 file?
>
> Can I assume that line_t is 32 bits everywhere?
>
> Or should I use U32 for the bitfield?

What is wrong with using a http://www.tutorialspoint.com/cprogramming/c_bit_fields.htm ?

-- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 10 years ago

From @rurban

On Fri, Sep 19, 2014 at 12:06 AM, bulk88 via RT <perlbug-followup@perl.org> wrote:

> ff2a62e0c

And do we really have to endure such enormous binary ABI changes directly into blead without any previous discussion and testing in a branch?

Please revert and move to a branch

--
Reini Urban
http://cpanel.net/ http://www.perl-compiler.org/

p5pRT commented 10 years ago

From @cpansprout

On Sat Sep 20 07:10:21 2014, bulk88 wrote:

> On Fri Sep 19 22:21:27 2014, sprout wrote:
>
>> On Thu Sep 18 22:06:40 2014, bulk88 wrote:
>>
>>> On Thu Sep 18 20:07:16 2014, sprout wrote:
>>>
>>>> Twelve years after the report, this is finally fixed, in commit ff2a62e0c.
>>>
>>> I think this is a bad implementation. Instead of adding another 4 bytes to every GV, why not turn gp_line into a bitfield (C bitfield or bitwise op synthesized), top bit is the GPf_ALIASED_SV flag. Do we need to support over 2 billion lines of perl source code in 1 file?
>>
>> Can I assume that line_t is 32 bits everywhere?
>>
>> Or should I use U32 for the bitfield?
>
> What is wrong with using a http://www.tutorialspoint.com/cprogramming/c_bit_fields.htm ?

Let me rephrase the question: If line_t is some size other than 32 bits on an exotic compiler or platform, then is it safe to do this?

    line_t:31 gp_line;
    line_t:1 gp_flags;

--

Father Chrysostomos

p5pRT commented 10 years ago

From @bulk88

On Fri Sep 19 16:09:19 2014, perl5-porters@perl.org wrote:

> Daniel Dragan wrote:
>
>> But a req of 0x3c, it takes a 0x48 slot, a memory increase of 8 bytes.
>
> I stand corrected.
>
>> In any case, unlike other interpreted programing language engines, Perl never steals always 0 bits from pointers or other fields, the webbrowser you are reading this with, does steal bits from pointers.
>
> Is there any way to grep for that in the lynx source?

I downloaded some source and searched it:

Lynx, no
Links, no

After I wrote the below, I found someone already wrote this up in http://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations

Mozilla Spidermonkey (I've written C code for it before), yes

Spidermonkey's JSValue is passed by copy, as a 4 byte int. The 4 byte data unit contains primitive ints or primitive bools that are not GCed, or a GCed pointer to something. I am using Spidermonkey 1.8 from March 2009. After Spidermonkey ~1.8/Firefox 3.*, when Mozilla went into rapid release cycle, Mozilla made a decision to drop support for embedder/API stability, and Spidermonkey's devs decided to rewrite the engine in C++ to be able to hang out with the cool kids on the school yard, so I have no idea and don't care what the latest Spidermonkey is doing since I can't use it for my internal use.


    /*
     * Type tags stored in the low bits of a jsval.
     */
    #define JSVAL_OBJECT            0x0     /* untagged reference to object */
    #define JSVAL_INT               0x1     /* tagged 31-bit integer value */
    #define JSVAL_DOUBLE            0x2     /* tagged reference to double */
    #define JSVAL_STRING            0x4     /* tagged reference to string */
    #define JSVAL_BOOLEAN           0x6     /* tagged boolean value */

    /* Type tag bitfield length and derived macros. */
    #define JSVAL_TAGBITS           3
    #define JSVAL_TAGMASK           JS_BITMASK(JSVAL_TAGBITS)
    #define JSVAL_TAG(v)            ((v) & JSVAL_TAGMASK)
    #define JSVAL_SETTAG(v,t)       ((v) | (t))
    #define JSVAL_CLRTAG(v)         ((v) & ~(jsval)JSVAL_TAGMASK)
    #define JSVAL_ALIGN             JS_BIT(JSVAL_TAGBITS)

    /* Predicates for type testing. */
    #define JSVAL_IS_OBJECT(v)      (JSVAL_TAG(v) == JSVAL_OBJECT)
    #define JSVAL_IS_NUMBER(v)      (JSVAL_IS_INT(v) || JSVAL_IS_DOUBLE(v))
    #define JSVAL_IS_INT(v)         (((v) & JSVAL_INT) && (v) != JSVAL_VOID)
    #define JSVAL_IS_DOUBLE(v)      (JSVAL_TAG(v) == JSVAL_DOUBLE)
    #define JSVAL_IS_STRING(v)      (JSVAL_TAG(v) == JSVAL_STRING)
    #define JSVAL_IS_BOOLEAN(v)     (JSVAL_TAG(v) == JSVAL_BOOLEAN)
    #define JSVAL_IS_NULL(v)        ((v) == JSVAL_NULL)
    #define JSVAL_IS_VOID(v)        ((v) == JSVAL_VOID)
    #define JSVAL_IS_PRIMITIVE(v)   (!JSVAL_IS_OBJECT(v) || JSVAL_IS_NULL(v))

    /* Objects, strings, and doubles are GC'ed. */
    #define JSVAL_IS_GCTHING(v)     (!((v) & JSVAL_INT) && !JSVAL_IS_BOOLEAN(v))
    #define JSVAL_TO_GCTHING(v)     ((void *)JSVAL_CLRTAG(v))
    #define JSVAL_TO_OBJECT(v)      ((JSObject *)JSVAL_TO_GCTHING(v))
    #define JSVAL_TO_DOUBLE(v)      ((jsdouble *)JSVAL_TO_GCTHING(v))
    #define JSVAL_TO_STRING(v)      ((JSString *)JSVAL_TO_GCTHING(v))
    #define OBJECT_TO_JSVAL(obj)    ((jsval)(obj))
    #define DOUBLE_TO_JSVAL(dp)     JSVAL_SETTAG((jsval)(dp), JSVAL_DOUBLE)
    #define STRING_TO_JSVAL(str)    JSVAL_SETTAG((jsval)(str), JSVAL_STRING)
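
The low-bit tagging scheme in these macros can be condensed into a few lines of standalone C. This is a simplified sketch, not Spidermonkey code: all names are hypothetical, only the integer tag and untagged pointers are modelled, and it assumes the platform performs an arithmetic right shift on negative signed values (true of mainstream compilers, but implementation-defined in C).

```c
#include <stdint.h>

typedef intptr_t sketch_val;            /* one machine word holds everything */

#define SKETCH_TAG_INT   0x1            /* low bit set: immediate integer */
#define SKETCH_TAG_BITS  3
#define SKETCH_TAG_MASK  ((uintptr_t)((1u << SKETCH_TAG_BITS) - 1))

/* Store an integer in the upper bits and set the low tag bit. */
static inline sketch_val sketch_from_int(int32_t i)
{
    return (sketch_val)(((uintptr_t)(intptr_t)i << 1) | SKETCH_TAG_INT);
}

static inline int sketch_is_int(sketch_val v)
{
    return (v & SKETCH_TAG_INT) != 0;
}

/* Shift the tag bit back out (assumes arithmetic shift for negatives). */
static inline int32_t sketch_to_int(sketch_val v)
{
    return (int32_t)(v >> 1);
}

/* Pointers must be 8-byte aligned so the low three bits are zero. */
static inline sketch_val sketch_from_ptr(void *p)
{
    return (sketch_val)(uintptr_t)p;
}

static inline void *sketch_to_ptr(sketch_val v)
{
    return (void *)((uintptr_t)v & ~SKETCH_TAG_MASK);
}
```

On a 32-bit word this leaves 31 bits for the integer, matching the "tagged 31-bit integer value" comment in the macros above.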


I am not familiar with Google V8, but there are some examples of tagged pointers and it seems to be the same concept as in Spidermonkey except with a large scoop of OOP obfuscation. The core unit is 4 bytes, and is a primitive int or a GCed/heap *.


    Smi* Smi::FromIntptr(intptr_t value) {
      DCHECK(Smi::IsValid(value));
      int smi_shift_bits = kSmiTagSize + kSmiShiftSize;
      return reinterpret_cast<Smi*>((value << smi_shift_bits) | kSmiTag);
    }


    #define HAS_SMI_TAG(value) \
      ((reinterpret_cast<intptr_t>(value) & kSmiTagMask) == kSmiTag)


    bool Object::IsSmi() const {
      return HAS_SMI_TAG(this);
    }


    bool Object::IsHeapObject() const {
      return Internals::HasHeapObjectTag(this);
    }


      V8_INLINE static bool HasHeapObjectTag(const internal::Object* value) {
        return ((reinterpret_cast<intptr_t>(value) & kHeapObjectTagMask) ==
                kHeapObjectTag);
      }


    // Tag information for HeapObject.
    const int kHeapObjectTag = 1;
    const int kHeapObjectTagSize = 2;
    const intptr_t kHeapObjectTagMask = (1 << kHeapObjectTagSize) - 1;


    class Smi: public Object {
     public:
      // Returns the integer value.
      inline int value() const;

      // Convert a value to a Smi object.
      static inline Smi* FromInt(int value);


    int Smi::value() const {
      return Internals::SmiValue(this);
    }


    class Internals {
     public:
      // These values match non-compiler-dependent values defined within
      // the implementation of v8.
    .................

      V8_INLINE static int SmiValue(const internal::Object* value) {
        return PlatformSmiTagging::SmiToInt(value);
      }


    // Smi constants for 32-bit systems.
    template <> struct SmiTagging<4> {
      enum { kSmiShiftSize = 0, kSmiValueSize = 31 };
      static int SmiShiftSize() { return kSmiShiftSize; }
      static int SmiValueSize() { return kSmiValueSize; }
      V8_INLINE static int SmiToInt(const internal::Object* value) {
        int shift_bits = kSmiTagSize + kSmiShiftSize;
        // Throw away top 32 bits and shift down (requires >> to be sign extending).
        return static_cast<int>(reinterpret_cast<intptr_t>(value)) >> shift_bits;
      }


    //
    // Most object types in the V8 JavaScript are described in this file.
    //
    // Inheritance hierarchy:
    // - Object
    //   - Smi          (immediate small integer)
    //   - HeapObject   (superclass for everything allocated in the heap)
    //     - JSReceiver  (suitable for property access)
    //       - JSObject


    // Formats of Object*:
    //  Smi:        [31 bit signed int] 0
    //  HeapObject: [32 bit direct pointer] (4 byte aligned) | 01


    // Smi represents integer Numbers that can be stored in 31 bits.
    // Smis are immediate which means they are NOT allocated in the heap.
    // The this pointer has the following format: [31 bit signed int] 0
    // For long smis it has the following format:
    //     [32 bit signed int] [31 bits zero padding] 0
    // Smi stands for small integer.


Now for Webkit/Safari, the answer is yes on tagged pointers.

Webkit steals HIGH bits on 64 bit OSes, not the low 2 or 3 bits of the pointer like Spidermonkey and V8. All Webkit boxed pointers are always double NAN. !!isnan(*(double*)webkit_boxed_val_that_is_a_boxed_ptr) == 1

Webkit uses an 8 byte long union, which I think is passed by copy, and the union seems to be limiting memory space to 2/4GB even on 64 bit OSes. The high 4 bytes are info on how to interpret the union. The low 4 bytes are the pointer if the boxed data unit is a pointer. A SO post conflicts with the code below, which is from WebKit @ r173798; the SO post says that JSC uses 52 bit pointers on 64 bit OSes: http://stackoverflow.com/questions/17698605/how-to-overcome-javascripts-56-bit-limitation . While I see support for Int52 format for JSC JSValue *s, I don't see it being used to store JSObject *s. So Webkit/Safari is not 64 bit clean at all (is that a feature or a limitation? ;-) ).


    ALWAYS_INLINE int32_t JSValue::toInt32(ExecState* exec) const
    {
        if (isInt32())
            return asInt32();
        return JSC::toInt32(toNumber(exec));
    }


    inline bool JSValue::isInt32() const
    {
        return (u.asInt64 & TagTypeNumber) == TagTypeNumber;
    }


    inline int32_t JSValue::asInt32() const
    {
        ASSERT(isInt32());
        return u.asBits.payload;
    }


    class JSValue {
    private:
    ..................
        EncodedValueDescriptor u;
    };


    union EncodedValueDescriptor {
        int64_t asInt64;
    #if USE(JSVALUE32_64)
        double asDouble;
    #elif USE(JSVALUE64)
        JSCell* ptr;
    #endif

    #if CPU(BIG_ENDIAN)
        struct {
            int32_t tag;
            int32_t payload;
        } asBits;
    #else
        struct {
            int32_t payload;
            int32_t tag;
        } asBits;
    #endif
    };


    inline JSObject* JSValue::getObject() const
    {
        return isCell() ? asCell()->getObject() : 0;
    }


    ALWAYS_INLINE JSCell* JSValue::asCell() const
    {
        ASSERT(isCell());
        return reinterpret_cast<JSCell*>(u.asBits.payload);
    }


    inline bool JSValue::isCell() const
    {
        return tag() == CellTag;
    }


    inline uint32_t JSValue::tag() const
    {
        return u.asBits.tag;
    }


    inline JSValue::JSValue(const JSCell* ptr)
    {
        if (ptr)
            u.asBits.tag = CellTag;
        else
            u.asBits.tag = EmptyValueTag;
        u.asBits.payload = reinterpret_cast<int32_t>(const_cast<JSCell*>(ptr));
    }


    #if USE(JSVALUE32_64)
        /*
         * On 32-bit platforms USE(JSVALUE32_64) should be defined, and we use a NaN-encoded
         * form for immediates.
         *
         * The encoding makes use of unused NaN space in the IEEE754 representation. Any value
         * with the top 13 bits set represents a QNaN (with the sign bit set). QNaN values
         * can encode a 51-bit payload. Hardware produced and C-library payloads typically
         * have a payload of zero. We assume that non-zero payloads are available to encode
         * pointer and integer values. Since any 64-bit bit pattern where the top 15 bits are
         * all set represents a NaN with a non-zero payload, we can use this space in the NaN
         * ranges to encode other values (however there are also other ranges of NaN space that
         * could have been selected).
         *
         * For JSValues that do not contain a double value, the high 32 bits contain the tag
         * values listed in the enums below, which all correspond to NaN-space. In the case of
         * cell, integer and bool values the lower 32 bits (the 'payload') contain the pointer
         * integer or boolean value; in the case of all other tags the payload is 0.
         */
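
The NaN-encoding described in that comment can be demonstrated with a short C sketch. This is not WebKit code: the function names are hypothetical, and the tag constant is an illustrative value chosen only because its top 15 bits are set, as the comment requires. The sketch assumes IEEE 754 doubles.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative tag whose top 15 bits are all set (NaN space). */
#define SKETCH_CELL_TAG UINT32_C(0xfffffffb)

/* High 32 bits carry the tag, low 32 bits carry the payload. */
static inline uint64_t sketch_box(uint32_t tag, uint32_t payload)
{
    return ((uint64_t)tag << 32) | payload;
}

static inline uint32_t sketch_tag_of(uint64_t v)     { return (uint32_t)(v >> 32); }
static inline uint32_t sketch_payload_of(uint64_t v) { return (uint32_t)v; }

/* Reinterpret the 64 bits as a double and test for NaN (NaN != NaN). */
static inline int sketch_reads_as_nan(uint64_t v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d != d;
}

/* Ordinary doubles are stored as their own bit pattern. */
static inline uint64_t sketch_box_double(double d)
{
    uint64_t v;
    memcpy(&v, &d, sizeof v);
    return v;
}
```

A boxed pointer therefore looks like a NaN to floating-point code, while any real double keeps its normal representation, which is the property the `isnan` remark earlier in this comment relies on.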


    inline bool isInt52(double number)
    {
        return tryConvertToInt52(number) != JSValue::notInt52;
    }


    inline int64_t tryConvertToInt52(double number)
    {
        if (number != number)
            return JSValue::notInt52;
    #if OS(WINDOWS) && CPU(X86)
        // The VS Compiler for 32-bit builds generates a floating point error when attempting to cast
        // from an infinity to a 64-bit integer. We leave this routine with the floating point error
        // left in a register, causing undefined behavior in later floating point operations.
        //
        // To avoid this issue, we check for infinity here, and return false in that case.
        if (std::isinf(number))
            return JSValue::notInt52;
    #endif
        int64_t asInt64 = static_cast<int64_t>(number);
        if (asInt64 != number)
            return JSValue::notInt52;
        if (!asInt64 && std::signbit(number))
            return JSValue::notInt52;
        if (asInt64 >= (static_cast<int64_t>(1) << (JSValue::numberOfInt52Bits - 1)))
            return JSValue::notInt52;
        if (asInt64 < -(static_cast<int64_t>(1) << (JSValue::numberOfInt52Bits - 1)))
            return JSValue::notInt52;
        return asInt64;
    }


Internet Explorer 6\, no tagged pointers. Pointer can be all 2^32 or 2^64 permutations. Circa 2001 or 2008 design depending on how you count it. Maybe its even a Win16 design :D

IE is obviously closed source, so this comes from reverse engineering. JScript.dll version 5.7 from 2008 shows primitives/boxed values being 16-byte units that are passed by copy. The first/low 4 bytes in asm (2 bytes on paper) are the tag/type code. The primitive double lives in the upper 8 bytes of the union. This is identical to MS OLE/VB/COM VARIANTs, which are documented at http://msdn.microsoft.com/en-us/library/windows/desktop/ms221627%28v=vs.85%29.aspx or see http://en.wikipedia.org/wiki/Variant_type . This makes IE6 JS's basic type the largest of all 4 browsers with JS compared in this post, at 16 bytes on a 32-bit machine. This is identical to Perl's SV, with a 2 or 4 byte flag determining the type. So why are we coding as if it's the days of Disco? And why does P5's design look like bell bottoms?
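As a rough picture of the 16-byte layout described above, here is a minimal C sketch of a VARIANT-style tagged union. The type `tagged_value`, its field names and the helper functions are hypothetical and not copied from any Microsoft header; the 16-byte size is what common 32-bit and 64-bit ABIs give for this particular struct, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative type codes for the sketch. */
enum { TAG_I4 = 3, TAG_R8 = 5 };

/* Hypothetical VARIANT-like value: 8 bytes of tag plus reserved space,
 * followed by an 8-byte union holding the actual payload. */
typedef struct {
    uint16_t tag;          /* type code: 2 bytes "on paper" */
    uint16_t reserved[3];  /* unused space before the payload */
    union {
        int32_t i4;
        double  r8;        /* the double lives in the upper 8 bytes */
        void   *ptr;
    } u;
} tagged_value;

static tagged_value make_int(int32_t i)
{
    tagged_value v = { TAG_I4, { 0, 0, 0 }, { 0 } };
    v.u.i4 = i;
    return v;
}

static tagged_value make_double(double d)
{
    tagged_value v = { TAG_R8, { 0, 0, 0 }, { 0 } };
    v.u.r8 = d;
    return v;
}

/* The whole struct is passed by copy, as described for JScript above. */
static double to_number(tagged_value v)
{
    assert(v.tag == TAG_I4 || v.tag == TAG_R8);
    return v.tag == TAG_I4 ? (double)v.u.i4 : v.u.r8;
}
```

Compared with an 8-byte NaN-encoded value, every copy here moves twice as much memory, which is the size difference the post is complaining about.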

Partial answers: I know P1 is from 1987 and P5 broke API compatibility with P4 in 1993, but while the SV API is a little better than P4's 24-byte STR struct, it isn't the 8 or 4 bytes of every competitor except MS. I don't see any reason why a 4 or 8 byte primitive/boxed type couldn't have been used from 1993 onwards with P5.

-- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 10 years ago

From @bulk88

[Attached image: 500px-Disco_Dancers.svg.png]

p5pRT commented 10 years ago

From @cpansprout

On Sat Sep 20 09​:34​:43 2014\, sprout wrote​:

Let me rephrase the question​: If line_t is some size other than 32 bits on an exotic compiler or platform\, then is it safe to do this?

line_t​:31 gp_line; line_t​:1 gp_flags;

Oops\, the :n is in the wrong spot.

I decided to go with U32\, which I did in 39ff6c37.

--

Father Chrysostomos

p5pRT commented 10 years ago

From @jandubois

On Mon, Sep 22, 2014 at 9:57 PM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

On Sat Sep 20 09​:34​:43 2014\, sprout wrote​:

Let me rephrase the question​: If line_t is some size other than 32 bits on an exotic compiler or platform\, then is it safe to do this?

line_t​:31 gp_line; line_t​:1 gp_flags;

Oops\, the :n is in the wrong spot.

I decided to go with U32\, which I did in 39ff6c37.

For consistency this should probably be PERL_BITFIELD32 instead of U32\, which will be just "unsigned" everywhere but Win32.
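For reference, the corrected bit-field syntax (width after the member name, not after the type) looks roughly like the sketch below. `SKETCH_BITFIELD32`, `struct gp_sketch` and `make_gp` are hypothetical stand-ins written for this illustration, with the base type assumed to be plain `unsigned` as described above for non-Win32 builds, and `unsigned` assumed to be at least 32 bits wide.

```c
#include <assert.h>

/* Hypothetical stand-in for PERL_BITFIELD32; assumed to be "unsigned". */
#define SKETCH_BITFIELD32 unsigned

/* Illustrative struct, not the real struct gp. */
struct gp_sketch {
    SKETCH_BITFIELD32 line  : 31;  /* the ":n" goes after the member name */
    SKETCH_BITFIELD32 flags : 1;   /* one flag bit shares the same word */
};

static struct gp_sketch make_gp(unsigned line, unsigned aliased)
{
    struct gp_sketch gp;
    assert(line <= 0x7FFFFFFFu);   /* must fit in 31 bits */
    gp.line  = line;
    gp.flags = aliased ? 1u : 0u;
    return gp;
}
```

With this declaration the compiler generates the masking and shifting; how the two fields are laid out inside the word is left to the implementation.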

Cheers\, -Jan

p5pRT commented 10 years ago

From @cpansprout

On Tue Sep 23 09​:38​:32 2014\, jdb wrote​:

On Mon, Sep 22, 2014 at 9:57 PM, Father Chrysostomos via RT <perlbug-followup@perl.org> wrote:

On Sat Sep 20 09​:34​:43 2014\, sprout wrote​:

Let me rephrase the question​: If line_t is some size other than 32 bits on an exotic compiler or platform\, then is it safe to do this?

line_t​:31 gp_line; line_t​:1 gp_flags;

Oops\, the :n is in the wrong spot.

I decided to go with U32\, which I did in 39ff6c37.

For consistency this should probably be PERL_BITFIELD32 instead of U32\, which will be just "unsigned" everywhere but Win32.

Thank you. That was the answer I was looking for. I have changed it in commit e12ab2f.

--

Father Chrysostomos

p5pRT commented 10 years ago

From @khwilliamson

On 09/20/2014 08​:10 AM\, bulk88 via RT wrote​:

On Fri Sep 19 22​:21​:27 2014\, sprout wrote​:

On Thu Sep 18 22​:06​:40 2014\, bulk88 wrote​:

On Thu Sep 18 20​:07​:16 2014\, sprout wrote​:

Twelve years after the report\, this is finally fixed\, in commit ff2a62e0c.

I think this is a bad implementation. Instead of adding another 4 bytes to every GV\, why not turn gp_line into a bitfield (C bitfield or bitwise op synthesized)\, top bit is the GPf_ALIASED_SV flag. Do we need to support over 2 billion lines of perl source code in 1 file?

Can I assume that line_t is 32 bits everywhere?

Or should I use U32 for the bitfield?

What is wrong with using a http​://www.tutorialspoint.com/cprogramming/c_bit_fields.htm ?

So why doesn't Perl use C language bit fields in general? It's easier code to maintain than all the masking and logical operations. And it's possible the compiler can do better optimization on them.

I just presumed that bit fields were from C99 and that's why we didn't use them; or were implemented poorly in some compilers. But in investigating\, I see that they are already in the version 1 K&R\, pre ANSI.

p5pRT commented 10 years ago

From ambrus@math.bme.hu

On 9/24/14, Karl Williamson <public@khwilliamson.com> wrote:

So why doesn't Perl use C language bit fields in general? It's easier code to maintain than all the masking and logical operations. And it's possible the compiler can do better optimization on them.

As far as I can see, bit fields are generally not worth using in any C program. The problem is that you often want to access, modify, or even copy multiple flags at the same time, and then the code with bit fields gets long-winded. And if your code happens to be such that you only ever access a single flag at a time, then the code is simple and easy enough to read with just a normal integer field and bit operations anyway.
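To make the "multiple flags at once" point concrete, here is a small C comparison. The flag names, `copy_flags`, `struct flag_bits` and `copy_flag_bits` are all invented for this sketch and do not correspond to anything in the Perl source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags stored in an ordinary integer field. */
#define F_READONLY 0x1u
#define F_TAINTED  0x2u
#define F_UTF8     0x4u

/* With masks, copying any group of flags from src into dst is one expression. */
static uint32_t copy_flags(uint32_t dst, uint32_t src, uint32_t mask)
{
    return (dst & ~mask) | (src & mask);
}

/* The same flags as C bit fields. */
struct flag_bits {
    unsigned readonly : 1;
    unsigned tainted  : 1;
    unsigned utf8     : 1;
};

/* With bit fields, each flag in the group needs its own assignment. */
static void copy_flag_bits(struct flag_bits *dst, const struct flag_bits *src)
{
    dst->readonly = src->readonly;
    dst->tainted  = src->tainted;
}
```

The mask version also takes the set of flags as a run-time argument, which the bit-field version cannot do without a chain of conditionals.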

Ambrus

p5pRT commented 10 years ago

From @jhi

On Wednesday-201409-24\, 1​:46\, Zsbán Ambrus wrote​:

On 9/24/14, Karl Williamson <public@khwilliamson.com> wrote:

So why doesn't Perl use C language bit fields in general? It's easier code to maintain than all the masking and logical operations. And it's possible the compiler can do better optimization on them.

As far as I can see, bit fields are generally not worth using in any C program. The problem is that you often want to access, modify, or even copy multiple flags at the same time, and then the code with bit fields gets long-winded. And if your code happens to be such that you only ever access a single flag at a time, then the code is simple and easy enough to read with just a normal integer field and bit operations anyway.

I don't have much experience actually using bit fields, but I've always shied away from them because of their generally bad reputation.

After a google-aided refresher: they are extremely unportable, in that compilers are very, very much allowed to do whatever they like underneath: the alignment, padding, and amount of storage (and even endianness) used absolutely cannot be relied upon. So what they allow (handling chunks of boolean bits) is pretty much all they allow. That's why the usual << >> & | ^ ~ dance.
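For comparison, the shift-and-mask approach pins the layout down exactly, which is the property bit fields cannot promise. The sketch below packs a 31-bit line number and a one-bit flag into a 32-bit word, along the lines suggested earlier in the thread; `LINE_MASK`, `ALIASED_FLAG` and the three helpers are hypothetical names, not the identifiers used in the Perl source.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_MASK    0x7FFFFFFFu   /* low 31 bits: line number */
#define ALIASED_FLAG 0x80000000u   /* top bit: the flag */

/* Combine a line number and a boolean flag into one 32-bit word. */
static uint32_t pack_line(uint32_t line, int aliased)
{
    assert(line <= LINE_MASK);     /* caller must stay within 31 bits */
    return line | (aliased ? ALIASED_FLAG : 0u);
}

/* Extract the line number, ignoring the flag bit. */
static uint32_t packed_line(uint32_t word)
{
    return word & LINE_MASK;
}

/* Extract the flag as 0 or 1. */
static int packed_aliased(uint32_t word)
{
    return (word & ALIASED_FLAG) != 0;
}
```

Here the flag is always bit 31 of the word on every compiler, so the value could be written to disk or compared across builds; with a bit field, which end of the word the flag lands on is up to the implementation.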

p5pRT commented 10 years ago

From @doughera88

On Tue\, Sep 23\, 2014 at 10​:45​:31PM -0600\, Karl Williamson wrote​:

On 09/20/2014 08​:10 AM\, bulk88 via RT wrote​:

On Fri Sep 19 22​:21​:27 2014\, sprout wrote​:

On Thu Sep 18 22​:06​:40 2014\, bulk88 wrote​:

On Thu Sep 18 20​:07​:16 2014\, sprout wrote​:

Twelve years after the report\, this is finally fixed\, in commit ff2a62e0c.

I think this is a bad implementation. Instead of adding another 4 bytes to every GV\, why not turn gp_line into a bitfield (C bitfield or bitwise op synthesized)\, top bit is the GPf_ALIASED_SV flag. Do we need to support over 2 billion lines of perl source code in 1 file?

Can I assume that line_t is 32 bits everywhere?

Or should I use U32 for the bitfield?

What is wrong with using a http​://www.tutorialspoint.com/cprogramming/c_bit_fields.htm ?

So why doesn't Perl use C language bit fields in general? It's easier code to maintain than all the masking and logical operations. And it's possible the compiler can do better optimization on them.

I just presumed that bit fields were from C99 and that's why we didn't use them; or were implemented poorly in some compilers. But in investigating\, I see that they are already in the version 1 K&R\, pre ANSI.

I don't recall any specific discussion of that issue\, but I do recall that compilers of the day didn't necessarily handle structs as well as they handled integers. For example\, you couldn't simply copy structs by assignment. (Indeed\, Configure still tests for that even today!)

None of that really matters much today\, however\, except for a taste for consistency.

--   Andy Dougherty doughera@​lafayette.edu