Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

fstat st_size overflow #810

Closed p5pRT closed 20 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#1738 (status was 'resolved')

Searchable as RT1738$

p5pRT commented 24 years ago

From chris@gouda.netmonger.net

    $ ls -l log.dat
    -rw-rw-r--  1 chris  web  2200944678 Nov  2 18:16 log.dat
    $ perl -le 'print -s "log.dat"'
    -2094022618

I'm afraid I don't have enough background to know what the correct solution is. On FreeBSD, st_size is an off_t, which is a "long long", 64-bit signed int. I don't know what a negative file size could be, but I guess it's probably just an unintended consequence of using the same type as lseek.
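As a rough illustration (not part of the original report), the negative number is simply the 64-bit st_size truncated to a 32-bit signed integer. A minimal C sketch, using the numbers from the report and assuming a 32-bit int on a two's-complement machine:

```c
/* Illustrative only: how 2200944678 becomes -2094022618 when pushed
 * through an (I32)-style cast, as in the pp_sys.c code quoted below.
 * Assumes sizeof(int) == 4 and two's-complement representation. */
#include <stdio.h>

int main(void)
{
    long long size = 2200944678LL;   /* st_size from the ls -l output above */
    int truncated  = (int)size;      /* what (I32)PL_statcache.st_size keeps */

    printf("%lld -> %d\n", size, truncated);   /* 2200944678 -> -2094022618 */
    return 0;
}
```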

In any case, I have some code that needs to know the right size in order to do a look-style binary search. Being somewhat unfamiliar with Perl guts, I just did this:

    --- pp_sys.c 1999/05/02 14:18:18 1.1.1.2
    +++ pp_sys.c 1999/11/03 20:53:50
    @@ -2220,7 +2220,7 @@
     #else
         PUSHs(sv_2mortal(newSVpv("", 0)));
     #endif
    -    PUSHs(sv_2mortal(newSViv((I32)PL_statcache.st_size)));
    +    PUSHs(sv_2mortal(newSVnv((U32)PL_statcache.st_size)));
     #ifdef BIG_TIME
         PUSHs(sv_2mortal(newSVnv((U32)PL_statcache.st_atime)));
         PUSHs(sv_2mortal(newSVnv((U32)PL_statcache.st_mtime)));
    @@ -2349,7 +2349,7 @@
         djSP; dTARGET;
         if (result < 0)
             RETPUSHUNDEF;
    -    PUSHi(PL_statcache.st_size);
    +    PUSHn(PL_statcache.st_size);
         RETURN;
     }

I'm not sure if that's right even as a bandaid, and it will obviously fail again after another 2GB. Further, I suspect this issue is already well-known, but in the hopes that I won't have to maintain local modifications to Perl in order to keep this program working, I am sending this report.

Perl Info

```
Site configuration information for perl 5.00503:

Configured by markm at $Date$.

Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
    osname=freebsd, osvers=4.0-current, archname=i386-freebsd
    uname='freebsd freefall.freebsd.org 4.0-current freebsd 4.0-current #0: $Date$'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='cc', optimize='undef', gccversion=egcs-2.91.66 19990314 (egcs-1.1.2 release)
    cppflags=''
    ccflags =''
    stdchar='char', d_stdstdio=undef, usevfork=true
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-E'
    libpth=/usr/lib
    libs=-lm -lc -lcrypt
    libc=/usr/lib/libc.so, so=so, useshrplib=true, libperl=libperl.so.3
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fpic', lddlflags='-shared'

Locally applied patches:

@INC for perl 5.00503:
    /usr/libdata/perl/5.00503/mach
    /usr/libdata/perl/5.00503
    /usr/local/lib/perl5/site_perl/5.005/i386-freebsd
    /usr/local/lib/perl5/site_perl/5.005
    .

Environment for perl 5.00503:
    HOME=/home/chris
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/home/chris/bin
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/zsh
```
p5pRT commented 24 years ago

From @jhi

Christopher Masto writes​:

I'm afraid I don't have enough background to know what the correct solution is. On FreeBSD\, st_size is an off_t\, which is a "long long"\, 64-bit signed int. I don't know what a negative file size could be\, but I guess it's probably just an unintended consequence of using the same type as lseek.

The next major release of Perl, called 5.6, will be able to handle large files (>2GB) (and >4GB). Currently it seems that one must Configure and build perl separately for that, though--largefile awareness won't be on by default because of backward compatibility.

If you want and have the time, you can try the latest developer release:

  http://www.cpan.org/src/5.0/perl5.005_62.tar.gz

Do "./Configure -Duselargefiles -de;make all test" BUT DO NOT INSTALL into production use because this *is* a developer release. You can try your test with the resulting 'perl' executable, though.

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

On Thu, Nov 04, 1999 at 02:42:20AM +0200, Jarkko Hietaniemi wrote:

If you want and you have the time you can try the latest developer release​:

http://www.cpan.org/src/5.0/perl5.005_62.tar.gz

Do "./Configure -Duselargefiles -de;make all test" BUT DO NOT INSTALL into production use because this *is* a developer release. You can try your test with the resulting 'perl' executable\, though.

Just for the record, 5.005_62 does seem to work properly with -Duselargefiles on FreeBSD-current. Yay.

--
Christopher Masto        Senior Network Monkey        NetMonger Communications
chris@netmonger.net      info@netmonger.net           http://www.netmonger.net

Free yourself, free your machine, free the daemon -- http://www.freebsd.org/

p5pRT commented 24 years ago

From @jhi

Christopher Masto writes​:

Just for the record\, 5.005_62 does seem to work properly with -Duselargefiles on FreeBSD-current. Yay.

Yay, indeed. So I have made some progress. Thanks for testing.

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Jarkko Hietaniemi <jhi@iki.fi> wrote:

The next major release of Perl\, called 5.6\, will be able to handle large files (>2GB) (and >4GB). Currently it seems that one must Configure and build perl separately for that\, though--largefile awareness won't be on default because of backward compatibility.

Can you expand on this? What's the compatibility problem? Isn't large file support a Good Thing, for those platforms that support it?

Or is it just a question of untested / experimental code? In which case, why not turn it on by default in _6x, so it gets tested? And take the decision on stability just before the 5.6 release.

Mike Guy

p5pRT commented 24 years ago

From @jhi

M.J.T. Guy writes​:

Jarkko Hietaniemi <jhi@iki.fi> wrote:

The next major release of Perl\, called 5.6\, will be able to handle large files (>2GB) (and >4GB). Currently it seems that one must Configure and build perl separately for that\, though--largefile awareness won't be on default because of backward compatibility.

Can you expand on this? What's the compatibility problem?

Wanting large files means that you also want 64 bits. Quadness can mean nasty incompatibility problems: there are pure 32-bit applications, 32/64 applications, and 64-bit applications, which may or may not be binary compatible. Depending on which libraries (which libc, using which bitness) you linked with, you may have limitations on which other programs/libraries you can use.

Isn't large file support a Good Thing, for those platforms that support it?

Yes, it is.

Or is it just a question of untested / experimental code? In which case, why not turn it on by default in _6x, so it gets tested?

That's a possibility.

And take the decision on stability just before the 5.6 release.

Mike Guy

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From @AlanBurlison

Jarkko Hietaniemi wrote​:

Wanting large files means that you also want 64 bits. Quadness can mean nasty incompatibility problems​: there are pure 32 bit applications\, 32/64 applications\, and 64 bit applications\, which may or may not be binary compatible. Depending on which libraries (which libc using which bitness) you linked with you may have limitations on which other programs/libraries you can use.

I'm not sure that is entirely correct. On Solaris it's perfectly acceptable to have a 32-bit application that supports large files (64-bit file offsets) - the two things are not the same. In fact all but a handful of commands on 64-bit Solaris are 32-bit applications. There's a good reason for this - 64-bit apps take a performance hit, and unless you really need a 64-bit address space, 32-bit apps are a better choice. I think it will be important in the next version of perl to distinguish between 64-bit integer (largefile) support and 64-bit address space (pointer) support. For my particular environment I'd want 64-bit integers (large files) but can't really see that I'd ever want a 64-bit address space.

Alan Burlison
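A small sketch of the distinction Alan is drawing (not from the thread itself): with the transitional largefile compilation environment that Solaris and other LFS-era systems provide, a plain 32-bit program gets a 64-bit off_t while ints and pointers stay 32 bits. The exact flags are an assumption here; they typically come from `getconf LFS_CFLAGS`, which usually expands to something like `-D_FILE_OFFSET_BITS=64`.

```c
/* Sketch: a 32-bit (ILP32) program built with the largefile flags,
 * e.g.  cc `getconf LFS_CFLAGS` -o sizes sizes.c
 * Only the file offset type widens to 64 bits; nothing else changes. */
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    printf("sizeof(int)    = %lu\n", (unsigned long)sizeof(int));     /* 4 */
    printf("sizeof(void *) = %lu\n", (unsigned long)sizeof(void *));  /* 4 in a 32-bit build */
    printf("sizeof(off_t)  = %lu\n", (unsigned long)sizeof(off_t));   /* 8 with the largefile flags */
    return 0;
}
```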

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

I'm not sure that is entirely correct. On Solaris it's perfectly acceptable to have a 32 bit application that supports large files (64 bit file offsets) - the two things are not the same.

What is a "32 bit application"? Do you mean a program compiled where all these are true​:

  sizeof(int) == 4   sizeof(int *) == 4   sizeof(off_t) == 4

And what's a "64 bit application"? One in which all of those is 8?

Or is there some compiler or O/S flag that makes them different? Is it a program linked with a different library, or an a.out with a different magic number?

As you essentially pointed out, one should be able to have a 4-byte int and a larger off_t without conflict.

It's so easy to make

  sizeof(int)    ==
  sizeof(long)   ==
  sizeof(int *)  ==
  sizeof(char *) ==
  sizeof(off_t)  == 4

But "All the World's a VAX" really has to go away -- some day. :-)

--tom

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

For my particular environment I'd want 64 bit integers (large files) but can't really see that I'd ever want a 64-bit address space.

Me too. In fact, in PowerMAX OS, that is the only combo you can get. We support "long long" for 64-bit integers, and you can use them in 64-bit file operations, but there is no 64-bit address space and no 64-bit pointers (not right now, anyway).

--
Tom.Horsley@mail.ccur.com                     \\ Will no one rid me of
Concurrent Computers, Ft. Lauderdale, FL      \\ this troublesome
Me: http://home.att.net/~Tom.Horsley/         \\ autoconf?
Project Vote Smart: http://www.vote-smart.org \\ !!!!!

p5pRT commented 24 years ago

From @jhi

Alan Burlison writes​:

Jarkko Hietaniemi wrote​:

Wanting large files means that you also want 64 bits. Quadness can mean nasty incompatibility problems​: there are pure 32 bit applications\, 32/64 applications\, and 64 bit applications\, which may or may not be binary compatible. Depending on which libraries (which libc using which bitness) you linked with you may have limitations on which other programs/libraries you can use.

I'm not sure that is entirely correct. On Solaris it's perfectly acceptable to have a 32 bit application that supports large files (64 bit file offsets) - the two things are not the same. In fact all but a handful of commands on 64 bit Solaris are 32 bit applications. There's a good reason for this - 64 bit apps take a performance hit\, and unless you really need a 64 bit address space 32 bit apps are a better choice. I think it will be will be important in the next version of perl to distinguish between 64 bit integer (largefile) support and 64 bit address space (pointer) support. For my particular environment I'd want 64 bit integers (large files) but can't really see that I'd ever want a 64-bit address space.

What -Duselargefiles now does is that it implicitly switches on -Duse64bits, too, which in turn means that IVs and UVs become 64 bits wide. If that also turns on 64-bit pointers, well, that's not *my* fault :-)

Alan Burlison

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

At 12:14 PM 11/9/99 -0700, Tom Christiansen wrote:

I'm not sure that is entirely correct. On Solaris it's perfectly acceptable to have a 32 bit application that supports large files (64 bit file offsets) - the two things are not the same.

What is a "32 bit application"? Do you mean a program compiled where all these are true​:

  sizeof(int)   == 4
  sizeof(int *) == 4
  sizeof(off_t) == 4

Probably.

And what's a "64 bit application"? One in which all of those is 8?

Depends. If any one of the above is true, the marketroids call it 64-bit. In our case it's sizeof(int), since you can get 64-bit integers on 32-bit machines if gcc supports it.

As Alan's pointed out, 64-bit ints are more useful, generally speaking, than 64-bit pointers, though at some point someone'll probably complain about perl breaking when it tries to allocate more than 4G of memory...

Or is there some compiler or O/S flag that makes them different? Is it a program linked with a different library, or an a.out with a different magic number?

Depends on the platform. Some, like the alphas, go full-blown 64-bit, both pointers and integers. Intel machines, OTOH, get 64-bit integers but 32-bit pointers and gcc does some magic.

The big problem from the perl level is programs with use integer in effect that count on particular overflow effects, or ones that assume left shifts fall off the end of the world at bit 31. (Arguably broken, but that argument probably won't buy us much with the folks whose code breaks.)
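An illustration (not from the thread) of the kind of breakage Dan means: code written against 32-bit integers expects bits shifted past position 31 to vanish, which stops being true once the underlying type is 64 bits wide.

```c
/* Sketch of the "shift off the end at bit 31" assumption breaking when
 * the integer type widens from 32 to 64 bits. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t narrow = 0x80000000u;   /* top bit set in a 32-bit word */
    uint64_t wide   = 0x80000000u;   /* same value in a 64-bit word */

    printf("32-bit: 0x%08x\n",   (unsigned)(narrow << 1));           /* 0x00000000: bit lost */
    printf("64-bit: 0x%09llx\n", (unsigned long long)(wide << 1));   /* 0x100000000: bit kept */
    return 0;
}
```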

But "All the World's a VAX" really has to go away -- some day. :-)

And it has. Well, at least for us VMS folks... :-)

  Dan

----------------------------------------"it's like this"------------------- Dan Sugalski even samurai dan@​sidhe.org have teddy bears and even   teddy bears get drunk

p5pRT commented 24 years ago

From @jhi

As you essentially pointed out, one should be able to have a 4-byte int and a larger off_t without conflict.

In C, yes. In Perl, we have IVs and UVs, tucked into SVs, and unless we do some serious magic, those are the datatypes that must accommodate both the 4-byte int and the 32+ bit off_t.

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From @jhi

The big problem from the perl level is programs with use integer in effect that count on particular overflow effects\, or ones that assume left shifts fall off the end of the world at bit 31. (Arguably broken\, but that argument probably won't buy us much with the folks whose code breaks)

And just don't get me started on the incestuous NV <=> IV stuffing :-) (a.k.a. op/misc #4)

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From @jhi

I'm not sure that is entirely correct. On Solaris it's perfectly acceptable to have a 32 bit application that supports large files (64 bit file offsets) - the two things are not the same. In fact all but a handful of commands on 64 bit Solaris are 32 bit applications. There's

The 64-bit box I am mainly worried about is IRIX; they have all three combinations I described. I'll ping Scott Henry about this: how serious would the incompatibilities be, if any.

For example DEC^WDigital^WTru64 has no problems because everything is 64 bits (well, IVs and off_ts, has always been), pointers too, nominally (physically they are 43 bits, IIRC).

How about the places where gcc emulates long longs with software, like ix86? Will turning on 64-bitness be a performance hit?

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Jarkko Hietaniemi <jhi@iki.fi> writes:

As you essentially pointed out, one should be able to have a 4-byte int and a larger off_t without conflict.

In C, yes. In Perl, we have IVs and UVs, tucked into SVs, and unless we do some serious magic,
  IV,
  UV,
+ OV => off_t

There would be some merit in having 'long long' "available" but not the default - at least on the large number of existing 32-bit machines having IV = long long would be a performance hit, but moving to 64-bit when required would be useful. Probably more pain than it merits though ...

those are the datatypes that must accommodate both the 4-byte int and the 32+ bit off_t.

-- Nick Ing-Simmons

p5pRT commented 24 years ago

From @jhi

Nick Ing-Simmons writes​:

Jarkko Hietaniemi <jhi@iki.fi> writes:

As you essentially pointed out, one should be able to have a 4-byte int and a larger off_t without conflict.

In C, yes. In Perl, we have IVs and UVs, tucked into SVs, and unless we do some serious magic,
  IV,
  UV,
+ OV => off_t

There would be some merit in having 'long long' "available" but not the default -

You mean like, say, "use quad;"? Yuckety. I'm not saying that would necessarily be the interface, mind. I thought Perl would be above the mess C's non-standardized integer sizes have driven us into (witness "long long" itself...)

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

At 10:06 PM 11/9/99 +0200, Jarkko Hietaniemi wrote:

For example DEC^WDigital^WTru64 has no problems because everything is 64 bits (well, IVs and off_ts, has always been), pointers too, nominally (physically they are 43 bits, IIRC).

Well, you get 43 pins out of most Alpha CPUs, but the pointers are a full 64 bits. A quick calc puts that at around 170 cubic feet of RAM, so it's not *too* likely that anyone'd hit that limit for a while...

How about the places where gcc emulates long longs with software, like ix86? Will turning on 64-bitness be a performance hit?

It'd pretty much have to, I expect. The math'd all be done in software instead of in hardware. That'll probably hurt a bunch.

  Dan

----------------------------------------"it's like this"------------------- Dan Sugalski even samurai dan@​sidhe.org have teddy bears and even   teddy bears get drunk

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

At 10:31 PM 11/9/99 +0200, Jarkko Hietaniemi wrote:

I thought Perl would be above the mess C's non-standardized integer sizes have driven us into (witness "long long" itself...)

Now, now, they are standardized. An int's guaranteed to be at least as big as a char, and no bigger than a long. The standard's not necessarily *useful*, mind, but that's a separate problem...

  Dan

----------------------------------------"it's like this"------------------- Dan Sugalski even samurai dan@​sidhe.org have teddy bears and even   teddy bears get drunk

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

From: Jarkko Hietaniemi <jhi@iki.fi>

Nick Ing-Simmons writes:

Jarkko Hietaniemi <jhi@iki.fi> writes:

As you essentially pointed out, one should be able to have a 4-byte int and a larger off_t without conflict.

In C, yes. In Perl, we have IVs and UVs, tucked into SVs, and unless we do some serious magic,
  IV,
  UV,
+ OV => off_t

There would be some merit in having 'long long' "available" but not the default -

You mean like, say, "use quad;"? Yuckety. I'm not saying that would necessarily be the interface, mind. I thought Perl would be above the mess C's non-standardized integer sizes have driven us into (witness "long long" itself...)

While 64-bit on demand (or perhaps even autoresizing (SV_tBIGINT, anyone?) needed to prevent overflow) would be way cool, there are a few problems. Like, for instance, how do you print a gcc-ish "long long" on a system with no native 64-bit support (i.e., no %q or %Ld or whatever)?

-- BKS

______________________________________________________ Get Your Private\, Free Email at http​://www.hotmail.com
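One workable answer to Benjamin's question, sketched here rather than taken from the thread: when the C library's printf has no 64-bit conversion, the value can be rendered digit by digit and printed as a string.

```c
/* Sketch: print an unsigned 64-bit value without relying on %q/%Ld/%lld
 * support in the C library's printf. */
#include <stdio.h>

static void print_u64(unsigned long long v)
{
    char buf[21];                    /* 2**64 has 20 decimal digits + NUL */
    char *p = buf + sizeof buf - 1;

    *p = '\0';
    do {
        *--p = (char)('0' + (v % 10));
        v /= 10;
    } while (v != 0);
    fputs(p, stdout);
}

int main(void)
{
    print_u64(2200944678ULL);        /* the file size from the original report */
    putchar('\n');
    return 0;
}
```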

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

"Benjamin Stuhl" \sho\_pi@&#8203;hotmail\.com writes​:

While 64-bit on demand (or perhaps even autoresizing (SV_tBIGINT\, anyone?) needed to prevent overflow) would be way cool\, there are a few problems. Like\, for instance\, how do you print a gcc-ish "long long" on a system with no native 64-bit support (ie\, no %q or %Ld or whatever)?

The autopromote could promote to arbitrary precision arithmetic when necessary. On capable systems\, it could use longlong for bits greater than 32 and less than 65\, then arbitrary precision higher. Or it could jump straight to arbitrary precision at 33+ bits for systems that are 32bit only.

Math​::BigInt isn't up to the task since it's written in Perl only. We'd need a fast set of C routines. I did a quick port of GNU's GMP library as Math​::GMP but I suspect the Artistic License and LGPL might conflict if we had to distribute GMP with Perl (not to mention size issues and other messiness).

Perhaps we could implement our own bigints in C and allow auto- promotion? There would be performance hits but perhaps it's arguable that end users not worrying about integer size is worth the performance hit?

Chip

-- Chip Turner chip@​ZFx.com   Programmer\, ZFx\, Inc. www.zfx.com   PGP key available at wwwkeys.us.pgp.net

p5pRT commented 24 years ago

From @jhi

Just to clarify my position: I've nothing against large files. I've nothing against 64 bits. In fact, I *looooooove* 64 bits. Why do you think I would've been hacking for 64-bit support for the last months if I absolutely hated 64 bits? :-) I'm just being conservative. What goes BANG! if we turn on 64-bitness? What doesn't go bang but gets slower? If the collective we think that the benefits outweigh the costs, we can turn on largefileness and 64-bitness, great, finally!

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From @jhi

Math​::BigInt isn't up to the task since it's written in Perl only. We'd need a fast set of C routines. I did a quick port of GNU's GMP library as Math​::GMP but I suspect the Artistic License and LGPL might conflict if we had to distribute GMP with Perl (not to mention

Correct.

size issues and other messiness).

That\, too.

Perhaps we could implement our own bigints in C and allow auto- promotion? There would be performance hits but perhaps it's arguable

Something like that is in the (very) long term plan, but I think restructuring of that magnitude for 5.6 isn't realistic.

that end users not worrying about integer size is worth the performance hit?

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

At 03:48 PM 11/9/99 -0500, Chip Turner wrote:

Perhaps we could implement our own bigints in C and allow auto-promotion? There would be performance hits, but perhaps it's arguable that end users not worrying about integer size is worth the performance hit?

Autopromotion does hurt, though. You need to check every math operation to see if you overflowed (or will overflow) and need to promote. Not necessarily a bad thing in general, but it does cost.

  Dan

----------------------------------------"it's like this"------------------- Dan Sugalski even samurai dan@​sidhe.org have teddy bears and even   teddy bears get drunk
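The per-operation check Dan is describing looks roughly like this. A sketch only: the `__builtin_add_overflow` builtin is a much later GCC/Clang convenience, used here purely to show the shape of the test an autopromoting add would need.

```c
/* Sketch: every IV-level addition has to notice overflow so the result
 * can be promoted to a wider representation (an NV or a bigint). */
#include <stdio.h>

int main(void)
{
    long long a = 0x7fffffffffffffffLL;   /* an IV_MAX-like value */
    long long b = 1, r;

    if (__builtin_add_overflow(a, b, &r)) {
        /* this is where an autopromoting interpreter would upgrade the result */
        printf("overflow: promote to a wider type\n");
    } else {
        printf("%lld\n", r);
    }
    return 0;
}
```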

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Autopromotion does hurt\, though. You need to check every math operation to see if you overflowed (or will overflow) and need to promote. Not necessarily a bad thing in general\, but it does cost.

We already have lexical "use integer". Perhaps we could have a lexical "use bigint", assuming that we could get an implementation that were fast enough.

--tom

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

At 02​:04 PM 11/9/99 -0700\, Tom Christiansen wrote​:

Autopromotion does hurt\, though. You need to check every math operation to see if you overflowed (or will overflow) and need to promote. Not necessarily a bad thing in general\, but it does cost.

We already have lexical "use integer". Perhaps we could have a lexical "use bigint"\, assuming that we could get an implementation that were fast enough.

I was thinking of that. We could, I suppose, have a second set of 'bigint' math opcodes so the normal ones wouldn't be slowed down any, and add another flag to the SV structure. (What the heck, we don't have enough in there anyway... :)

  Dan

----------------------------------------"it's like this"------------------- Dan Sugalski even samurai dan@​sidhe.org have teddy bears and even   teddy bears get drunk

p5pRT commented 24 years ago

From @gsar

On Tue\, 09 Nov 1999 21​:58​:51 +0200\, Jarkko Hietaniemi wrote​:

As you essentially pointed out\, one should be able to have a 4-byte int and a larger off_t without conflict.

In C\, yes. In Perl\, we have IVs and UVs\, tucked into SVs\, and unless we do some serious magic\, those are the datatypes that must accommodate both the 4-byte int and the 32+ bit off_t.

Umm, according to my records, the following change (in 5.005_57) added large file support for Solaris under the 32-bit universe:

    [  3311] By: gsar                                  on 1999/05/06  05:37:55
        Log: From: Damon Atkins <n107844@sysmgtdev.nabaus.com.au>
             Date: Tue, 30 Mar 1999 11:26:11 +1000 (EST)
             Message-Id: <199903300126.LAA20870@sysmgtdev.nabaus.com.au>
             Subject: Largefiles for Solaris
     Branch: perl
           ! hints/solaris_2.sh

I think it makes tons of sense to keep that support if people are not explicitly asking for 64-bit everything.

Sarathy gsar@​ActiveState.com

p5pRT commented 24 years ago

From @gsar

On Tue\, 09 Nov 1999 22​:45​:45 +0200\, Jarkko Hietaniemi wrote​:

Just to clarify my position​: I've nothing against large files. I've nothing against 64 bits. In fact\, I *looooooove* 64 bits. Why do you think I would've been hacking for 64 bit support for the last months if I absolutely hated 64 bits? :-) I'm just being conservative. What goes BANG! if we turn on 64-bitness? What doesn't go bang but gets slower? If the collective we think that the benefits outweight the costs\, we can turn on largefileness and 64-bitness\, great\, finally!

We really can't decide until we have some numbers on how good/bad it gets, but my gut feeling is that we would enable largefiles-without-64-bits by default where that's possible (as was the case on Solaris in 5.005_57).

Sarathy gsar@​ActiveState.com

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Dan Sugalski writes​:

How about the places where gcc emulates long longs with software\, like ix86? Will turning on 64 bitness be a performance hit?

It'd pretty much have to\, I expect. The math'd all be done in software instead of in hardware. That'll probably hurt a bunch.

Anyone ready for a bench?

Ilya

p5pRT commented 24 years ago

From @jhi

Gurusamy Sarathy writes​:

On Tue\, 09 Nov 1999 21​:58​:51 +0200\, Jarkko Hietaniemi wrote​:

As you essentially pointed out\, one should be able to have a 4-byte int and a larger off_t without conflict.

In C\, yes. In Perl\, we have IVs and UVs\, tucked into SVs\, and unless we do some serious magic\, those are the datatypes that must accommodate both the 4-byte int and the 32+ bit off_t.

Umm\, according to my records\, the following change (in 5.005_57) added large file support for Solaris under the 32-bit universe​:

    [  3311] By: gsar                                  on 1999/05/06  05:37:55
        Log: From: Damon Atkins <n107844@sysmgtdev.nabaus.com.au>
             Date: Tue, 30 Mar 1999 11:26:11 +1000 (EST)
             Message-Id: <199903300126.LAA20870@sysmgtdev.nabaus.com.au>
             Subject: Largefiles for Solaris
     Branch: perl
           ! hints/solaris_2.sh

I think it makes tons of sense to keep that support if people are not explicitly asking for 64-bit everything.

According to my records, the most important of which is the current Solaris hints file, that change was later removed as I rewrote large parts of the 64-bit support, including the hints. I guess I'm just too paranoid: I'm afraid the doubles won't preserve all the bits of file offsets.

That patch *always* turned on largefileness if available. If that's what we want, okay, let's.

Sarathy gsar@​ActiveState.com

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

At 04​:48 PM 11/9/99 -0500\, Ilya Zakharevich wrote​:

Dan Sugalski writes​:

How about the places where gcc emulates long longs with software\, like ix86? Will turning on 64 bitness be a performance hit?

It'd pretty much have to\, I expect. The math'd all be done in software instead of in hardware. That'll probably hurt a bunch.

Anyone ready for a bench?

The results would certainly be interesting. Anyone know what version of gcc for Intel Linux brought out the 64-bit integer stuff? (I did one for _61 or _60, IIRC. Whichever had the thread-speedup patch. But it was for VMS on Alphas, and we have native 64-bit integers, so the numbers aren't really applicable here.)

  Dan

----------------------------------------"it's like this"------------------- Dan Sugalski even samurai dan@​sidhe.org have teddy bears and even   teddy bears get drunk

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Nick Ing-Simmons writes​:

Jarkko Hietaniemi <jhi@iki.fi> writes:

As you essentially pointed out, one should be able to have a 4-byte int and a larger off_t without conflict.

In C, yes. In Perl, we have IVs and UVs, tucked into SVs, and unless we do some serious magic,
  IV,
  UV,
+ OV => off_t

There would be some merit in having 'long long' "available" but not the default - at least on the large number of existing 32-bit machines having IV = long long would be a performance hit, but moving to 64-bit when required would be useful. Probably more pain than it merits though ...

Autoupgrade to bigint has more merit - but will slow things down too. However, for off_t one can use NV without any practical problem.

Ilya

p5pRT commented 24 years ago

From @AlanBurlison

Jarkko Hietaniemi wrote​:

What -Duselargefiles now does is that it implicitly switches on -Duse64bits\, too\, which in turn means that IVs and UVs become 64 bits wide. If that turns on also 64 bit pointers\, well\, that's not *my* fault :-)

No, it doesn't turn on 64-bit pointers - in fact that's the whole point I was trying to make. To me '64-bit app' means 64-bit address space, *not* 64-bit integer or largefile support. It's important to make the distinction between a 32-bit app with 64-bit file offset and/or integer support and a 64-bit app with a 64-bit address space.

Alan Burlison

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Dan Sugalski writes​:

As Alan's pointed out\, 64-bit ints are more useful\, generally speaking\, than 64-bit pointers\, though at some point someone'll probably complain about perl breaking when it tries to allocate more than 4G of memory...

BTW, with Perl's malloc the limit per one allocation is 2G (to detect negative malloc()s).

Ilya

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

At 05​:03 PM 11/9/99 -0500\, Ilya Zakharevich wrote​:

Dan Sugalski writes​:

As Alan's pointed out\, 64-bit ints are more useful\, generally speaking\, than 64-bit pointers\, though at some point someone'll probably complain about perl breaking when it tries to allocate more than 4G of memory...

BTW\, With Perl's malloc the limit per one allocation is 2G (to detect negative malloc()s).

Sure. But I was thinking of the case where the 4G limit gets hit piecemeal--an array of 100M scalars, each 80 characters long, would do it.

  Dan

----------------------------------------"it's like this"------------------- Dan Sugalski even samurai dan@​sidhe.org have teddy bears and even   teddy bears get drunk

p5pRT commented 24 years ago

From @gsar

On Tue\, 09 Nov 1999 23​:52​:21 +0200\, Jarkko Hietaniemi wrote​:

According to my records\, the most important of which is the current Solaris hints file\, that change was later removed as I rewrote large parts of the 64-bit support\, including the hints. I guess I'm just too paranoid​: I'm afraid the doubles won't preserve all the bits of file offsets.

Most people don't have 525 terabyte disks yet. :-)

That patch *always* turned on largefileness if available. If that's what we want\, okay\, let's.

Yup, I suspect that's what we want.

Sarathy gsar@​ActiveState.com

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

According to my records\, the most important of which is the current Solaris hints file\, that change was later removed as I rewrote large parts of the 64-bit support\, including the hints. I guess I'm just too paranoid​: I'm afraid the doubles won't preserve all the bits of file offsets.

Most people don't have 525 terabyte disks yet. :-)

Perhaps not, but given files with holes in them, I bet you could still have valid seek pointers outside the place where NVs start losing precision.

--tom
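For what it's worth (an aside, not from the thread): with IEEE 754 doubles, integers are represented exactly only up to 2**53, so a sufficiently large seek position stored in an NV really can round to a neighbouring offset. A quick C check of the boundary:

```c
/* Sketch: an IEEE 754 double stops distinguishing adjacent byte offsets
 * beyond 2**53; the low bit of this "offset" is simply lost. */
#include <stdio.h>

int main(void)
{
    double offset = 9007199254740993.0;   /* 2**53 + 1 */
    printf("%.0f\n", offset);             /* prints 9007199254740992 */
    return 0;
}
```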

p5pRT commented 24 years ago

From @gsar

On Tue\, 09 Nov 1999 15​:07​:03 MST\, Tom Christiansen wrote​:

Most people don't have 525 terabyte disks yet. :-)

Perhaps not\, but given files with holes in them\, I bet you could still have valid seek pointers outside the place where NVs start losing precision.

Given a hard limit of 2GB and losing precision at 524TB, no points for guessing which one is more practical. Besides, we can make -w come in handy here.

Sarathy gsar@​ActiveState.com

p5pRT commented 24 years ago

From @jhi

Gurusamy Sarathy writes​:

On Tue\, 09 Nov 1999 15​:07​:03 MST\, Tom Christiansen wrote​:

Most people don't have 525 terabyte disks yet. :-)

Perhaps not\, but given files with holes in them\, I bet you could still have valid seek pointers outside the place where NVs start losing precision.

Given a hard limit of 2GB and losing precision at 524TB\, no points for guessing which one is more practical. Besides\, we can make -w come in handy here.

"Cap'n! The bits...canna hold 'em!"

Sarathy gsar@​ActiveState.com

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From @AlanBurlison

Tom Christiansen wrote​:

Firstly the full story for 32 vs. 64-bit apps on Solaris can be found at http://docs.sun.com:80/ab2/coll.45.10/SOL64TRANS, and the page at http://docs.sun.com:80/ab2/coll.45.10/SOL64TRANS/@Ab2PageView/487 is especially pertinent to this discussion.

For the impatient\, I'll summarise briefly by replying to Tom's questions below :-)

What is a "32 bit application"? Do you mean a program compiled where all these are true​:

sizeof\(int\)   == 4
sizeof\(int \*\) == 4
sizeof\(off\_t\) == 4

This model is known as "ILP32"\, i.e. Ints\, Longs and Pointers are all 32 bits long. Long longs are however 64 bits\, and can therefore be used as the underlying type of off_t for 64 bit file access. This is the only model supported prior to Solaris 7.

And what's a "64 bit application"? One in which all of those is 8?

That would be LP64, sir. Ints are still 32 bits, but Longs and Pointers are 64 bits. Long longs are still 64 bits. This *and* the previous ILP32 model are supported simultaneously by Solaris 7 onwards, when running the OS in 64-bit mode.

Or is there some compiler or O/S flag that makes them different? Is it a program linked with a different library\, or an a.out with a different magic number?

Egsakerly. To get a 64-bit app you need to add the '-xarch=v9' flag during compilation. That produces an ELF64 executable, rather than the default ELF32 executable. Here's a simple example:

    fubar$ cat z.c
    #include <stdio.h>
    int main()
    {
        printf("my ints are %d bits\n", sizeof(int) * 8);
        printf("my longs are %d bits\n", sizeof(long) * 8);
        printf("my long longs are %d bits\n", sizeof(long long) * 8);
        printf("my pointers are %d bits\n", sizeof(char*) * 8);
    }
    fubar$ cc -o z z.c
    fubar$ file z
    z: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, not stripped
    fubar$ ldd z
            libc.so.1 =>     /usr/lib/libc.so.1
            libdl.so.1 =>    /usr/lib/libdl.so.1
    fubar$ z
    my ints are 32 bits
    my longs are 32 bits
    my long longs are 64 bits
    my pointers are 32 bits

    fubar$ cc -o z z.c -xarch=v9
    fubar$ file z
    z: ELF 64-bit MSB executable SPARCV9 Version 1, dynamically linked, not stripped
    fubar$ ldd z
            libc.so.1 =>     /usr/lib/sparcv9/libc.so.1
            libdl.so.1 =>    /usr/lib/sparcv9/libdl.so.1
    fubar$ z
    my ints are 32 bits
    my longs are 64 bits
    my long longs are 64 bits
    my pointers are 64 bits

Hope that clears up the confusion a bit - 64-bit *file* access doesn't make something into a 64-bit *application*. "64-bit app" means 64-bit address space, nothing more, nothing less.

Alan Burlison

p5pRT commented 24 years ago

From @doughera88

On Tue\, 9 Nov 1999\, Jarkko Hietaniemi wrote​:

That patch *always* turned on largefileness if available. If that's what we want\, okay\, let's.

You also don't always have a choice. In various places, Perl5.00x_xx uses off_t, which may already be typedef'd by some system header to be "long long", and which may already be 64 bits. All this can (and has) happened with or without an explicit -Duse64bits. (I vaguely recall one of the *BSD ports does this, but I'm not sure.)

As long as the info within perl stays in an off_t variable, things work fine. If you try to print it, or if you do some operation that causes perl to try to stuff it into an SV somehow, it might or might not work. I recall receiving mixed reports about how well this worked (or didn't work).

Enabling full internal 64-bit support within perl is likely to make more of this work better, though we still have the IV <-> NV problem.

In short, I suspect that enabling a "LARGEFILES" option that causes system functions to return 64-bit off_t's will occasionally work on actual large files, but will also often fail, unless perl itself is prepared to deal with 64-bit integers. That is, I suspect Jarkko's right: turning on largefiles without -Duse64bits may well be turning on something that is often going to fail. That's probably not a good idea. [I don't dispute that in C on some platforms this _can_ all work smoothly. But perl, with its one integral type, the IV, is not C.]

I could test all of this, by the way, if someone wanted to give me a nice big empty disk to put some large files on :-).

--   Andy Dougherty doughera@​lafayette.edu   Dept. of Physics   Lafayette College\, Easton PA 18042

p5pRT commented 24 years ago

From @AlanBurlison

Dan Sugalski wrote​:

Depends on the platform. Some\, like the alphas\, go full-blown 64-bit\, both pointers and integers. Intel machines\, OTOH\, get 64-bit integers but 32-bit pointers and gcc does some magic.

Using the commonly accepted nomenclature, Alpha is therefore ILP64, i.e. Ints, Longs and Pointers are all 64-bit. Solaris is LP64, i.e. 32-bit Ints but 64-bit Longs and Pointers. X/Open have written a standards document explaining the various permutations and why they think the industry choice should be LP64. This doc claims to have the buy-in of DEC, HP, IBM, Intel, Novell, NCR, SCO and Sun. You can see it at http://www.opengroup.org/public/tech/aspen/lp64_wp.htm

Alan Burlison

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

You also don't always have a choice. In various places\, Perl5.00x_xx uses off_t\, which already may be typedef'd by some system header to be "long long"\, and which may already by 64 bits. All this can (and has) happened with or without an explicit -Duse64bits. (I vaguely recall one of the *BSD ports does this\, but I'm not sure.)

Here's a piece of <sys/stat.h>:

    /*      $OpenBSD: stat.h,v 1.7 1998/02/16 21:56:25 millert Exp $       */
    /*      $NetBSD: stat.h,v 1.20 1996/05/16 22:17:49 cgd Exp $   */

    struct stat {
            dev_t     st_dev;               /* inode's device */
            ino_t     st_ino;               /* inode's number */
            mode_t    st_mode;              /* inode protection mode */
            nlink_t   st_nlink;             /* number of hard links */
            uid_t     st_uid;               /* user ID of the file's owner */
            gid_t     st_gid;               /* group ID of the file's group */
            dev_t     st_rdev;              /* device type */
    #ifndef _POSIX_SOURCE
            struct timespec st_atimespec;   /* time of last access */
            struct timespec st_mtimespec;   /* time of last data modification */
            struct timespec st_ctimespec;   /* time of last file status change */
    #else
            time_t    st_atime;             /* time of last access */
            long      st_atimensec;         /* nsec of last access */
            time_t    st_mtime;             /* time of last data modification */
            long      st_mtimensec;         /* nsec of last data modification */
            time_t    st_ctime;             /* time of last file status change */
            long      st_ctimensec;         /* nsec of last file status change */
    #endif
            off_t     st_size;              /* file size, in bytes */
            int64_t   st_blocks;            /* blocks allocated for file */
            u_int32_t st_blksize;           /* optimal blocksize for I/O */
            u_int32_t st_flags;             /* user defined flags for file */
            u_int32_t st_gen;               /* file generation number */
            int32_t   st_lspare;
            int64_t   st_qspare[2];
    };
    #ifndef _POSIX_SOURCE
    #define st_atime st_atimespec.tv_sec
    #define st_atimensec st_atimespec.tv_nsec
    #define st_mtime st_mtimespec.tv_sec
    #define st_mtimensec st_mtimespec.tv_nsec
    #define st_ctime st_ctimespec.tv_sec
    #define st_ctimensec st_ctimespec.tv_nsec
    #endif

And from <sys/types.h> we have this:

    typedef quad_t          off_t;          /* file offset */
    /*
     * These belong in unistd.h, but are placed here too to ensure that
     * long arguments will be promoted to off_t if the program fails to
     * include that header or explicitly cast them to off_t.
     */
    off_t    lseek __P((int, off_t, int));
    int      ftruncate __P((int, off_t));
    int      truncate __P((const char *, off_t));

    jhereg(tchrist)% rsh thon grep quad /usr/include/sys/types.h
    typedef u_int64_t       u_quad_t;       /* quads */
    typedef int64_t         quad_t;
    typedef quad_t *        qaddr_t;
    typedef quad_t          off_t;          /* file offset */
    typedef quad_t          rlim_t;         /* resource limit */

--tom

p5pRT commented 24 years ago

From @jhi

Alan Burlison writes​:

Dan Sugalski wrote​:

Depends on the platform. Some\, like the alphas\, go full-blown 64-bit\, both pointers and integers. Intel machines\, OTOH\, get 64-bit integers but 32-bit pointers and gcc does some magic.

Using the commonly accepted nomenclature\, Alpha is therefore ILP64\, i.e.

Nah. At least Tru64 is I32LP64.

Ints\, Longs and Pointers are all 64 bit. Solaris is LP64\, i.e. 32 bit Ints but 64 bit Longs and Pointers. X/Open have written a standards document explaining the various permutations and why they think the industry choice should be LP64. This doc claims to have the by-in of DEC\, HP\, IBM\, Intel\, Novell\, NCR\, SCO and Sun. You can see it at http​://www.opengroup.org/public/tech/aspen/lp64_wp.htm

Alan Burlison

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

At 12​:37 AM 11/10/99 +0200\, Jarkko Hietaniemi wrote​:

Alan Burlison writes​:

Dan Sugalski wrote​:

Depends on the platform. Some\, like the alphas\, go full-blown 64-bit\, both pointers and integers. Intel machines\, OTOH\, get 64-bit integers but 32-bit pointers and gcc does some magic.

Using the commonly accepted nomenclature\, Alpha is therefore ILP64\, i.e.

Nah. At least Tru64 is I32LP64.

And C under VMS by default is ILP32, but you can tell it you really want ILP64 and that's OK.

Ain't 64-bitness fun?

  Dan

----------------------------------------"it's like this"------------------- Dan Sugalski even samurai dan@​sidhe.org have teddy bears and even   teddy bears get drunk

p5pRT commented 24 years ago

From @jhi

And C under VMS by default is ILP32\, but you can tell it you really want ILP64 and that's OK.

And in Tru64 C you can force addresses always to be in the low 31 bits though the pointers still are 64 bits wide...

Ain't 64-bitness fun?

I'm ecstatic.

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Ain't 64-bitness fun?

I remember when we made the transition on BSD from PDP-11s to VAXen. This was actually less painful, because in the world of the elevenses, people were careful about their "%hd" and such. But now that we've gotten spoiled by a common world where, sizewise, int==long and char*==int*, people have become sloppy. And I still think lseek was unfortunate. :-)

--tom

p5pRT commented 24 years ago

From @AlanBurlison

Dan Sugalski wrote​:

Depends on the platform. Some\, like the alphas\, go full-blown 64-bit\, both pointers and integers. Intel machines\, OTOH\, get 64-bit integers but 32-bit pointers and gcc does some magic.

Actually I've just found two papers published by DEC that say that the Alpha AXP C compilers for 64-bit Digital Unix use the LP64 model, not ILP64, so ints on 64-bit Alpha Unix should be 32 bits long, not 64 - are you sure ints on Alpha are really 64 bit?

Alan Burlison

p5pRT commented 24 years ago

From @jhi

Tom Christiansen writes​:

Ain't 64-bitness fun?

I remember when we made the transition on BSD from PDP-11s to VAXen. This was actually less painful\, because in the world of the elevenses\, people were careful about their "%hd" and such. But now that we've gotten spoiled by a common world where sizewise\, int==long and char*==int*\, people have become sloppy. And I still think lseek was unfortunate. :-)

As a matter of fact, I have lately spent *many* hours squashing the explicit "%ld"s from Perl's code...

--tom

-- $jhi++; # http​://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From @AlanBurlison

Gurusamy Sarathy wrote​:

Most people don't have 525 terabyte disks yet. :-)

They might have sparse files that big though... And with a combination like Veritas Volume Manager + Oracle 8 it is now technically feasible to have a single raw datafile approaching that sort of size.

Alan Burlison

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

As a matter of fact\, I have spent lately *many* hours squashing the explict "%ld"​:s from Perl's code...

Hm... where? Shouldn't you use %ld for a long argument?

--tom
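A closing sketch of the point behind the %ld exchange (an illustration, not a quote from the thread): "%ld" is only right when the value really is a plain long, which stops being true once IV may be a 64-bit type on a 32-bit-long platform. Perl's source solves this with its IVdf/UVuf format macros; the C99 inttypes.h macros used below show the same idea.

```c
/* Sketch: print a possibly-64-bit integer through a format macro that
 * matches the actual type, instead of a hard-coded "%ld". */
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    int64_t iv = 2200944678LL;             /* wider than a 32-bit long */

    printf("size = %" PRId64 "\n", iv);    /* correct whatever long is */
    /* printf("size = %ld\n", iv);            undefined where long is 32 bits */
    return 0;
}
```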