Perl / perl5

šŸŖ The Perl programming language
https://dev.perl.org/perl5/

Perl5 doesn't distinguish "originally was a string" and "originally was a number" and doesn't let one control caching of stringification #12801

Open p5pRT opened 11 years ago

p5pRT commented 11 years ago

Migrated from rt.perl.org#116871 (status was 'open')

Searchable as RT116871$

p5pRT commented 11 years ago

From @demerphq

I am not including Perl version data in this ticket, as it applies to all Perls that currently exist as of 5.17.9.

Perl doesn't preserve the original type of a scalar var, so one has difficulty reliably telling these apart:

$x= "1"; $y= 1;

This causes problems in serialization for variables that have been used in both numeric and string context. And if improperly handled can lead to data loss.
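To make the problem concrete, here is a minimal sketch that inspects the public flags of the two scalars with the core `B` module, before and after each one is used in the "other" context. The helper name `public_flags` is made up for this illustration. The flags reported for `$y` after stringification depend on the Perl version, so no output is claimed for that line.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: return the public "OK" flags of a scalar
# as a string such as "IOK,POK".
sub public_flags {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @set;
    push @set, 'IOK' if $flags & B::SVf_IOK();
    push @set, 'NOK' if $flags & B::SVf_NOK();
    push @set, 'POK' if $flags & B::SVf_POK();
    return join ',', @set;
}

my $x = "1";    # created as a string
my $y = 1;      # created as a number

print "fresh \$x: ", public_flags($x), "\n";
print "fresh \$y: ", public_flags($y), "\n";

my $n = $x + 0;    # numeric use caches an integer inside $x
my $s = "$y";      # string use may cache a string inside $y

print "used  \$x: ", public_flags($x), "\n";
print "used  \$y: ", public_flags($y), "\n";
```

Once `$x` carries both an integer and a string, the flags alone no longer say which one it started out as, which is the information a serializer would need.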

We should remember the original type. I believe Chip in the past did some work on this subject.

Related to this we do not give the user any ability to control the internal caching of the stringified form. This can lead to surprising situations like memory overflow from printing out a data structure. I have been forced in the past to rely on code like this​:

    # don't let Perl chew up all our RAM caching stuff we will only use once...
    my @copy= @{$hash->{$thing}};
    print join(",", @copy),"\n";

I think there should be an easier way to control this behavior, and that perl should be more careful about tracking it.
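A rough sketch of the caching effect, assuming the private `POK` flag (`SVp_POK`) is a fair indicator that a scalar holds a cached string buffer. The helper `has_cached_string` is hypothetical, and the exact flags set by stringification are an implementation detail that can differ between Perl versions.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: true if the scalar currently has the private
# POK flag set, i.e. a string form has been stored in it.
sub has_cached_string {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return $flags & B::SVp_POK() ? 1 : 0;
}

my @nums = (1 .. 5);

# Join a temporary copy: the copies get stringified, not the originals.
my @copy = @nums;
my $line = join(",", @copy);
my $untouched = grep { has_cached_string($_) } @nums;

# Join the originals directly: each element may now carry a cached string.
$line = join(",", @nums);
my $cached = grep { has_cached_string($_) } @nums;

print "after joining a copy:   $untouched of ", scalar(@nums), " elements hold a string\n";
print "after joining directly: $cached of ", scalar(@nums), " elements hold a string\n";
```

The copy is thrown away together with its cached strings, which is why the workaround keeps memory use down at the cost of an extra array.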

Cheers,
Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From perl@profvince.com

> Perl doesn't preserve the original type of a scalar var, so one has difficulty reliably telling these apart:
>
> $x= "1"; $y= 1;
>
> This causes problems in serialization for variables that have been used in both numeric and string context. And if improperly handled can lead to data loss.
>
> We should remember the original type. I believe Chip in the past did some work on this subject.

My negative opinion on this hasn't changed since http://www.nntp.perl.org/group/perl.perl5.porters/2012/08/msg190382.html.

Vincent

p5pRT commented 11 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 11 years ago

From @demerphq

On 20 February 2013 03:29, Vincent Pit <perl@profvince.com> wrote:

> > Perl doesn't preserve the original type of a scalar var, so one has difficulty reliably telling these apart:
> >
> > $x= "1"; $y= 1;
> >
> > This causes problems in serialization for variables that have been used in both numeric and string context. And if improperly handled can lead to data loss.
> >
> > We should remember the original type. I believe Chip in the past did some work on this subject.
>
> My negative opinion on this hasn't changed since http://www.nntp.perl.org/group/perl.perl5.porters/2012/08/msg190382.html.

With all due respect, your objections rely on something that is completely impractical, and IMO part of it is categorically wrong. For instance this statement:

"there's no safe way to do this besides stating explicitely in which kind the value shall be serialized"

If perl remembered that an SvPVIV was originally a PV then this would Just Work, and there would be no need for external tagging. Likewise, if Perl remembered that an SvPVIV was originally an IV then this would also Just Work.

IMO this is doable if we want to do it.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @chipdude

On 2/19/2013 6:38 PM, demerphq wrote:

> If perl remembered that an SvPVIV was originally a PV then this would Just Work, and there would be no need for external tagging. Likewise, if Perl remembered that an SvPVIV was originally an IV then this would also Just Work.

Indeed. But I'm not interested in doing it right now. The effort required is about equal to creating a pure C++ parser as part of a Perl reimplementation. Obviously the latter is more worthwhile.

p5pRT commented 11 years ago

From perl@profvince.com

> With all due respect, your objections rely on something that is completely impractical, and IMO part of it is categorically wrong. For instance this statement:
>
> "there's no safe way to do this besides stating explicitely in which kind the value shall be serialized"
>
> If perl remembered that an SvPVIV was originally a PV then this would Just Work, and there would be no need for external tagging. Likewise, if Perl remembered that an SvPVIV was originally an IV then this would also Just Work.

I'm not sure how you can consider it practical to tell people that 1 and '1' may be used the same way on all mathematical operators (except for the rarely used and badly designed bitwise operators), but that they are in fact different, and that it can bite them hard if they pass the wrong one to lib X that makes use of that difference. Good luck debugging this through several layers of code. I don't think that's helping anyone, but hey, what do I know? After all I'm wrong and stuff :)

> IMO this is doable if we want to do it.

We can do a lot of things. The question is whether we should do them.

Vincent

p5pRT commented 11 years ago

From @demerphq

On 20 February 2013 04:05, Vincent Pit <perl@profvince.com> wrote:

> > With all due respect, your objections rely on something that is completely impractical, and IMO part of it is categorically wrong. For instance this statement:
> >
> > "there's no safe way to do this besides stating explicitely in which kind the value shall be serialized"
> >
> > If perl remembered that an SvPVIV was originally a PV then this would Just Work, and there would be no need for external tagging. Likewise, if Perl remembered that an SvPVIV was originally an IV then this would also Just Work.
>
> I'm not sure how you can consider it practical to tell people that 1 and '1' may be used the same way on all mathematical operators (except for the rarely used and badly designed bitwise operators), but that they are in fact different, and that it can bite them hard if they pass the wrong one to lib X that makes use of that difference.

Can you give an example where that would happen?

You seem to be worried about a case I'm having trouble understanding.

I am worried about cases like this:

$x= "0 but true"; print 0+$x;

I want to be able to know that this needs to be serialized as C<"0 but true"> and not C<0>.

Similarly, I want to be able to know that after

$x= 1; print $x;

I can safely serialize $x as an SvIV and not as an SvPVIV, which would require me to store both the string and integer representation (which is what Storable for instance does.)
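The data-loss case can be sketched like this. `naive_serialize` is a hypothetical serializer written only for this illustration (it is not how Storable or any other module works): it emits a bare number whenever the public IOK flag is set and a quoted string otherwise.

```perl
use strict;
use warnings;
use B ();

# Hypothetical serializer, for illustration only: trust the public IOK
# flag and emit a number if it is set, otherwise a quoted string.
sub naive_serialize {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return $flags & B::SVf_IOK() ? 0 + $_[0] : qq{"$_[0]"};
}

my $x = "0 but true";
my $before = naive_serialize($x);   # only the string form exists so far

my $n = 0 + $x;                     # numeric use; this string is exempt from warnings

my $after = naive_serialize($x);    # the integer flag is now set as well

print "before numeric use: $before\n";
print "after numeric use:  $after\n";
```

After the numeric use the serializer writes `0`, and the original string cannot be recovered from the output.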

So, what problem are you worried about in practice? I am really struggling with imagining a scenario like you described...

> Good luck debugging this through several layers of code. I don't think that's helping anyone,

Like I said, can you turn this from a hypothetical into a more substantive example of where knowing the source type would cause a problem?

> but hey, what do I know? After all I'm wrong and stuff :)

Well, you claimed it can't be safely done; I don't see how that can be true.

And I said "with all due respect" for a reason: I wasn't trying to be disrespectful in disagreeing with you.

> > IMO this is doable if we want to do it.
>
> We can do a lot of things. The question is whether we should do them.

Yeah of course. No argument there.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @ap

* demerphq <demerphq@gmail.com> [2013-02-20 04:20]:

> I am worried about cases like this:
>
> $x= "0 but true"; print 0+$x;
>
> I want to be able to know that this needs to be serialized as C<"0 but true"> and not C<0>.

Right, the issue isn't treating perfect equivalents like 1 and '1' as different things; it is that imperfect conversions that lose information can attach non-canonical values to SVs as a side effect of other operations, and thereafter there is no way to know which value was canonical.

(I have argued before that if anything like this is done, then any core API that allows distinguishing 1 from '1' in Perl space should have names based on perlguts terminology, not attractive nuisance names like `is_number` or anything similar, and the POD should warn against using them that way.)

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT commented 11 years ago

From @jandubois

On Thu, Feb 21, 2013 at 12:31 PM, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:

> * demerphq <demerphq@gmail.com> [2013-02-20 04:20]:
>
> > I am worried about cases like this:
> >
> > $x= "0 but true"; print 0+$x;
> >
> > I want to be able to know that this needs to be serialized as C<"0 but true"> and not C<0>.
>
> Right, the issue isn't treating perfect equivalents like 1 and '1' as different things; it is that imperfect conversions that lose information can attach non-canonical values to SVs as a side effect of other operations, and thereafter there is no way to know which value was canonical.

This is not generally true; "0 but true" is just a special case that does get SVf_IOK set when used as a number. In general, only the private SVp_IOK or SVp_NOK flags will be set if the conversion is inexact, to indicate that the PV is the canonical representation:

$ perl -MDevel::Peek -e '$a="0"; $b=$a+1; Dump $a'
SV = PVIV(0x100832bf0) at 0x100831cd8
  REFCNT = 1
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 0
  PV = 0x100205db0 "0"\0
  CUR = 1
  LEN = 16

$ perl -MDevel::Peek -e '$a="0 but"; $b=$a+1; Dump $a'
SV = PVNV(0x100806330) at 0x100831cd8
  REFCNT = 1
  FLAGS = (POK,pIOK,pNOK,pPOK)
  IV = 0
  NV = 0
  PV = 0x100205db0 "0 but"\0
  CUR = 5
  LEN = 16

$ perl -MDevel::Peek -e '$a="0 but true"; $b=$a+1; Dump $a'
SV = PVIV(0x100832bf0) at 0x100831cd8
  REFCNT = 1
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 0
  PV = 0x100205dc0 "0 but true"\0
  CUR = 10
  LEN = 16
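The same distinction can be checked from Perl space without reading dumps. This is only a sketch using the core `B` module; `iv_status` is a hypothetical helper name. Going by the dumps above, one would expect `public` for "0" and "0 but true" and `private` for "0 but", but this is an implementation detail and may vary between Perl versions.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: report whether the integer slot of a scalar is
# flagged publicly (IOK), only privately (pIOK), or not at all.
sub iv_status {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return 'public'  if $flags & B::SVf_IOK();
    return 'private' if $flags & B::SVp_IOK();
    return 'none';
}

for my $text ("0", "0 but", "0 but true") {
    my $sv = $text;
    {
        no warnings 'numeric';    # "0 but" is not a clean number
        my $n = $sv + 1;          # numeric use, as in the one-liners above
    }
    printf "%-14s %s\n", qq{"$text"}, iv_status($sv);
}
```

A serializer that treats the string as canonical whenever only the private flag is set would handle the "0 but" case, but not "0 but true", which is the special case described above.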

There are other special cases for "Inf" and "NaN" too, but it looks like they will only get SVp_IOK set when they start out as strings, not as numbers:

$ perl -MDevel::Peek -e '$a="Inf"; $b=$a+1; Dump $a'
SV = PVNV(0x100806330) at 0x100831cd8
  REFCNT = 1
  FLAGS = (NOK,POK,pIOK,pNOK,pPOK,IsUV)
  UV = 18446744073709551615
  NV = inf
  PV = 0x100205db0 "Inf"\0
  CUR = 3
  LEN = 16

$ perl -MDevel::Peek -e '$a=2**9999; $b="$a"; Dump $a'
SV = PVNV(0x100806330) at 0x100831cd8
  REFCNT = 1
  FLAGS = (NOK,POK,pNOK,pPOK)
  IV = 0
  NV = inf
  PV = 0x100214f00 "inf"\0
  CUR = 3
  LEN = 48

I think though that the SVp_IOK difference may be an accidental implementation detail, so it would need tests to make sure we don't accidentally change it in the future.

Another problem with NaN and Inf is that their string representation is currently platform dependent (inherited from C RTL).

Cheers,
-Jan

p5pRT commented 11 years ago

From @jandubois

On Thu, Feb 21, 2013 at 4:32 PM, Jan Dubois <jand@activestate.com> wrote:

> There are other special cases for "Inf" and "NaN" too, but it looks like they will only get SVp_IOK set when they start out as strings, not as numbers:
>
> $ perl -MDevel::Peek -e '$a="Inf"; $b=$a+1; Dump $a'
> SV = PVNV(0x100806330) at 0x100831cd8
>   REFCNT = 1
>   FLAGS = (NOK,POK,pIOK,pNOK,pPOK,IsUV)
>   UV = 18446744073709551615
>   NV = inf
>   PV = 0x100205db0 "Inf"\0
>   CUR = 3
>   LEN = 16
>
> $ perl -MDevel::Peek -e '$a=2**9999; $b="$a"; Dump $a'
> SV = PVNV(0x100806330) at 0x100831cd8
>   REFCNT = 1
>   FLAGS = (NOK,POK,pNOK,pPOK)
>   IV = 0
>   NV = inf
>   PV = 0x100214f00 "inf"\0
>   CUR = 3
>   LEN = 48
>
> I think though that the SVp_IOK difference may be an accidental implementation detail, so it would need tests to make sure we don't accidentally change it in the future.

Never mind, it doesn't actually work:

$ perl -MDevel::Peek -e '$a=2**9999; $b=$a+1; $b="$a"; Dump $a'
SV = PVNV(0x100806330) at 0x100831cd8
  REFCNT = 1
  FLAGS = (NOK,POK,pIOK,pNOK,pPOK,IsUV)
  UV = 18446744073709551615
  NV = inf
  PV = 0x100205bc0 "inf"\0
  CUR = 3
  LEN = 48

So there is no way to tell if they started as an NV or a PV.

Cheers,
-Jan