Sereal / Sereal

Fast, compact, schema-less, binary serialization and deserialization oriented towards dynamic languages
414 stars 96 forks source link

Sereal mishandles $) #263

Open FGasper opened 3 years ago

FGasper commented 3 years ago

CentOS 7:

> perl -MDevel::Peek -MSereal -MData::Dumper -e'$) = "1 1"; print "$)\n"; Dump $(; print Dumper( Sereal::decode_sereal( Sereal::encode_sereal( [$)] ) ) )'
1 1
SV = PVMG(0x1d15500) at 0x1cb3ba0
  REFCNT = 1
  FLAGS = (GMG,SMG)
  IV = 0
  NV = 0
  PV = 0
  MAGIC = 0x1d0bd20
    MG_VIRTUAL = &PL_vtbl_sv
    MG_TYPE = PERL_MAGIC_sv(\0)
    MG_OBJ = 0x1bb3a88
    MG_LEN = 1
    MG_PTR = 0x1cca180 "("
$VAR1 = [
          1
        ];

Note that the decoded Sereal is 1, not '1 1'.

FGasper commented 3 years ago

For comparison, here’s how JSON::XS handles this:

> perl -MDevel::Peek -MJSON::XS -MData::Dumper -e'$) = "1 1"; print "$)\n"; Dump $(; print Dumper( decode_json( encode_json( [$)] ) ) )'
1 1
SV = PVMG(0x15ba520) at 0x1447b70
  REFCNT = 1
  FLAGS = (GMG,SMG)
  IV = 0
  NV = 0
  PV = 0
  MAGIC = 0x15b1ae0
    MG_VIRTUAL = &PL_vtbl_sv
    MG_TYPE = PERL_MAGIC_sv(\0)
    MG_OBJ = 0x148b540
    MG_LEN = 1
    MG_PTR = 0x15b1ac0 "("
$VAR1 = [
          '1 1'
        ];
FGasper commented 3 years ago

This is how Cpanel::JSON::XS avoids this problem:

      if (!force_conversion && SvPOKp (sv) && !strEQ(savecur, SvPVX (sv))) {
        char *str = SvPVX (sv);
        STRLEN len = SvCUR (sv);
        enc->cur = savecur;
        enc->end = saveend;
        encode_ch (aTHX_ enc, '"');
        encode_str (aTHX_ enc, str, len, SvUTF8 (sv));
        encode_ch (aTHX_ enc, '"');
        *enc->cur = 0;
      }

I don’t know if it would be acceptable in terms of performance, but one way to fix this would be to keep a small buffer around to stringify the IV then memEQ that with the PV; if they mismatch, then use the PV.

toddr commented 3 years ago

I'd point out that this SV has magic. Maybe it's not enough to trust the IV only if it has magic?

atoomic commented 3 years ago

Indeed any dual var having different representation in IV and PV space... will have the same issue without serializing both. Giving priority of the PV for magics seems a fair way to work around the issue there

demerphq commented 3 years ago

"but one way to fix this would be to keep a small buffer around to stringify the IV then memEQ that with the PV; if they mismatch, then use the PV."

We actually do this in some cases, but IIRC only when the flags are ambiguous. I guess we missed a case or something. I presume we serialized this as an IV, but I dont know why yet. I will dig.

demerphq commented 3 years ago

BTW, when i run this I get a different result:

$ perl -MDevel::Peek -MSereal -MData::Dumper -e'print $],"\n"; $) = "1 1"; print "\$)=<$)>\n"; Dump $(; print Dumper( Sereal::decode_sereal( Sereal::encode_sereal( [$)] ) ) )' 5.024001 $)=<1000 4 24 27 30 46 115 127 1000> SV = PVMG(0x279ba70) at 0x2729528 REFCNT = 1 FLAGS = (GMG,SMG) IV = 0 NV = 0 PV = 0 MAGIC = 0x26db080 MG_VIRTUAL = &PL_vtbl_sv MG_TYPE = PERL_MAGIC_sv(\0) MG_OBJ = 0x27e5920 MG_LEN = 1 MG_PTR = 0x26db060 "(" $VAR1 = [ 1000 ];

Eg, I think this is at least partly a Perl bug. Still poking to understand more.

Yves

toddr commented 3 years ago

@demerphq see https://github.com/Perl/perl5/issues/18955 and the pull request https://github.com/Perl/perl5/pull/18957 for what we think the fix is.