Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.98k stars 560 forks source link

perl5: false warning on "Multidimensional syntax not supported" #16535

Closed p5pRT closed 11 months ago

p5pRT commented 6 years ago

Migrated from rt.perl.org#133155 (status was 'open')

Searchable as RT133155$

p5pRT commented 6 years ago

From wolf-dietrich_moeller@t-online.de

Hi\, I found a false warning on "Multidimensional syntax". The test program below is executed correctly\, but line 7 triggers a false warning. This is surprising\, as the other 'print' lines have a similar structure without raising this warning. Why should this be raised here? From the syntax it is clear that both variables are parameters of "index"\, as it has (at least) two arguments.

Best regards Wolf

use warnings;
my @x = (['a','b']);
my @y = ('a','b');
my $z = 'rst';
my $i = 's';
print ' 6: ',$x[0][index $z,$i],"\n";
print ' 7: ',$y[index $z,$i],"\n";
print ' 8: ',$y[index($z,$i)],"\n";
print ' 9: ',$y[index 'rst',$i],"\n";
print '10: ',@y[index $z,$i],"\n";

Output:

Multidimensional syntax $y[index $z,$i] not supported at test_index.pl line
7.
6: b
7: b
8: b
9: b
10: b
Perl Info ``` ----------------------------------------------------------------- --- Flags: category=core severity=low --- Site configuration information for perl 5.26.2: Configured by strawberry-perl at Sun Apr 15 11:47:59 2018. Summary of my perl5 (revision 5 version 26 subversion 2) configuration: Platform: osname=MSWin32 osvers=10.0.16299.371 archname=MSWin32-x86-multi-thread-64int uname='Win32 strawberry-perl 5.26.2.1 #1 Sun Apr 15 11:47:13 2018 i386' config_args='undef' hint=recommended useposix=true d_sigaction=undef useithreads=define usemultiplicity=define use64bitint=define use64bitall=undef uselongdouble=undef usemymalloc=n default_inc_excludes_dot=define bincompat5005=undef Compiler: cc='gcc' ccflags =' -s -O2 -DWIN32 -D__USE_MINGW_ANSI_STDIO -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -fwrapv -fno-strict-aliasing -mms-bitfields' optimize='-s -O2' cppflags='-DWIN32' ccversion='' gccversion='7.1.0' gccosandvers='' intsize=4 longsize=4 ptrsize=4 doublesize=8 byteorder=12345678 doublekind=3 d_longlong=define longlongsize=8 d_longdbl=define longdblsize=12 longdblkind=3 ivtype='long long' ivsize=8 nvtype='double' nvsize=8 Off_t='long long' lseeksize=8 alignbytes=8 prototype=define Linker and Libraries: ld='g++' ldflags ='-s -L"C:\Perl\perl\lib\CORE" -L"C:\Perl\c\lib"' libpth=C:\Perl\c\lib C:\Perl\c\i686-w64-mingw32\lib C:\Perl\c\lib\gcc\i686-w64-mingw32\7.1.0 libs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32 perllibs= -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32 libc= so=dll useshrplib=true libperl=libperl526.a gnulibc_version='' Dynamic Linking: dlsrc=dl_win32.xs dlext=xs.dll d_dlsymun=undef ccdlflags=' ' cccdlflags=' ' lddlflags='-mdll -s -L"C:\Perl\perl\lib\CORE" -L"C:\Perl\c\lib"' --- @INC for perl 5.26.2: C:/Perl/perl/site/lib C:/Perl/perl/vendor/lib C:/Perl/perl/lib --- Environment for perl 5.26.2: HOME (unset) LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=... PERL_BADLANG (unset) SHELL (unset) ```
p5pRT commented 6 years ago

From @jkeenan

I've been able to reproduce this in perl-5.24.1. Hence\, it's not a regression introduced in the 5.27 development cycle and is not a blocker for 5.28.0.

Thank you very much.

-- James E Keenan (jkeenan@​cpan.org)

p5pRT commented 6 years ago

The RT System itself - Status changed from 'new' to 'open'

toddr commented 4 years ago

This output seems to be consistent going back to 5.8 at the least. This sounds more like a feature request or by design.

hvds commented 4 years ago

It smells like a bug to me, it is decidedly odd that the line-7 example warns when none of the other 4 examples do. If I encountered this I'd also be reporting it.

demerphq commented 4 years ago

On Fri, 31 Jan 2020 at 11:49, Hugo van der Sanden notifications@github.com wrote:

It smells like a bug to me, it is decidedly odd that the line-7 example warns when none of the other 4 examples do. If I encountered this I'd also be reporting it.

It is definitely a bug, or more specifically a broken heuristic.

See toke.c line 4955:

        if (*s == '[') {
            PL_tokenbuf[0] = '@';
            if (ckWARN(WARN_SYNTAX)) {
                char *t = s+1;

                while (   isSPACE(*t)
                       || isWORDCHAR_lazy_if_safe(t, PL_bufend, UTF)
                       || *t == '$')
                {
                    t += UTF ? UTF8SKIP(t) : 1;
                }
                if (*t++ == ',') {
                    PL_bufptr = skipspace(PL_bufptr); /* XXX can

realloc / while (t < PL_bufend && t != ']') t++; Perlwarner(aTHX packWARN(WARN_SYNTAX), "Multidimensional syntax %" UTF8f " not supported", UTF8fARG(UTF,(int)((t - PL_bufptr) + 1), PL_bufptr)); } } }

As written it doesnt grok that this is a function call. The parens case works because when it falls out of the loop it is pointing at a '('.

This shouldn't be too hard to fix either. I have to say however, it feels wrong to me that this is done in toke.c. It feels like it really should be resolved somehow in the parser, not the toker. I bet Zefram would know how to fix it properly if he felt like it.

Yves

demerphq commented 4 years ago

On Fri, 31 Jan 2020 at 13:35, demerphq demerphq@gmail.com wrote:

On Fri, 31 Jan 2020 at 11:49, Hugo van der Sanden < notifications@github.com> wrote:

It smells like a bug to me, it is decidedly odd that the line-7 example warns when none of the other 4 examples do. If I encountered this I'd also be reporting it.

It is definitely a bug, or more specifically a broken heuristic.

See toke.c line 4955:

        if (*s == '[') {
            PL_tokenbuf[0] = '@';
            if (ckWARN(WARN_SYNTAX)) {
                char *t = s+1;

                while (   isSPACE(*t)
                       || isWORDCHAR_lazy_if_safe(t, PL_bufend, UTF)
                       || *t == '$')
                {
                    t += UTF ? UTF8SKIP(t) : 1;
                }
                if (*t++ == ',') {
                    PL_bufptr = skipspace(PL_bufptr); /* XXX can

realloc / while (t < PL_bufend && t != ']') t++; Perlwarner(aTHX packWARN(WARN_SYNTAX), "Multidimensional syntax %" UTF8f " not supported", UTF8fARG(UTF,(int)((t - PL_bufptr) + 1), PL_bufptr)); } } }

As written it doesnt grok that this is a function call. The parens case works because when it falls out of the loop it is pointing at a '('.

This shouldn't be too hard to fix either. I have to say however, it feels wrong to me that this is done in toke.c. It feels like it really should be resolved somehow in the parser, not the toker. I bet Zefram would know how to fix it properly if he felt like it.

Anyway, until someone who understands the grammar much better than me steps up to move this to the parser I have pushed smoke-me/fix_issue_16535 to close this ticket.

Cheers, yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

jkeenan commented 4 years ago

On Fri, 31 Jan 2020 at 13:35, demerphq @.***> wrote: [snip] Anyway, until someone who understands the grammar much better than me steps up to move this to the parser I have pushed smoke-me/fix_issue_16535 to close this ticket. Cheers, yves [snip]

+1

The OP's program PASS on smoke-me/fix_issue_16535 branch.

demerphq commented 4 years ago

demerphq wrote:

This shouldn't be too hard to fix either. I have to say however, it feels wrong to me that this is done in toke.c. It feels like it really should be resolved somehow in the parser, not the toker.

If there's a consensus to make this a semantic heuristic rather than a syntactic one, it would be fairly easy to make it so. There are three productions in perly.y that generate an OP_AELEM, and each coerces the index expression to scalar. It is at that point, immediately before applying the context, that the index expression optree could sensibly be examined, and a warning generated if it looked too list-like.

One would have to come up with a specific semantic heuristic. Note that many situations that currently generate the "Multidimensional syntax not supported" warning also generate an entirely semantic "Useless use in void context" warning, so the separate "Multidimensional syntax" warning looks a bit redundant. One might want to trigger it on exactly the same semantic condition that generates the "Useless use" warning. More radically, one might want to entirely merge the warnings, making the "Multidimensional syntax" part just a parenthetical note in "Useless use" warnings that arise in this context.

I'm not going to offer an opinion on what conditions ought to trigger this warning. I swore off that class of debate years ago.

-zefram

demerphq commented 4 years ago

On Sat, 1 Feb 2020, 12:27 Zefram, zefram@fysh.org wrote:

demerphq wrote:

This shouldn't be too hard to fix either. I have to say however, it feels wrong to me that this is done in toke.c. It feels like it really should be resolved somehow in the parser, not the toker.

If there's a consensus to make this a semantic heuristic rather than a syntactic one, it would be fairly easy to make it so. There are three productions in perly.y that generate an OP_AELEM, and each coerces the index expression to scalar. It is at that point, immediately before applying the context, that the index expression optree could sensibly be examined, and a warning generated if it looked too list-like.

One would have to come up with a specific semantic heuristic. Note that many situations that currently generate the "Multidimensional syntax not supported" warning also generate an entirely semantic "Useless use in void context" warning, so the separate "Multidimensional syntax" warning looks a bit redundant. One might want to trigger it on exactly the same semantic condition that generates the "Useless use" warning. More radically, one might want to entirely merge the warnings, making the "Multidimensional syntax" part just a parenthetical note in "Useless use" warnings that arise in this context.

I'm not going to offer an opinion on what conditions ought to trigger this warning. I swore off that class of debate years ago.

Problem is I think you are one of the few who can offer a cogent summary of the options available to us. So while i appreciate your desire to stay out of any debates on the subject we need your insight before we can even start any debate on this.

Personally my thought is that if someone uses a list as an array subscript we should warn. Not sure how to state that more formally. Eg imo if we warn on

$array[$x,$y]

Then we should also warn on

$array[do{ $x, $y}]

But we shouldn't warn at compile time if it's a function call unless the function returns a list.

FWIW when I was writing tests I didnt see any "useless use of..." warnings from things like

$array[0x1,0x2]

Cheers Yves

demerphq commented 4 years ago

demerphq wrote:

Personally my thought is that if someone uses a list as an array subscript we should warn. Not sure how to state that more formally.

It might be as simple as the top-level op being an OP_LIST.

But we shouldn't warn at compile time if it's a function call unless the function returns a list.

One can't generally tell at compile time whether the function would want to return a list (however that's defined). A function call is pretty opaque.

FWIW when I was writing tests I didnt see any "useless use of..." warnings from things like

$array[0x1,0x2]

The constant 1 (however spelled) is exempt from that warning, because of its common use as a filler value. If you were to swap the two list elements in the above expression, you'd get a warning for useless use of a constant 2.

-zefram

demerphq commented 4 years ago

On Sat, 1 Feb 2020, 20:42 Zefram, zefram@fysh.org wrote:

demerphq wrote:

Personally my thought is that if someone uses a list as an array subscript we should warn. Not sure how to state that more formally.

It might be as simple as the top-level op being an OP_LIST.

I'd be fine with that personally...

But we shouldn't warn at compile time if it's a function call unless the function returns a list.

One can't generally tell at compile time whether the function would want to return a list (however that's defined). A function call is pretty opaque.

Sorry, my mistake in how I phrased that. I should have said that detecting a list return should be a runtime check.

FWIW when I was writing tests I didnt see any "useless use of..." warnings from things like

$array[0x1,0x2]

The constant 1 (however spelled) is exempt from that warning, because of its common use as a filler value. If you were to swap the two list elements in the above expression, you'd get a warning for useless use of a constant 2.

Oh. Interesting. I'll write more tests. Thanks!

Cheers Yves

demerphq commented 4 years ago

demerphq wrote:

Sorry, my mistake in how I phrased that. I should have said that detecting a list return should be a runtime check.

That would be difficult, and would give that version of the warning a very different character from the compile-time versions. Since the function call gets put into scalar context, the function can't return a list at runtime. To detect an attempt to return a list, i.e., the return value coming from an OP_LIST (or whatever the criterion is) that inherited scalar context from the function call, you'd need to add a bunch of machinery to the op execution system. That's quite an unattractive approach, from the core maintenance point of view.

-zefram

demerphq commented 4 years ago

On Sat, 1 Feb 2020, 21:06 Zefram, zefram@fysh.org wrote:

demerphq wrote:

Sorry, my mistake in how I phrased that. I should have said that detecting a list return should be a runtime check.

That would be difficult, and would give that version of the warning a very different character from the compile-time versions. Since the function call gets put into scalar context, the function can't return a list at runtime. To detect an attempt to return a list, i.e., the return value coming from an OP_LIST (or whatever the criterion is) that inherited scalar context from the function call, you'd need to add a bunch of machinery to the op execution system. That's quite an unattractive approach, from the core maintenance point of view.

Ok. Fair enough. Scratch that then.

Thanks, Yves

demerphq commented 4 years ago

I wrote:

It might be as simple as the top-level op being an OP_LIST.

Further thought: it would make sense to match the condition for a hash index expression to be treated as multidimensional. This condition is in fact the top-level op being an OP_LIST.

-zefram

demerphq commented 4 years ago

On Fri, 31 Jan 2020 at 23:59, James E Keenan notifications@github.com wrote:

On Fri, 31 Jan 2020 at 13:35, demerphq @.***> wrote: [snip] Anyway, until someone who understands the grammar much better than me steps up to move this to the parser I have pushed smoke-me/fix_issue_16535 to close this ticket. Cheers, yves [snip]

+1

The OP's program PASS on smoke-me/fix_issue_16535 branch.

I have merged this change as 41eecd54c335a0342b04dbea635695db80579946 but I think we should leave the ticket open a while so Zefram can have some time to come up with something better.

cheers, yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

tonycoz commented 11 months ago

This was fixed by 41eecd54c