MusicPlayerDaemon / MPD

Music Player Daemon
https://www.musicpd.org/
GNU General Public License v2.0
2.17k stars 352 forks source link

Filter syntax error on nested AND clauses #2100

Closed m7a closed 1 month ago

m7a commented 2 months ago

Bug report

Describe the bug

When querying with a filter that contains nested “AND” conditions, the behaviour is influenced by the order of clauses with a syntax error being reported when the first clause is another AND clause.

Although it is unlikely that someone would write such a simple nested AND expression I came across this bug during the implementation of my own client as I attempted to automatically assemble a filter expression with the filter parts being taken from multiple inputs of the program. There, generating a nested AND structure was easier than to flatten it explicitly. For now, I work around this issue by changing the order of clauses generated by my client.

Expected Behavior

I expected the processing of AND clauses to be independent of their nesting and ordering. Hence I would have expected to get the same find result twice and no syntax error for the session below:

Actual Behavior

$ nc 127.0.0.1 6600
OK MPD 0.24.0
find "((artist == \"Linkin Park\") AND ((album == \"Meteora\") AND (title == \"Faint\")))"
file: album/linkin_park_meteora/01_07_faint.flac
Last-Modified: 2024-03-03T20:02:23Z
Added: 2024-08-18T18:56:26Z
Format: 44100:16:2
Album: Meteora
Artist: Linkin Park
Date: 2003
Title: Faint
Track: 7
Time: 162
duration: 162.106
OK
find "(((album == \"Meteora\") AND (title == \"Faint\")) AND (artist == \"Linkin Park\"))"
ACK [2@0] {find} Word expected

Version

Built from most recent commit as of finding this issue: 965c466e9bda262790e76edd5272e9e74b407ff3. Originally found on the MPD in Debian stable (“Music Player Daemon 0.23.12 (0.23.12)”)

Music Player Daemon 0.24 (v0.23.15-1511-g965c466e9)
Copyright 2003-2007 Warren Dukes <warren.dukes@gmail.com>
Copyright 2008-2021 Max Kellermann <max.kellermann@gmail.com>
This is free software; see the source for copying conditions.  There is NO
warranty; not even MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Database plugins:
 simple proxy upnp

Storage plugins:
 local udisks nfs curl

Neighbor plugins:
 upnp udisks

Decoder plugins:
 [mpg123] mp3
 [mad] mp3 mp2
 [vorbis] ogg oga
 [oggflac] ogg oga
 [flac] flac
 [opus] opus ogg oga
 [sndfile] wav aiff aif au snd paf iff svx sf voc w64 pvf xi htk caf sd2
 [audiofile] wav au aiff aif
 [dsdiff] dff
 [dsf] dsf
 [faad] aac
 [mpcdec] mpc
 [wavpack] wv
 [modplug] 669 amf ams dbm dfm dsm far it med mdl mod mtm mt2 okt s3m stm ult umx xm
 [mikmod] amf dsm far gdm imf it med mod mtm s3m stm stx ult uni xm
 [wildmidi] mid
 [fluidsynth] mid
 [adplug] amd d00 hsc laa rad raw sa2
 [gme] ay gbs gym hes kss nsf nsfe rsn sap spc vgm vgz
 [ffmpeg] 264 265 302 3g2 3gp 4xm 669 722 aa aa3 aac aax ac3 ace acm act adf adp ads adx aea afc aiff aix al alias_pix alp amf amr amrnb amrwb ams anm ans apc ape apl apm apng aptx aptxhd aqt argo_asf argo_brp argo_cvg art asc asf asf_o ass ast au avc avi avif avr avs avs2 avs3 bcstm bethsoftvid bfi bfstm bin bink binka bit bitpacked bmp_pipe bmv boa brender_pix brstm c2 c93 caf cdata cdg cdxl cgi cif cine codec2raw concat cri_pipe dash dat data daud dav dbm dds_pipe dfa dff dfpwm dif digi dirac diz dmf dnxhd dpx_pipe dsf dsicin dsm dss dst dtk dtm dts dtshd dv dvbsub dvbtxt dxa ea eac3 exr_pipe f32be f32le f4v f64be f64le fap far ffmetadata film_cpk fits flac flic flm flv frm fsb fwse g722 g723_1 g726 g726le g729 gdm gdv gem_pipe genh gif gif_pipe gsm gxf h261 h263 h264 h265 h26l hca hcom hevc hls hnm ice ico idcin idf idx iff ifv ilbc image2 image2pipe imf imx ipmovie ipu ircam ism isma ismv iss it iv8 ivf ivr j2b j2k j2k_pipe jacosub jpeg_pipe jpegls_pipe jpegxl_pipe jv kux kvag libgme lmlm4 loas lrc lvf lxf m15 m2a m4a m4b m4v mac mca mcc mdl med mgsts microdvd mj2 mjpeg mjpg mk3d mka mks mkv mlp mlv mm mmcmp mmf mms mo3 mod mods moflex mov mp2 mp3 mp4 mpa mpc mpc8 mpeg mpegts mpegtsraw mpegvideo mpl2 mpo mptm msbc msf msnwctcp msp mt2 mtaf mtm mtv musx mv mvi mxf mxg nfo nist nsp nst nsv nut nuv obu ogg okt oma omg paf pam_pipe pbm_pipe pcx_pipe pfm_pipe pgm_pipe pgmyuv_pipe pgx_pipe phm_pipe photocd_pipe pictor_pipe pjs plm pmp png_pipe pp_bnk ppm ppm_pipe psd_pipe psm psp psxstr pt36 ptm pva pvf qcif qcp qdraw_pipe qoi_pipe r3d rco rcv rgb rl2 rm roq rpl rsd rso rt rtp rtsp s16be s24be s24le s32be s32le s337m s3m sami sap sb sbc sbg scc scd sdp sdr2 sds sdx ser sf sfx sfx2 sga sgi_pipe shn sln smi smk smush sol son sox spdif sph srt ss2 st26 stk stl stm stp str sub sunrast_pipe sup svag svg_pipe svs sw swf tak tco tedcaptions thd thp tiertexseq tiff_pipe tmv tta txd txt ty ty+ u16be u24be u24le u32be u32le ub ul ult umx uw v v210 vag vb vbn_pipe vc1 vidc viv vividas vmd voc vpk vqe vqf vql vt vtt w64 wav wc3movie webm webm_dash_manifest webp_pipe wow wsaud wsd wsvqa wtv wv wve xa xbin xbm_pipe xl xm xmv xpk xpm_pipe xvag xwd_pipe xwma y4m yop yuv yuv10 rtp:// rtsp:// rtsps://
 [pcm]

Filters:
 libsamplerate soxr

Tag plugins:
 id3tag

Output plugins:
 shout null fifo sndio pipe alsa ao openal pulse jack httpd snapcast recorder

Encoder plugins:
 null vorbis opus lame twolame wave flac shine

Archive plugins:
 [bz2] bz2
 [zzip] zip
 [iso] iso

Input plugins:
 file archive alsa qobuz curl ffmpeg nfs mms cdio_paranoia

Playlist plugins:
 extm3u m3u pls xspf asx rss soundcloud flac cue embcue

Protocols:
 file:// alsa:// cdda:// ftp:// ftps:// gopher:// hls+http:// hls+https:// http:// https:// mms:// mmsh:// mmst:// mmsu:// nfs:// qobuz:// rtmp:// rtmpe:// rtmps:// rtmpt:// rtmpte:// rtmpts:// rtp:// rtsp:// rtsps:// scp:// sftp:// smb:// srtp://

Other features:
 avahi dbus udisks epoll icu inotify ipv6 systemd tcp un

Configuration

Stripped-down config for testing. The same behavior was observed on my actual setup which includes output etc.

music_directory     "/home/linux-fan/wd/music2"
playlist_directory  "/home/linux-fan/wd/music2/supplementary/mpd/playlists"
db_file         "/home/linux-fan/wd/music2/supplementary/mpd/tag_cache"
state_file      "/home/linux-fan/wd/music2/supplementary/mpd/state"
sticker_file        "/home/linux-fan/wd/music2/supplementary/mpd/sticker.sql"
bind_to_address     "127.0.0.1"
port            "6600"
connection_timeout  "120"
log_level       "notice"
restore_paused      "yes"
auto_update     "yes"
input {
    plugin      "file"
    enabled     "yes"
}
decoder {
    plugin      "flac"
    enabled     "yes"
}
filesystem_charset  "UTF-8"

Log

AFAIU no log is generated about this.

jcorporation commented 2 months ago

I think nesting is not supported and has no sense with "AND" only.

m7a commented 2 months ago

I think nesting is not supported and has no sense with "AND" only.

Do you mean not supported at all or just not in case of AND?

I am using nested filters quite successfully and so far the AND behavior has been the only odd one during my tests.

One wouldn't probably write nested ANDs by hand, but for automatic composition of filters this seemed to be the best way to express it in my program.

If its only about directly nested AND not being supported, I am perfectly fine with that and would then go on to change my client to avoid generating such filters. The odd thing is that right now it seems to work in some cases but not all and I couldn't get a definite answer from the documentation https://mpd.readthedocs.io/en/latest/protocol.html#filters which to me reads as if arbitrary nesting should be possible. If this turns out to be wrong, it is of course also OK to add this as a hint to the protocol documentation :smile:

jcorporation commented 2 months ago

I quickly looked at the code and I couldn’t find any handling for nesting.

m7a commented 2 months ago

I quickly looked at the code and I couldn’t find any handling for nesting.

As far as I understand, the filter nesting (at least on the parsing) is enabled by the recursive structure of the ParseExpression function in SongFilter (file src/song/Filter.cxx).

Per my preliminary understanding, AND expressions are parsed in a peculiar manner that instead of s = StripLeft(s + 1) they use ++s to strip off a closing parentheses marking the end of the AND expression. I don't exactly understand why it would make a difference, but the following change seems to get both of my inputs accepted:

diff --git a/src/song/Filter.cxx b/src/song/Filter.cxx
index 6cef1f8b0..566a50109 100644
--- a/src/song/Filter.cxx
+++ b/src/song/Filter.cxx
@@ -326,7 +326,7 @@ SongFilter::ParseExpression(const char *&s, bool fold_case)
    if (*s == '(') {
        auto first = ParseExpression(s, fold_case);
        if (*s == ')') {
-           ++s;
+           s = StripLeft(s + 1);
            return first;
        }

@@ -340,7 +340,7 @@ SongFilter::ParseExpression(const char *&s, bool fold_case)
            and_filter->AddItem(ParseExpression(s, fold_case));

            if (*s == ')') {
-               ++s;
+               s = StripLeft(s + 1);
                return and_filter;
            }

Now I need to find out why this could work (I don't believe it myself yet...) and also do further tests to see that it doesn't break anything...

I also discovered that there is a test program ParseSongFilter (test/ParseSongFilter.cxx) that calls an optimization step which resolves the superflous nesting of AND statements (among other optimizations).

jcorporation commented 2 months ago

I think StripLeft removes both ) while ++s only skips the next char.

m7a commented 2 months ago

I think StripLeft removes both ) while ++s only skips the next char.

I checked with the code and the behaviour differs as follows:

I tried to draw some sort of trace to show what happens in the two cases (more a note to myself as to why this change works). Note that according to this only the second ++s needs to be replaced.

Say we have a nested expression (((A) AND (B)) AND (C)) then the parsing goes
as follows:

ParseExpression("(((A) AND (B)) AND (C))", false)
                 ^
    assert(*s == '(')); // OK
    s = StripLeft(s+1) // now have ((A) AND (B)) AND (C))

    if (*s == '(') { // yes
        auto first = ParseExpression(s, fold_case)
        // -- recurse --
        assert(*s == '('));  // OK
        s = StripLeft(s+1) // now have (A) AND (B)) AND (C))
        if (*s == '(')) { // yes
            auto first = ParseExpression(s, fold_casee);
            // -- recurse --
            assert(*s == '(')); // OK
            s = StripLeft(s+1) // now have A) AND (B)) AND C))
            if (*s == '(') ; // no!
            // parse 'A' and cut it off s.t. we now have `AND (B)) AND C))
            // -- parse expression returns --
            if (*s == ')') ;// no
            if (expectWord(s) != "AND") ; // no - now have `(B)) AND C))
            auto and_fitler = make_unique<AndSongFilter>();
            and_filter->AddItem(std::move(first));
            while (true) {
                and_filter->AddItem(ParseExpression(s, fold_case));
                // -- recurse --
                assert(*s == '('); // OK
                s = StripLeft(s+1); // now ave B)) AND C))
                if (*s == '(')) ; // no
                // parse B and cut it off s.t. we now have ) AND C))
                // -- parse expression returns --
                if (*s == ')') { // yes
                    ++s; // !!!!
                    return and_filter; // we now have ` AND C))`
                }
            }
        }
        // -- parse expression returns --
        if (*s == ')'); // no
        if (ExpectWord(s) != "AND") // ! throws {find} Word expected!
                        // because we have ` AND C))` i.e. leading space not proper "AND" word!

Now same with s = StripLeft(s + 1) in place of `!!!!` continuse the trail
differently:

                    s = StripLeft(s + 1);
                    return and_filter; // we now have `AND C))`
                }
            }
        }
        // -- parse expression returns --
        if (*s == ')'); // no
        if (ExpectWord(s) != "AND"); // no
        // code continues successfully parsing the remainder...

The first ++s is curious anyways, because in my opinion it is only used to parse some sort of “trivial” AND expressions where no AND is present at all i.e. it is currently valid to issue the following query:

find "(((album == \"Meteora\")))"

As far as I understand this triggers the first ++s twice (recursively) and then goes to process the (album == \"Meteora\") query.

For consistency I would suggest to replace both ++s with s = StripLeft(s + 1) although the bug is already fixed by replacing just the second instance.

I could also confirm that it is related to the spacing by replacing my previously erroneous query as follows causing it to work even without my patch:

find "(((album == \"Meteora\")AND(title == \"Faint\"))AND(artist == \"Linkin Park\"))"
file: album/linkin_park_meteora/01_07_faint.flac
Last-Modified: 2024-03-03T20:02:23Z
Format: 44100:16:2
Album: Meteora
Artist: Linkin Park
Date: 2003
Title: Faint
Track: 7
Time: 162
duration: 162.106

Although from reading the docs I would have expected that to be invalid syntax because the documentation clearly shows spacing around the ANDs...

MaxKellermann commented 1 month ago

6db4b818e679d24f46c1222ddf732ae60117cb18