OpenTransitTools-BonePile / trimet-mod-pelias

TriMet MOD Pelias ... related to https://github.com/conveyal/trimet-mod-otp

MLK alias can't find intersections and stop names (has intersection in there) with "ML King" in their name #33

Closed fpurcell closed 3 years ago

fpurcell commented 4 years ago

Basically, MLK synonyms and variations are spotty. Call center folks see this as their biggest complaint with Pelias.

This is from a Feb 2020 Slack discussion:

fpurcell Feb 27th at 12:13 AM
a) killingsworth & mlk: https://ws-st.trimet.org/pelias/v1/autocomplete?text=mlk%20%26%20killingsworth

b) mlk & ainsworth: https://ws-st.trimet.org/pelias/v1/autocomplete?text=ainsworth%20%26%20mlk
Even limiting the search to just stops doesn't seem to work well: https://ws-st.trimet.org/pelias/v1/autocomplete?text=ainsworth%20%26%20mlk&layers=stops
And search, although there are results, isn't much better than autocomplete: https://ws-st.trimet.org/pelias/v1/search?text=ainsworth%20%26%20mlk&layers=stops

c) spelling out 'ml king' works better:
https://ws-st.trimet.org/pelias/v1/autocomplete?text=ne%20ml%20king%20%26%20killingsworth
https://ws-st.trimet.org/pelias/v1/autocomplete?text=ne%20ml%20king%20%26%20ainsworth

and spelling out 'martin luther king' is also better: https://ws-st.trimet.org/pelias/v1/autocomplete?text=ne%20martin%20luther%20king%20%26%20killingsworth

Julian Simioni 6 months ago hmm. this worked previously, right? i recall there were changes in recent versions of elasticsearch that might have impacted this

fpurcell 6 months ago I'm not sure it ever worked for those stops, Julian. MLK aliases seemingly work well for addresses and intersections. Maybe @myleen remembers whether this worked at one time for those stops.

myleen 6 months ago I don't recall, sorry.

Julian Simioni 6 months ago no problem. we'll take a look and see what the current situation is and what we can do. maybe there are some synonym/alias tweaks or maybe something else

myleen 6 months ago And I would say MLK aliases sometimes work for intersections. See the example immediately above - why is mlk & killingsworth 5th in the list?

Julian Simioni 6 months ago good question. frank which one of the pelias queries corresponds to that last screenshot?

fpurcell 6 months ago https://ws-st.trimet.org/pelias/v1/search?text=MLK%20and%20Killingsworth

missinglink commented 4 years ago

Heya,

This PR should resolve your issue https://github.com/pelias/schema/pull/449

Without that change you'll find that synonyms from your custom_street file are not working as expected (in some search contexts)

A temporary workaround is to copy the MLK synonyms line from your custom_street file and paste it in the custom_name file (essentially duplicating it).

If that solves the issue that's great feedback that the PR is good to merge.

We can discuss merging that code in our next meeting, or if it's urgent please test and comment on the PR so we can expedite its merging.
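The temporary workaround described in this comment (duplicating the MLK line from the custom_street file into the custom_name file) could be scripted roughly as below. This is only a sketch: the function name `copy_synonym_lines` is made up for illustration, the synonym lines are hypothetical sample data rather than the real TriMet files, and it assumes one comma-separated synonym group per line with `#` marking comments.

```python
def copy_synonym_lines(street_lines, name_lines, needle):
    """Return name_lines plus every street synonym line that mentions
    `needle` (case-insensitive) and is not already present."""
    existing = {line.strip().lower() for line in name_lines}
    result = list(name_lines)
    for line in street_lines:
        stripped = line.strip()
        # Skip blank lines and comments (assumed to start with '#').
        if not stripped or stripped.startswith("#"):
            continue
        key = stripped.lower()
        if needle.lower() in key and key not in existing:
            result.append(stripped)
            existing.add(key)
    return result


# Hypothetical file contents, for illustration only.
street = [
    "# custom street synonyms",
    "mlk, ml king, martin luther king",
    "bhh, bh hwy, beaverton hillsdale hwy",
]
name = ["# custom name synonyms"]

merged = copy_synonym_lines(street, name, "mlk")
print("\n".join(merged))
```

Running the function a second time on its own output adds nothing, so the duplication stays a one-line change that is easy to revert once the schema fix is in place.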

missinglink commented 4 years ago

For testing you can use the docker tag corresponding to that branch pelias/schema:custom_street_synonyms

It will require you to recreate the schema (and therefore reindex all the docs) but shouldn't require any other configuration changes, please ensure that this tag is correctly set in docker-compose.yml and you start with a clean build for testing.

note: I mentioned above copy/pasting between files, this is not required when using this PR. the copy/paste thing is just a quick hacky workaround and I think this PR method is superior because copy/paste everywhere is just terrible 👹

missinglink commented 4 years ago

I'm happy with that change so I've merged it to master.

We're still interested in hearing your feedback 🙏

fpurcell commented 3 years ago

Building new index now, and will be testing this week. Will edit this comment with feedback, Peter. Thanks for the help...

fpurcell commented 3 years ago

@missinglink your synonym fix is working really well... I'm seeing much better results with various combinations of "mlk & " and "ml king & ", etc.

All these queries now work, and find both intersections and stops (which wasn't happening previously):
✔ [1] "/v1/autocomplete?text=killingsworth & mlk"
✔ [2] "/v1/autocomplete?text=mlk & killingsworth"
✔ [3] "/v1/autocomplete?text=ml king & killingsworth"
✔ [4] "/v1/autocomplete?text=mlk & hawthorn"
✔ [5] "/v1/autocomplete?text=mlk & alberta"
✔ [6] "/v1/autocomplete?text=mlk & oak"
✔ [7] "/v1/autocomplete?text=ml king & oak"
✔ [8] "/v1/autocomplete?text=church & mlk"
✔ [9] "/v1/autocomplete?text=tv hwy & 170"
✔ [10] "/v1/autocomplete?text=bh hwy & western"
✔ [11] "/v1/autocomplete?text=bhh & western"
✔ [12] "/v1/autocomplete?text=western & bhh"
✔ [13] "/v1/autocomplete?text=western & bh hwy"
✔ [14] "/v1/autocomplete?text=bh hwy & 96"
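The queries in this list are written with a raw "&" and spaces for readability; sent as-is, the "&" would be read as a query-string separator, so the text has to be percent-encoded, as in the URLs earlier in the thread. A minimal sketch of building such URLs follows. The helper name `pelias_url` is an assumption for illustration; the base URL and the `text` and `layers` parameters are the ones used in the thread.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the example links earlier in the thread.
BASE = "https://ws-st.trimet.org/pelias/v1"


def pelias_url(endpoint, text, layers=None, base=BASE):
    """Build a percent-encoded URL for the given Pelias endpoint."""
    params = {"text": text}
    if layers:
        params["layers"] = layers
    # quote_via=quote encodes spaces as %20 (not '+') and '&' as %26.
    return f"{base}/{endpoint}?{urlencode(params, quote_via=quote)}"


queries = ["mlk & killingsworth", "ml king & oak", "western & bh hwy"]
for q in queries:
    print(pelias_url("autocomplete", q))
```

Passing `layers="stops"` reproduces the stops-only variant of a query.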

Two minor issues I did see (nothing we're worried about):

One (very minor) issue I ran into is that the address parser treats the 'king' portion of 'ml king' (when on the right-hand side of the search element) as an admin zone: https://ws.trimet.org/pelias/v1/search?text=hawthorn%20%26%20ml%20king

    "parsed_text": { "subject": "hawthorn & ml", "street": "hawthorn", "cross_street": "ml", "admin": "king" }

One other (extremely minor) issue was reversing the street names in a search with synonyms and spaces: whereas the searches "bh hwy & western", "bhh & western" and "western & bhh" all correctly found stops named "SW Beaverton-Hillsdale & Western", the search "western & bh hwy" only found the intersection, and not the stops. Address parsing looked fine too:

    "query": { "text": "western & bh hwy", "parser": "pelias", "parsed_text": { "subject": "western & bh hwy", "street": "western", "cross_street": "bh hwy" },

fpurcell commented 3 years ago

BTW, the search endpoint is not working as well as autocomplete. The craziest response is from this query for "tv hwy & 170": http://rj-dv-mapgeo01:4000/v1/search?text=tv%20hwy%20%26%20170

I get a 500 error:

   "errors": [
      "[query_shard_exception] failed to create query: {\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"match\" : {\n          \"name.default\" : {\n            \"query\" : ... <bunch of other unformatted stuff...>

fpurcell commented 3 years ago

The search service also seems to find a lot of intersections first, rather than stops. (Not sure the weights we have on stops have ever had the same effect in the search service that they do in autocomplete). And it also has a bit of trouble finding good matches compared to autocomplete.

missinglink commented 3 years ago

Ok good to hear it's working, I know it was an issue for your customer service team, hopefully they're happier with this version 😄

That 500 error is alarming, we shouldn't be seeing those sorts of errors! I'd like to open a new issue for that to investigate.

If it's simple enough could you please copy paste the entire query_shard_exception response?

It would save me a bunch of time trying to reproduce your setup 🙏

We can also open an issue for the search vs. autocomplete issues you're finding, maybe it's time we revisited some of your config values.

There's some other work in the pipeline at the moment for improving autocomplete queries containing an ampersand so I'll let you know when we're getting closer to merging that as it'll possibly impact these queries.

fpurcell commented 3 years ago

Hey Peter,

Here's the output from the tv%20hwy%20%26%20170 query. (BTW, I just noticed that v%20hwy%20%26%20170 works fine ... adding the 't' is what throws the error):

// 20200914182924
// http://rj-dv-mapgeo01:4000/v1/search?text=tv%20hwy%20%26%20170

{
  "geocoding": {
    "version": "0.2",
    "attribution": "http://rj-dv-mapgeo01:4000/attribution",
    "query": {
      "text": "tv hwy & 170",
      "size": 10,
      "private": false,
      "focus.point.lat": 45.52,
      "focus.point.lon": -122.67,
      "lang": {
        "name": "English",
        "iso6391": "en",
        "iso6393": "eng",
        "via": "header",
        "defaulted": false
      },
      "querySize": 20,
      "parser": "pelias",
      "parsed_text": {
        "subject": "tv hwy & 170",
        "street": "tv hwy",
        "cross_street": "170"
      }
    },
    "errors": [
      "[query_shard_exception] failed to create query: {\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"match\" : {\n          \"name.default\" : {\n            \"query\" : \"tv hwy & 170\",\n            \"operator\" : \"OR\",\n            \"analyzer\" : \"peliasQuery\",\n            \"prefix_length\" : 0,\n            \"max_expansions\" : 50,\n            \"minimum_should_match\" : \"1<-1 3<-25%\",\n            \"fuzzy_transpositions\" : true,\n            \"lenient\" : false,\n            \"zero_terms_query\" : \"NONE\",\n            \"cutoff_frequency\" : 0.01,\n            \"auto_generate_synonyms_phrase_query\" : true,\n            \"boost\" : 1.0\n          }\n        }\n      }\n    ],\n    \"should\" : [\n      {\n        \"match_phrase\" : {\n          \"phrase.default\" : {\n            \"query\" : \"tv hwy & 170\",\n            \"analyzer\" : \"peliasPhrase\",\n            \"slop\" : 2,\n            \"zero_terms_query\" : \"NONE\",\n            \"boost\" : 1.0\n          }\n        }\n      },\n      {\n        \"function_score\" : {\n          \"query\" : {\n            \"match_all\" : {\n              \"boost\" : 1.0\n            }\n          },\n          \"functions\" : [\n            {\n              \"filter\" : {\n                \"match_all\" : {\n                  \"boost\" : 1.0\n                }\n              },\n              \"weight\" : 3.0,\n              \"exp\" : {\n                \"center_point\" : {\n                  \"origin\" : {\n                    \"lat\" : 45.52,\n                    \"lon\" : -122.67\n                  },\n                  \"offset\" : \"0km\",\n                  \"scale\" : \"50km\",\n                  \"decay\" : 0.5\n                },\n                \"multi_value_mode\" : \"MIN\"\n              }\n            }\n          ],\n          \"score_mode\" : \"avg\",\n          \"boost_mode\" : \"replace\",\n          \"max_boost\" : 3.4028235E38,\n          \"boost\" : 1.0\n        }\n      },\n     
 {\n        \"function_score\" : {\n          \"query\" : {\n            \"match_all\" : {\n              \"boost\" : 1.0\n            }\n          },\n          \"functions\" : [\n            {\n              \"filter\" : {\n                \"match_all\" : {\n                  \"boost\" : 1.0\n                }\n              },\n              \"weight\" : 1.0,\n              \"field_value_factor\" : {\n                \"field\" : \"popularity\",\n                \"factor\" : 1.0,\n                \"missing\" : 1.0,\n                \"modifier\" : \"log1p\"\n              }\n            }\n          ],\n          \"score_mode\" : \"first\",\n          \"boost_mode\" : \"replace\",\n          \"max_boost\" : 20.0,\n          \"boost\" : 1.0\n        }\n      },\n      {\n        \"function_score\" : {\n          \"query\" : {\n            \"match_all\" : {\n              \"boost\" : 1.0\n            }\n          },\n          \"functions\" : [\n            {\n              \"filter\" : {\n                \"match_all\" : {\n                  \"boost\" : 1.0\n                }\n              },\n              \"weight\" : 2.0,\n              \"field_value_factor\" : {\n                \"field\" : \"population\",\n                \"factor\" : 1.0,\n                \"missing\" : 1.0,\n                \"modifier\" : \"log1p\"\n              }\n            }\n          ],\n          \"score_mode\" : \"first\",\n          \"boost_mode\" : \"replace\",\n          \"max_boost\" : 20.0,\n          \"boost\" : 1.0\n        }\n      },\n      {\n        \"function_score\" : {\n          \"query\" : {\n            \"match_all\" : {\n              \"boost\" : 1.0\n            }\n          },\n          \"functions\" : [\n            {\n              \"filter\" : {\n                \"match\" : {\n                  \"layer\" : {\n                    \"query\" : \"activity_center\",\n                    \"operator\" : \"OR\",\n                    \"prefix_length\" : 0,\n            
        \"max_expansions\" : 50,\n                    \"fuzzy_transpositions\" : true,\n                    \"lenient\" : false,\n                    \"zero_terms_query\" : \"NONE\",\n                    \"auto_generate_synonyms_phrase_query\" : true,\n                    \"boost\" : 1.0\n                  }\n                }\n              },\n              \"weight\" : 3.0\n            },\n            {\n              \"filter\" : {\n                \"match\" : {\n                  \"layer\" : {\n                    \"query\" : \"airport\",\n                    \"operator\" : \"OR\",\n                    \"prefix_length\" : 0,\n                    \"max_expansions\" : 50,\n                    \"fuzzy_transpositions\" : true,\n                    \"lenient\" : false,\n                    \"zero_terms_query\" : \"NONE\",\n                    \"auto_generate_synonyms_phrase_query\" : true,\n                    \"boost\" : 1.0\n                  }\n                }\n              },\n              \"weight\" : 2.0\n            },\n            {\n              \"filter\" : {\n                \"match\" : {\n                  \"layer\" : {\n                    \"query\" : \"facility\",\n                    \"operator\" : \"OR\",\n                    \"prefix_length\" : 0,\n                    \"max_expansions\" : 50,\n                    \"fuzzy_transpositions\" : true,\n                    \"lenient\" : false,\n                    \"zero_terms_query\" : \"NONE\",\n                    \"auto_generate_synonyms_phrase_query\" : true,\n                    \"boost\" : 1.0\n                  }\n                }\n              },\n              \"weight\" : 2.0\n            },\n            {\n              \"filter\" : {\n                \"match\" : {\n                  \"layer\" : {\n                    \"query\" : \"hospital\",\n                    \"operator\" : \"OR\",\n                    \"prefix_length\" : 0,\n                    \"max_expansions\" : 50,\n           
         \"fuzzy_transpositions\" : true,\n                    \"lenient\" : false,\n                    \"zero_terms_query\" : \"NONE\",\n                    \"auto_generate_synonyms_phrase_query\" : true,\n                    \"boost\" : 1.0\n                  }\n                }\n              },\n              \"weight\" : 3.0\n            },\n            {\n              \"filter\" : {\n                \"match\" : {\n                  \"layer\" : {\n                    \"query\" : \"major_employer\",\n                    \"operator\" : \"OR\",\n                    \"prefix_length\" : 0,\n                    \"max_expansions\" : 50,\n                    \"fuzzy_transpositions\" : true,\n                    \"lenient\" : false,\n                    \"zero_terms_query\" : \"NONE\",\n                    \"auto_generate_synonyms_phrase_query\" : true,\n                    \"boost\" : 1.0\n                  }\n                }\n              },\n              \"weight\" : 2.0\n            },\n            {\n              \"filter\" : {\n                \"match\" : {\n                  \"layer\" : {\n                    \"query\" : \"pr\",\n                    \"operator\" : \"OR\",\n                    \"prefix_length\" : 0,\n                    \"max_expansions\" : 50,\n                    \"fuzzy_transpositions\" : true,\n                    \"lenient\" : false,\n                    \"zero_terms_query\" : \"NONE\",\n                    \"auto_generate_synonyms_phrase_query\" : true,\n                    \"boost\" : 1.0\n                  }\n                }\n              },\n              \"weight\" : 2.0\n            },\n            {\n              \"filter\" : {\n                \"match\" : {\n                  \"layer\" : {\n                    \"query\" : \"station\",\n                    \"operator\" : \"OR\",\n                    \"prefix_length\" : 0,\n                    \"max_expansions\" : 50,\n                    \"fuzzy_transpositions\" : true,\n  
                  \"lenient\" : false,\n                    \"zero_terms_query\" : \"NONE\",\n                    \"auto_generate_synonyms_phrase_query\" : true,\n                    \"boost\" : 1.0\n                  }\n                }\n              },\n              \"weight\" : 3.0\n            },\n            {\n              \"filter\" : {\n                \"match\" : {\n                  \"layer\" : {\n                    \"query\" : \"stops\",\n                    \"operator\" : \"OR\",\n                    \"prefix_length\" : 0,\n                    \"max_expansions\" : 50,\n                    \"fuzzy_transpositions\" : true,\n                    \"lenient\" : false,\n                    \"zero_terms_query\" : \"NONE\",\n                    \"auto_generate_synonyms_phrase_query\" : true,\n                    \"boost\" : 1.0\n                  }\n                }\n              },\n              \"weight\" : 3.0\n            },\n            {\n              \"filter\" : {\n                \"match\" : {\n                  \"layer\" : {\n                    \"query\" : \"tc\",\n                    \"operator\" : \"OR\",\n                    \"prefix_length\" : 0,\n                    \"max_expansions\" : 50,\n                    \"fuzzy_transpositions\" : true,\n                    \"lenient\" : false,\n                    \"zero_terms_query\" : \"NONE\",\n                    \"auto_generate_synonyms_phrase_query\" : true,\n                    \"boost\" : 1.0\n                  }\n                }\n              },\n              \"weight\" : 2.0\n            }\n          ],\n          \"score_mode\" : \"sum\",\n          \"boost_mode\" : \"multiply\",\n          \"max_boost\" : 50.0,\n          \"min_score\" : 1.0,\n          \"boost\" : 5.0\n        }\n      }\n    ],\n    \"adjust_pure_negative\" : true,\n    \"boost\" : 1.0\n  }\n}, with { index_uuid=\"zt0Eue3mSfmfcaNzHHCUnw\" & index=\"pelias\" }"
    ],
    "engine": {
      "name": "Pelias",
      "author": "Mapzen",
      "version": "1.0"
    },
    "timestamp": 1600132701297
  },
  "type": "FeatureCollection",
  "features": [

  ]
}
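
Since the query Elasticsearch failed to build is embedded as text inside the `errors` entry, it can be pulled out and parsed instead of being reformatted by hand. The sketch below assumes the message always has the shape seen in this response ("failed to create query: " followed by the JSON and then ", with { ... }"); the function name `extract_failed_query` and the shortened sample message are made up for illustration.

```python
import json

PREFIX = "failed to create query: "
SUFFIX = ", with {"


def extract_failed_query(error_message):
    """Return the query embedded in a query_shard_exception message as a
    dict, or None if the message does not have the expected shape."""
    start = error_message.find(PREFIX)
    end = error_message.rfind(SUFFIX)
    if start == -1 or end == -1 or end <= start:
        return None
    return json.loads(error_message[start + len(PREFIX):end])


# Shortened, hypothetical version of the error string in the response above.
sample_error = (
    '[query_shard_exception] failed to create query: {\n'
    '  "bool" : {\n'
    '    "must" : [\n'
    '      {\n'
    '        "match" : {\n'
    '          "name.default" : {\n'
    '            "query" : "tv hwy & 170",\n'
    '            "analyzer" : "peliasQuery"\n'
    '          }\n'
    '        }\n'
    '      }\n'
    '    ]\n'
    '  }\n'
    '}, with { index_uuid="zt0Eue3mSfmfcaNzHHCUnw" & index="pelias" }'
)

query = extract_failed_query(sample_error)
print(json.dumps(query, indent=2))
```

In practice the input would be each string in `response["geocoding"]["errors"]` after the HTTP body has been decoded with `json.loads`.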

fpurcell commented 3 years ago

And with new lines added for readability:

    "errors": [
      "[query_shard_exception] failed to create query: {\n
  \"bool\" : {\n
    \"must\" : [\n
      {\n
        \"match\" : {\n
          \"name.default\" : {\n
            \"query\" : \"tv hwy & 170\",\n
            \"operator\" : \"OR\",\n
            \"analyzer\" : \"peliasQuery\",\n
            \"prefix_length\" : 0,\n
            \"max_expansions\" : 50,\n
            \"minimum_should_match\" : \"1<-1 3<-25%\",\n
            \"fuzzy_transpositions\" : true,\n
            \"lenient\" : false,\n
            \"zero_terms_query\" : \"NONE\",\n
            \"cutoff_frequency\" : 0.01,\n
            \"auto_generate_synonyms_phrase_query\" : true,\n
            \"boost\" : 1.0\n
          }\n
        }\n
      }\n
    ],\n
    \"should\" : [\n
      {\n
        \"match_phrase\" : {\n
          \"phrase.default\" : {\n
            \"query\" : \"tv hwy & 170\",\n
            \"analyzer\" : \"peliasPhrase\",\n
            \"slop\" : 2,\n
            \"zero_terms_query\" : \"NONE\",\n
            \"boost\" : 1.0\n
          }\n
        }\n
      },\n
      {\n
        \"function_score\" : {\n
          \"query\" : {\n
            \"match_all\" : {\n
              \"boost\" : 1.0\n
            }\n
          },\n
          \"functions\" : [\n
            {\n
              \"filter\" : {\n
                \"match_all\" : {\n
                  \"boost\" : 1.0\n
                }\n
              },\n
              \"weight\" : 3.0,\n
              \"exp\" : {\n
                \"center_point\" : {\n
                  \"origin\" : {\n
                    \"lat\" : 45.52,\n
                    \"lon\" : -122.67\n
                  },\n
                  \"offset\" : \"0km\",\n
                  \"scale\" : \"50km\",\n
                  \"decay\" : 0.5\n
                },\n
                \"multi_value_mode\" : \"MIN\"\n
              }\n
            }\n
          ],\n
          \"score_mode\" : \"avg\",\n
          \"boost_mode\" : \"replace\",\n
          \"max_boost\" : 3.4028235E38,\n
          \"boost\" : 1.0\n
        }\n
      },\n
      {\n
        \"function_score\" : {\n
          \"query\" : {\n
            \"match_all\" : {\n
              \"boost\" : 1.0\n
            }\n
          },\n
          \"functions\" : [\n
            {\n
              \"filter\" : {\n
                \"match_all\" : {\n
                  \"boost\" : 1.0\n
                }\n
              },\n
              \"weight\" : 1.0,\n
              \"field_value_factor\" : {\n
                \"field\" : \"popularity\",\n
                \"factor\" : 1.0,\n
                \"missing\" : 1.0,\n
                \"modifier\" : \"log1p\"\n
              }\n
            }\n
          ],\n
          \"score_mode\" : \"first\",\n
          \"boost_mode\" : \"replace\",\n
          \"max_boost\" : 20.0,\n
          \"boost\" : 1.0\n
        }\n
      },\n
      {\n
        \"function_score\" : {\n
          \"query\" : {\n
            \"match_all\" : {\n
              \"boost\" : 1.0\n
            }\n
          },\n
          \"functions\" : [\n
            {\n
              \"filter\" : {\n
                \"match_all\" : {\n
                  \"boost\" : 1.0\n
                }\n
              },\n
              \"weight\" : 2.0,\n
              \"field_value_factor\" : {\n
                \"field\" : \"population\",\n
                \"factor\" : 1.0,\n
                \"missing\" : 1.0,\n
                \"modifier\" : \"log1p\"\n
              }\n
            }\n
          ],\n
          \"score_mode\" : \"first\",\n
          \"boost_mode\" : \"replace\",\n
          \"max_boost\" : 20.0,\n
          \"boost\" : 1.0\n
        }\n
      },\n
      {\n
        \"function_score\" : {\n
          \"query\" : {\n
            \"match_all\" : {\n
              \"boost\" : 1.0\n
            }\n
          },\n
          \"functions\" : [\n
            {\n
              \"filter\" : {\n
                \"match\" : {\n
                  \"layer\" : {\n
                    \"query\" : \"activity_center\",\n
                    \"operator\" : \"OR\",\n
                    \"prefix_length\" : 0,\n
                    \"max_expansions\" : 50,\n
                    \"fuzzy_transpositions\" : true,\n
                    \"lenient\" : false,\n
                    \"zero_terms_query\" : \"NONE\",\n
                    \"auto_generate_synonyms_phrase_query\" : true,\n
                    \"boost\" : 1.0\n
                  }\n
                }\n
              },\n
              \"weight\" : 3.0\n
            },\n
            {\n
              \"filter\" : {\n
                \"match\" : {\n
                  \"layer\" : {\n
                    \"query\" : \"airport\",\n
                    \"operator\" : \"OR\",\n
                    \"prefix_length\" : 0,\n
                    \"max_expansions\" : 50,\n
                    \"fuzzy_transpositions\" : true,\n
                    \"lenient\" : false,\n
                    \"zero_terms_query\" : \"NONE\",\n
                    \"auto_generate_synonyms_phrase_query\" : true,\n
                    \"boost\" : 1.0\n
                  }\n
                }\n
              },\n
              \"weight\" : 2.0\n
            },\n
            {\n
              \"filter\" : {\n
                \"match\" : {\n
                  \"layer\" : {\n
                    \"query\" : \"facility\",\n
                    \"operator\" : \"OR\",\n
                    \"prefix_length\" : 0,\n
                    \"max_expansions\" : 50,\n
                    \"fuzzy_transpositions\" : true,\n
                    \"lenient\" : false,\n
                    \"zero_terms_query\" : \"NONE\",\n
                    \"auto_generate_synonyms_phrase_query\" : true,\n
                    \"boost\" : 1.0\n
                  }\n
                }\n
              },\n
              \"weight\" : 2.0\n
            },\n
            {\n
              \"filter\" : {\n
                \"match\" : {\n
                  \"layer\" : {\n
                    \"query\" : \"hospital\",\n
                    \"operator\" : \"OR\",\n
                    \"prefix_length\" : 0,\n
                    \"max_expansions\" : 50,\n
                    \"fuzzy_transpositions\" : true,\n
                    \"lenient\" : false,\n
                    \"zero_terms_query\" : \"NONE\",\n
                    \"auto_generate_synonyms_phrase_query\" : true,\n
                    \"boost\" : 1.0\n
                  }\n
                }\n
              },\n
              \"weight\" : 3.0\n
            },\n
            {\n
              \"filter\" : {\n
                \"match\" : {\n
                  \"layer\" : {\n
                    \"query\" : \"major_employer\",\n
                    \"operator\" : \"OR\",\n
                    \"prefix_length\" : 0,\n
                    \"max_expansions\" : 50,\n
                    \"fuzzy_transpositions\" : true,\n
                    \"lenient\" : false,\n
                    \"zero_terms_query\" : \"NONE\",\n
                    \"auto_generate_synonyms_phrase_query\" : true,\n
                    \"boost\" : 1.0\n
                  }\n
                }\n
              },\n
              \"weight\" : 2.0\n
            },\n
            {\n
              \"filter\" : {\n
                \"match\" : {\n
                  \"layer\" : {\n
                    \"query\" : \"pr\",\n
                    \"operator\" : \"OR\",\n
                    \"prefix_length\" : 0,\n
                    \"max_expansions\" : 50,\n
                    \"fuzzy_transpositions\" : true,\n
                    \"lenient\" : false,\n
                    \"zero_terms_query\" : \"NONE\",\n
                    \"auto_generate_synonyms_phrase_query\" : true,\n
                    \"boost\" : 1.0\n
                  }\n
                }\n
              },\n
              \"weight\" : 2.0\n
            },\n
            {\n
              \"filter\" : {\n
                \"match\" : {\n
                  \"layer\" : {\n
                    \"query\" : \"station\",\n
                    \"operator\" : \"OR\",\n
                    \"prefix_length\" : 0,\n
                    \"max_expansions\" : 50,\n
                    \"fuzzy_transpositions\" : true,\n
                    \"lenient\" : false,\n
                    \"zero_terms_query\" : \"NONE\",\n
                    \"auto_generate_synonyms_phrase_query\" : true,\n
                    \"boost\" : 1.0\n
                  }\n
                }\n
              },\n
              \"weight\" : 3.0\n
            },\n
            {\n
              \"filter\" : {\n
                \"match\" : {\n
                  \"layer\" : {\n
                    \"query\" : \"stops\",\n
                    \"operator\" : \"OR\",\n
                    \"prefix_length\" : 0,\n
                    \"max_expansions\" : 50,\n
                    \"fuzzy_transpositions\" : true,\n
                    \"lenient\" : false,\n
                    \"zero_terms_query\" : \"NONE\",\n
                    \"auto_generate_synonyms_phrase_query\" : true,\n
                    \"boost\" : 1.0\n
                  }\n
                }\n
              },\n
              \"weight\" : 3.0\n
            },\n
            {\n
              \"filter\" : {\n
                \"match\" : {\n
                  \"layer\" : {\n
                    \"query\" : \"tc\",\n
                    \"operator\" : \"OR\",\n
                    \"prefix_length\" : 0,\n
                    \"max_expansions\" : 50,\n
                    \"fuzzy_transpositions\" : true,\n
                    \"lenient\" : false,\n
                    \"zero_terms_query\" : \"NONE\",\n
                    \"auto_generate_synonyms_phrase_query\" : true,\n
                    \"boost\" : 1.0\n
                  }\n
                }\n
              },\n
              \"weight\" : 2.0\n
            }\n
          ],\n
          \"score_mode\" : \"sum\",\n
          \"boost_mode\" : \"multiply\",\n
          \"max_boost\" : 50.0,\n
          \"min_score\" : 1.0,\n
          \"boost\" : 5.0\n
        }\n
      }\n
    ],\n
    \"adjust_pure_negative\" : true,\n
    \"boost\" : 1.0\n
  }\n
}, with { index_uuid=\"zt0Eue3mSfmfcaNzHHCUnw\" & index=\"pelias\" }"
    ],
    "engine": {
      "name": "Pelias",
      "author": "Mapzen",
      "version": "1.0"
    },
    "timestamp": 1600132701297
  },

orangejulius commented 3 years ago

Hey @fpurcell, I'm seeing some Elasticsearch options in the query above that we don't set in Pelias. They all relate to fuzzy matching/typo correction. Are you on an experimental branch or fooling around with something? :grin:

            \"analyzer\" : \"peliasQuery\",\n
            \"prefix_length\" : 0,\n
            \"max_expansions\" : 50,\n
            \"minimum_should_match\" : \"1<-1 3<-25%\",\n
            \"fuzzy_transpositions\" : true,\n
            \"lenient\" : false,\n
            \"zero_terms_query\" : \"NONE\",\n

fpurcell commented 3 years ago

Sorry for the response delay, @orangejulius. The email alert for your message got filtered...

Anyway, not to my knowledge. In fact, I believe I'm on the latest master branches for all this stuff. I just nuked everything in Docker and kicked off a rebuild. Will let you know here what happens.

FYI, the configs I use are located here: http://maps7.trimet.org/pelias/

fpurcell commented 3 years ago

@orangejulius / @missinglink ... Pelias just reloaded after a nuke of all Docker containers. The query http://rj-dv-mapgeo01:4000/v1/search?text=tv%20hwy%20%26%20170 still generates the exception shown above. Config for my Pelias instance is again here: http://maps7.trimet.org/pelias/ (I also use the up-to-date 'master' branch of the Docker project for things like the 'pelias' scripts / executable).

orangejulius commented 3 years ago

Okay, as discussed with @fpurcell in our checkin meeting today, I was finally able to reproduce the "failed to create query" issue above.

First of all, it's not at all related to any fuzzy matching prototype work (as @missinglink informed me, Elasticsearch often prints many more query options with its error messages, even if those options were unset in the query sent to Elasticsearch).

When reproducing the error locally I was able to see the following log output from the Pelias API logs:

{
  "caused_by": {
    "type": "too_many_clauses",
    "reason": "maxClauseCount is set to 1024"
  }
}

It looks like something has changed with the way Elasticsearch now internally constructs the query, and Elasticsearch is bumping up against a limit to the number of Lucene clauses it is allowed to generate in the default configuration.
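
When scanning logs for this condition, the nested `caused_by` objects can be followed down to the innermost cause. A minimal sketch, assuming the log entry has already been parsed into a Python dict shaped like the output above; the helper names `root_cause` and `is_too_many_clauses` are hypothetical.

```python
def root_cause(error):
    """Follow nested 'caused_by' objects and return the innermost one."""
    current = error
    while isinstance(current.get("caused_by"), dict):
        current = current["caused_by"]
    return current


def is_too_many_clauses(error):
    """True if the innermost cause is the clause-limit error."""
    return root_cause(error).get("type") == "too_many_clauses"


# The log output quoted above, as a Python dict.
log_entry = {
    "caused_by": {
        "type": "too_many_clauses",
        "reason": "maxClauseCount is set to 1024",
    }
}

cause = root_cause(log_entry)
print(cause["type"], "-", cause["reason"])
```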

I found two ways around this issue with a little investigation:

The Elasticsearch docs warn strongly against increasing the max_clause_count, so perhaps there are ways we can simplify the synonyms files in use here so that they no longer generate as many clauses, but still cover the majority of desired cases.

If not, I've opened https://github.com/pelias/docker/pull/225 to explore increasing the max_clause_count.

It can be tested out with the following changes to docker-compose.yml, and then running pelias elastic start to re-create the Elasticsearch container:

diff --git a/docker-compose.yml b/docker-compose.yml
index d84865a..11888c6 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -104,7 +104,7 @@ services:
       - "./pelias.json:/code/pelias.json"
       - "${DATA_DIR}:/data"
   elasticsearch:
-    image: pelias/elasticsearch:7.5.1
+    image: pelias/elasticsearch:7.5.1-query-clauses
     container_name: pelias_elasticsearch
     user: "${DOCKER_USER}"
     restart: always

@fpurcell let me know if this change works for you and we can explore it further, but at least we have a workaround for now.

fpurcell commented 3 years ago

This works for me. I've moved to using image: pelias/elasticsearch:7.5.1-query-clauses. With the 'failed to create query' problem now understood, I deployed the synonym + es-query-clauses work to production. Tests are passing, and all looks good right now. Will let you know if anything else pops up, but I'm happy to have this deployed. Also looking forward to hearing from our call center folks, who should be happy to find abbreviations like MLK working a lot better (will let you know next month).

fpurcell commented 3 years ago

p.s., will leave this issue open for discussion, and close in a couple of days if no further details emerge.

fpurcell commented 3 years ago

Closing ... intersections working well and query-clauses version of ES no longer throwing errors