WordPress / pattern-directory

The WordPress Block Pattern Directory
https://wordpress.org/patterns/
GNU General Public License v2.0
120 stars 34 forks source link

Search: Invalid ElasticSearch request on pentester submissions #688

Open dd32 opened 4 months ago

dd32 commented 4 months ago

Describe the bug A pentester has hit the wporg-patterns api in such a way that the pattern directory generates an invalid ElasticSearch request. Causing a 400 Bad Request warning.

I can't tell if this is supposed to work, as the code-branch doesn't currently work. I suspect this is just unexpected input to the API endpoint.

For example:

GET patterns/wp-json/wp/v2/wporg-pattern?per_page=6&curation=core&search=block

This code: https://github.com/WordPress/pattern-directory/blob/f7ffec071aeddc51bb92a22b8c6a721b9ae34e72/public_html/wp-content/plugins/pattern-directory/includes/search.php#L152-L154

resulted in this ES query, and response:

Query piece: {"terms":{"taxonomy.wporg-pattern-keyword.term_id":"core"}}

Error: [terms] query does not support [taxonomy.wporg-pattern-keyword.term_id]

terms valid input would be an array, and well core is never going to match a term_id.. which is what leads me to think that the endpoint is not expecting a query-by-slugs.

To Reproduce Steps to reproduce the behavior:

  1. Query via patterns/wp-json/wp/v2/wporg-pattern?per_page=6&curation=core&search=block
  2. Get a 400 error.
  3. To get the underlying ES error, you need to be an automattician with a WordPress.com sandbox so you can get the underlying queries.

Expected behavior Either the API should throw a error immediately if it gets invalid input OR the fields should be validated prior to querying ES.

E_USER_WARNING: jetpack_search_abort - no_search_results_array - {"errors":{"invalid_search_api_response":["Invalid response from API - 400"]},"error_data":[]} in wp-content/plugins/pattern-directory/includes/search.php:186
pkevan commented 4 months ago

terms valid input would be an array, and well core is never going to match a term_id.. which is what leads me to think that the endpoint is not expecting a query-by-slugs.

I wonder if the switch to the new theme is the cause, since we are using the slug here: https://github.com/WordPress/pattern-directory/blob/trunk/public_html/wp-content/themes/wporg-pattern-directory-2024/functions.php#L167-L188 - and a quick look in the previous theme doesn't appear to have the same functionality.

pkevan commented 4 months ago

Actually, I wonder if the taxonomy doesn't exist on the ES side, if it was added? I'll investigate that too.

pkevan commented 4 months ago

Actually, I wonder if the taxonomy doesn't exist on the ES side, if it was added? I'll investigate that too.

It does exist, so it's not this πŸ˜„

ryelle commented 4 months ago

The curation param should be a string, that is expected β€” it was added almost a year ago https://github.com/WordPress/pattern-directory/pull/583

But the core request could also create a query with a term_id (using pattern-keywords) when fetching these remote patterns. For example: /patterns/wp-json/wp/v2/wporg-pattern?page=1&per_page=100&order=desc&orderby=date&locale=en_US&wp-version=6.6-alpha-58124&pattern-keywords=11

That would generate a tax_query with term_ids. So both are possible. Maybe the code in search.php should check for the field in the tax query in addition to checking the tax name?

dd32 commented 4 months ago

Ah! it's pattern-keywords that I was thinking of having term_id's. That threw me for a spin when I initially looked at this, and I was second guessing if it was expected or junk.

Changing to term as per #689 seems straight forward, but I would probably suggest it should be terms and just casted to an array, to account for multiple-term-queries, in conjunction with checking the field for term_id or slug.. or just assuming that all numeric-inputs would be a term_id, and otherwise by slug.

pkevan commented 4 months ago

That would generate a tax_query with term_ids. So both are possible. Maybe the code in search.php should check for the field in the tax query in addition to checking the tax name?

Thanks for the extra details @ryelle - i'll work up the PR to account for that, as well as the potential @dd32 mentioned with multiple terms.

pkevan commented 4 months ago

Changing to term as per #689 seems straight forward, but I would probably suggest it should be terms and just casted to an array, to account for multiple-term-queries, in conjunction with checking the field for term_id or slug.. or just assuming that all numeric-inputs would be a term_id, and otherwise by slug.

For curation, it seems like we only allow one variant:

curation is not one of all, core, and community. is the message when you add multiple.

But the core request could also create a query with a term_id (using pattern-keywords) when fetching these remote patterns. For example: /patterns/wp-json/wp/v2/wporg-pattern?page=1&per_page=100&order=desc&orderby=date&locale=en_US&wp-version=6.6-alpha-58124&pattern-keywords=11

That would generate a tax_query with term_ids. So both are possible. Maybe the code in search.php should check for the field in the tax query in addition to checking the tax name?

The change in #689 doesn't appear to change the output when querying pattern-keywords from what I can see - can you try testing it to double check i'm not testing it incorrectly @ryelle πŸ˜„

ryelle commented 4 months ago

With #689, the curation=core requests no longer error, but I don't think it's returning correctly. These two queries should return the same set of results, since curation=core is the same as pattern-keywords=11, but using curation returns an empty set. pattern-keywords=11 returns 3 results (on prod).

Also, on the PR, I get the jetpack_search_abort - no_search_results_array error when running a search with the pattern-keywords value (pattern-keywords is just the query name for the wporg-pattern-keyword tax). Try it with the second URL above.

One thing that was throwing me off is that https://wordpress.org/patterns/?s=dark currently works, but it sounds like it shouldn't, since that uses curation=core. Turns out, the frontend of the site isn't using ES anymore, since it only applies to REST requests. Sure enough, if I disable the filter, it breaks (even with #689, but I don't know exactly how to debug the ES query to continue investigating).

pkevan commented 4 months ago

https://wordpress.org/patterns/wp-json/wp/v2/wporg-pattern?curation=core&search=dark https://wordpress.org/patterns/wp-json/wp/v2/wporg-pattern?pattern-keywords=11&search=dark

These return different results on prod, so suspect there is something else afoot - it looks like the search keyword isn't being in the same way - in the curation one it's just via pattern_content whereas pattern-keyword is using several fields to query on.

Turns out, the frontend of the site isn't using ES anymore, since it only applies to REST requests. Sure enough, if I disable the filter, it breaks (even with https://github.com/WordPress/pattern-directory/pull/689, but I don't know exactly how to debug the ES query to continue investigating).

Hmmm, I guess it used to work?

ryelle commented 4 months ago

Hmmm, I guess it used to work?

Yeah, the frontend used to be JS-powered, so it used the API too. Now it's not, so it doesn't used ES at all.

pkevan commented 4 months ago

What's the preference here? Use ES or not, i'm not sure I understand how the site needs to function enough to know.

ryelle commented 4 months ago

Yes, it should still use ES on the frontend too (not just the API).