elastic / elasticsearch-php

Official PHP client for Elasticsearch.
https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/index.html
MIT License
5.26k stars 966 forks source link

Get/update fails for item IDs that contain spaces #1278

Closed cdlate closed 1 year ago

cdlate commented 1 year ago

Summary

I want to update or get a record having id containing a space using the PHP client library. Expected: The record having the correct id is returned/updated Actual: The record is not found

Code snippet of problem

Test dataset:

Kibana dev tools for inserts:

POST test_index/doc/_bulk {"index": {"_index": "test_index", "_id": "test test"}} {"field1": "value1"}

This will return data properly: GET /test_index/_doc/test%20test

$params = [
  'index' => 'test_index',
  //'id'    => 'lp-3MIYB7NAUVXHg5pvt', THIS is working perfectly(no spaces)
  'id'    => 'test test', // this is not working because urlencode does not respect the url encode standard.
  'type' => '_doc'
];

$response = $elasticClient->get($params);

I think the problem is caused by the usage of urlencode instead of rawurlencode in the file AbstractEndPoint, not sure about how many client's methods are affected, not sure if the latest versions are affected, but I highly suspect they are.

See: https://www.php.net/manual/en/function.urldecode.php See: https://www.php.net/manual/en/function.rawurlencode.php

System details

ezimuel commented 1 year ago

@cdlate you right, the + character is not encoded anymore in a space since ES 7.4. You can force to use + as space in ES 7.x with the settings es.rest.url_plus_as_space to true. We need to fix this as you proposed using rawurlencode() instead of urlencode(). At the same time, we need to avoid BC break since we cannot change the URL encoding for existing document (containing space in ID) indexed with elasticsearch-php. I'm thinking to use an env variable like ES_URL_PLUS_AS_SPACE that will be false as default. We will use rawurlencode() if the previous env will be false, otherwise we will continue to use urlencode(). Any feedback? Thanks!

cdlate commented 1 year ago

Hey @ezimuel It is a good plan! I knew I was missing something about the evolution of the library, thanks for giving me context about it.

Can I contribute?

ezimuel commented 1 year ago

I'll provide a PR for fixing it in elasticsearch-php 7.17.2. You can help me with a review, thanks!

ezimuel commented 1 year ago

I sent #1303 for fixing the issue. @cdlate can you check if this work for you? Thanks.

ezimuel commented 1 year ago

@cdlate I just merged #1303 in 7.17 branch. I'll release 7.17.2 soon.

ezimuel commented 1 year ago

Released 7.17.2