ErickTamayo / laravel-scout-elastic

Elastic Driver for Laravel Scout
MIT License
915 stars, 242 forks

Result of the model's search method differs from what I get when I GET the URL exposed by Elasticsearch #92

Closed chbro closed 6 years ago

chbro commented 6 years ago

Hi, when I use dd(App\Posts::search('场景1')->get()), the result is

Collection {#265 ▼
  #items: []
}

while what I get from http://localhost:9200/my_index/posts/_search?q=content:场景1 is

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 2.9235544,
        "hits": [

I wonder why the IK analyzer doesn't work. My config/scout.php is the same as the one in the README.

kronthto commented 6 years ago

It builds a different query, using a Request Body instead of URL-params.

I think what's fired to Elastic is:

GET http://localhost:9200/my_index/posts/_search
{"query":{"bool":{"must":[{"query_string":{"query":"*\u573a\u666f1*"}}]}}}
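
The escaped \u573a\u666f in that body is simply how PHP's json_encode writes the two Chinese characters by default. A minimal sketch of how such a body could be produced; the helper name buildWildcardQuery is hypothetical, and the wildcard wrapping is assumed from the request quoted above rather than taken from the driver's source:

```php
<?php

// Hypothetical helper: wraps the search term in wildcards inside a
// query_string clause, mirroring the request body quoted above.
function buildWildcardQuery(string $term): array
{
    return [
        'query' => [
            'bool' => [
                'must' => [
                    ['query_string' => ['query' => "*{$term}*"]],
                ],
            ],
        ],
    ];
}

// json_encode escapes non-ASCII characters as \uXXXX unless
// JSON_UNESCAPED_UNICODE is passed.
echo json_encode(buildWildcardQuery('场景1')), PHP_EOL;
```

This is a wildcard query_string with no field named, whereas ?q=content:场景1 targets the content field directly, which may be why the two requests return different hits.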

chbro commented 6 years ago

@kronthto I agree. Anyway, the search method doesn't work here in my code.

For now I use

shell_exec('curl http://localhost:9200/my_index/posts/_search?q=content:'.request('q'))

as a replacement.

kronthto commented 6 years ago

If you need to do it that way you could at least use:

file_get_contents('http://localhost:9200/my_index/posts/_search?q=content:'.request('q'))

which is probably faster and more secure than shell_exec; you also don't rely on curl being available on the CLI.
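
Either way, the user input is concatenated into the URL as-is. A small sketch of building the same URL with the query percent-encoded; the helper name buildUriSearchUrl is hypothetical, and the host, index and type are the ones used earlier in this thread:

```php
<?php

// Hypothetical helper: builds a URI-search URL with the user input
// percent-encoded, so characters like "&", "#" or spaces cannot break the URL.
function buildUriSearchUrl(string $base, string $field, string $term): string
{
    return $base . '/_search?' . http_build_query(['q' => $field . ':' . $term]);
}

$url = buildUriSearchUrl('http://localhost:9200/my_index/posts', 'content', '场景1');
echo $url, PHP_EOL;

// In a Laravel controller the term would come from request('q'), and the
// response could then be fetched with file_get_contents($url).
```

This only makes the URL well-formed; characters that have a special meaning in Elasticsearch's query string syntax are still interpreted by Elasticsearch.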

Using Scout would directly map the results to Model-entities, so it would be nice to solve your initial problem. You could try modifying what is sent to ES using the callback-function parameter of the Builder (see https://github.com/ErickTamayo/laravel-scout-elastic/pull/56 / https://github.com/laravel/scout/pull/111).

chbro commented 6 years ago

Thanks a lot. I rewrote it with the Builder callback, and it works perfectly now.

chbro commented 6 years ago

So far I have managed to get Chinese text split into tokens, but a new problem has come up:

I have no idea how to strip HTML from my content.

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html This document only gives an example demonstrating that ES can strip HTML. How can I use it in my search?

Copied from Stack Overflow:

I have a document with a property that contains HTML tags. I want to remove the HTML before indexing.
I found this htmlstrip-charfilter, but I can't find an example of using it.
I'm new to Elasticsearch and the analyzer concept. Thanks

kronthto commented 6 years ago

I've never actually done that, but I think you need to define the HTMLStrip-filter as a normalizer type to your index and then add this normalizer to the field using the PUT-mapping API.

It could be something like (not tested):

PUT my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_html_char_filter": {
          "type": "html_strip"
        }
      },
      "normalizer": {
        "my_html_normalizer": {
          "type": "custom",
          "char_filter": ["my_html_char_filter"]
        }
      }
    }
  },
  "mappings": {
    "posts": {
      "properties": {
        "content": {
          "type": "text",
          "normalizer": "my_html_normalizer"
        }
      }
    }
  }
}

(inspired by https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-normalizers.html)
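
In Elasticsearch, the normalizer mapping parameter applies to keyword fields, while text fields take an analyzer. Below is a hedged variant of the settings above as a PHP array, in the shape elasticsearch-php accepts for indices()->create(). The helper name htmlStripIndexParams, the analyzer name my_analyzer and the choice of the standard tokenizer are assumptions (an IK tokenizer could be substituted for Chinese text), and like the snippet above it has not been run against a cluster:

```php
<?php

// Hypothetical helper returning index-creation parameters with a custom
// analyzer that strips HTML before tokenizing.
function htmlStripIndexParams(string $index, string $type, string $field): array
{
    return [
        'index' => $index,
        'body' => [
            'settings' => [
                'analysis' => [
                    'char_filter' => [
                        'my_html_char_filter' => ['type' => 'html_strip'],
                    ],
                    'analyzer' => [
                        'my_analyzer' => [
                            'type' => 'custom',
                            'char_filter' => ['my_html_char_filter'],
                            'tokenizer' => 'standard', // assumption, not from the thread
                        ],
                    ],
                ],
            ],
            // Mapping types such as "posts" match the Elasticsearch versions
            // used in this thread.
            'mappings' => [
                $type => [
                    'properties' => [
                        $field => ['type' => 'text', 'analyzer' => 'my_analyzer'],
                    ],
                ],
            ],
        ],
    ];
}

$params = htmlStripIndexParams('my_index', 'posts', 'content');
echo json_encode($params['body'], JSON_PRETTY_PRINT), PHP_EOL;

// With elasticsearch-php this would be passed as $client->indices()->create($params).
```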

chbro commented 6 years ago

After PUTting the config to my_index and POSTing to a URL like my_index/_analyze with the body { analyzer: 'my_analyzer', text: '<p>hello</p>' }, I get the stripped content.

By the way,

it seems that highlighting doesn't work when using the Builder. My code is:

return App\Docs::search($req->search, function ($engine, $query) {
    $query['body'] = [
        'query' => [
            'multi_match' => [
                'query' => request('q'),
                'fields' => ['name', 'rich_text'],
            ],
        ],
        'highlight' => [
            'fields' => [
                'name' => ['force_source' => true],
                'rich_text' => ['force_source' => true],
            ],
        ],
    ];

    return $engine->search($query);
})->paginate();

But the returned value doesn't contain the highlight field. Strangely, when I dd($engine->search($query)), highlight is there in hits.hits:

array:4 [▼
  "took" => 48
  "timed_out" => false
  "_shards" => array:3 [▼
    "total" => 5
    "successful" => 5
    "failed" => 0
  ]
  "hits" => array:3 [▼
    "total" => 25
    "max_score" => 2.3903573
    "hits" => array:10 [▼
      0 => array:6 [▼
        "_index" => "laravel54"
        "_type" => "docs"
        "_id" => "88"
        "_score" => 2.3903573
        "_source" => array:2 [▶]
        "highlight" => array:1 [▼
          "name" => array:1 [▶]
        ]
      ]
      1 => array:6 [▶]
      2 => array:6 [▶]
      3 => array:6 [▶]
      4 => array:6 [▶]
      5 => array:6 [▶]
      6 => array:6 [▶]
      7 => array:6 [▶]
      8 => array:6 [▶]
      9 => array:6 [▶]
    ]
  ]
]

It would be very kind of you to explain this.

kronthto commented 6 years ago

When using get/paginate it ignores any field but _id and uses this to query the results from the database: https://github.com/ErickTamayo/laravel-scout-elastic/blob/97deb01452f947b3c515545c2ee65804f1a59853/src/ElasticsearchEngine.php#L217-L222

This behaviour is intended for Scout-drivers. So, yes, highlight is ignored. The only thing you can do is use raw/paginateRaw, but then you lose the mapping to Eloquent.
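
If the raw response is used, the highlights can be picked out by hand. A minimal sketch, assuming the raw array has the hits.hits structure shown in the dump above; the helper name highlightsById and the sample data are made up for illustration:

```php
<?php

// Hypothetical helper: maps each hit's _id to its highlight fragments.
// Hits without a highlight get an empty array. Note that PHP turns
// numeric string ids such as "88" into integer array keys.
function highlightsById(array $raw): array
{
    $result = [];
    foreach ($raw['hits']['hits'] ?? [] as $hit) {
        $result[$hit['_id']] = $hit['highlight'] ?? [];
    }

    return $result;
}

// Made-up sample response, for illustration only.
$raw = [
    'hits' => [
        'hits' => [
            ['_id' => '88', '_source' => ['name' => 'foo'], 'highlight' => ['name' => ['<em>foo</em>']]],
            ['_id' => '91', '_source' => ['name' => 'bar']],
        ],
    ],
];

print_r(highlightsById($raw));
```

The resulting map could then be merged onto models loaded separately by id, which keeps the Eloquent mapping while still exposing the highlight fragments.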

chbro commented 6 years ago

Thank you very much,

I've switched to elasticsearch-php, with which I can get the raw data returned from Elasticsearch and organize it on my own.

Well, one last question: I want to POST data to my_index/_analyze to get the stripped content, as follows:

$params = array(
    'http' => array(
        'method' => 'POST',
        'header' => 'Content-Type: application/json',
        'content' => http_build_query([
            'analyzer' => 'my_analyzer',
            'text' => $value['_source']['rich_text']
        ])
    )
);
$url = config('scout.elasticsearch.hosts')[0] . '/' . config('scout.elasticsearch.index') . '/_analyze';
$context = stream_context_create($params);
dd($result = file_get_contents($url, false, $context));

But the result is different from what I get in Postman. Should I use http_build_query here? I'm new to PHP.

kronthto commented 6 years ago

elasticsearch-php is what this library here uses under the hood anyways: https://github.com/ErickTamayo/laravel-scout-elastic/blob/97deb01452f947b3c515545c2ee65804f1a59853/composer.json#L8

I think http_build_query is wrong here, because it builds a URL-encoded query string (key=value pairs joined with &), which you don't want in the request body (only in the URL). If anything, you might have wanted to use json_encode? I never really do requests that way.
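
To make the difference concrete, here is a short sketch comparing the two encodings for the same payload. The helper name jsonPostOptions is hypothetical, and the actual request is left commented out because it needs a running Elasticsearch:

```php
<?php

$payload = ['analyzer' => 'my_analyzer', 'text' => '<p>hello</p>'];

// Form-encoded: what the original snippet sent despite the JSON header.
$formBody = http_build_query($payload);

// JSON-encoded: what the Content-Type: application/json header promises.
$jsonBody = json_encode($payload);

// Hypothetical helper building stream context options for a JSON POST.
function jsonPostOptions(array $payload): array
{
    return [
        'http' => [
            'method' => 'POST',
            'header' => 'Content-Type: application/json',
            'content' => json_encode($payload),
        ],
    ];
}

echo $formBody, PHP_EOL;
echo $jsonBody, PHP_EOL;

// $context = stream_context_create(jsonPostOptions($payload));
// $result = file_get_contents($url, false, $context); // $url built as in the question
```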

In general, if you want to do HTTP Requests in PHP I can only recommend using Guzzle, it makes the code so much cleaner / easier to read because you don't have to deal with stream_context_create and stuff. Then it could look like:

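// Assumes guzzlehttp/guzzle is installed via Composer.
$client = new \GuzzleHttp\Client();
$url = config('scout.elasticsearch.hosts')[0] . '/' . config('scout.elasticsearch.index') . '/_analyze';

$response = $client->request('POST', $url, [
    // The 'json' option JSON-encodes the array and sets the Content-Type header.
    'json' => [
        'analyzer' => 'my_analyzer',
        'text' => $value['_source']['rich_text'],
    ],
]);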

chbro commented 6 years ago

Excellent!

Time to close this issue.

I'm very grateful for your help.