getkirby / kirby

Kirby's core application folder
https://getkirby.com
Other
1.32k stars 168 forks source link

Respect structured field types (YAML/JSON) in search #5490

Open GiantCrocodile opened 1 year ago

GiantCrocodile commented 1 year ago

Description

I have a search function to search articles in my blog. This works in general. I just noticed that a specific 2 word search term results in 2 results where I would expect just 1 result.

Search function:

return function ($site) {

  $query   = get('q');
  $results = page('blog')->index()->listed()->search($query, 'title|text|tags');
  $results = $results->paginate(20);

  return [
    'query'      => $query,
    'results'    => $results,
    'pagination' => $results->pagination()
  ];

};

I checked the article.de.txt file that gets found and I see that the metadata field id of the blocks field text contains the search string by chance. From a technical perspective it makes sense that the search function finds this. From an user perspective it doesn’t make sense because the article doesn’t contain the 2 chars long search phrase so this can’t be what the user actually searches for.

Expected behavior
The metadata is being ignored and thus the search results are what the user will expect.

Your setup

v3.9 and v4.0 alpha

Additional context
see https://forum.getkirby.com/t/page-search-returns-unexpected-result/29317/2?u=warg

distantnative commented 1 year ago

Summary for the team

I don't think there is a good way currently to resolve layout and blocks field content first before search, so that their json markup doesn't accidentally is matched. For that, we would need to load the blueprints and check which field is what type. Currently, this would be quite the performance drag.

We could revisit this once our blueprints are refactored and thus not as performance heavy.

Of if anyone has another idea, please add it.

afbora commented 1 year ago

I've an idea 💡

Actually, the right thing is to read the blueprint as @distantnative said, but this will cause serious performance problems. Instead, I think we can easily solve this issue by specifying the data type in the search as we use ->toStructure(), ->toBlocks() for field output. Here rough example:

 $results = $site->search($query, 'title|adresses:yaml|text:json|tags');

What do you think? We can decide the syntax and data type names later.

lukasbestle commented 1 year ago

I like the idea. It could even be useful in the long run (parsing the blueprints will still be slower than explicit configuration, even after the refactoring). The syntax is great IMO.

GiantCrocodile commented 1 year ago

Sounds good and sounds more stable than auto-detecting something. So I think the "needs: delay" tag can be removed.

distantnative commented 1 year ago
$results = $site->search($query, 'title|adresses:yaml|text:json|tags');

Following the example, I got the $field for text and now my search components sees the type specified as json, what does it do now to retrieve a string it can use for searching?

afbora commented 1 year ago

I see what you mean. Yes, it can be difficult to extract the relevant field data from JSON. For that may need ->toString() method for each data handler. When the search is called, I think the following code would work: Data::decode($field->value())->toString()

Or it may be better to specify the field type directly, not sure: 'title|adresses:structure|text:blocks|tags'

lukasbestle commented 1 year ago

We could convert the values into a cleaned string, maybe like so:

  1. First parse the field depending on the data type (Data::decode($value, $type)).
  2. Walk over the array structure, remove typical metadata fields and concatenate all remaining values recursively, ignoring the keys.
  3. Search in that cleaned string

Rough code for 2:

function convert(array $array): string {
    unset($array['id']);
    unset($array['type']);
    // ...

    foreach ($array as $key => $value) {
        if (is_array($value) === true) {
            // recurse
            $array[$key] = convert($value);
        }
    }

    return implode(' ', array_values($array));
}

Or it may be better to specify the field type directly, not sure: 'title|adresses:structure|text:blocks|tags'

I like that. It would allow us to fine-tuned the ignored keys per field type.

distantnative commented 1 year ago

@lukasbestle I think this is the right direction. However, I would suggest to rather have dedicated search methods in the respective classes (Block, StructureObject) which are called from the component when specified. That way those methods can much better decide what fields can be thrown out for their own datatype.

lukasbestle commented 1 year ago

Makes sense 👍

GiantCrocodile commented 9 months ago

Is this something that could be part of v4.2 or v4.3?