Firesphere / silverstripe-solr-search

Advanced searching with SilverStripe, Solarium and Solr
https://firesphere.github.io/solr-docs
GNU General Public License v3.0
11 stars 13 forks source link

(Re)-inroduce userland control of includeSubclasses #162

Open phptek opened 4 years ago

phptek commented 4 years ago

Is your feature request related to a problem? Please describe.

As the title says, I would like to have control over whether or not the module uses, or not, all subclasses of a base class.

The old FTS module allowed classes to be passed directly into a SearchQuery instance as an array, with a child value of 'includeSubclasses' => true|false,.

Describe the solution you'd like

Something like this (using valid YML!)

...
Catalyst\MyProject\Search\MyIndex:
  MyIndex:
    Classes:
      - SilverStripe\CMS\Model\SiteTree
      - SilverStripe\Assets\File:
          # Index `File` classes, but not `Image` classes
          include_subclasses: false
      - DNADesign\Elemental\Models\BaseElement
    FulltextFields:
      - Title
...

The underlying issues (AFAICT) is that CoreServiceTrait::getClassesInHierarchy() bakes-in the 2nd param to FieldResolver::getHierarchy($class, true) making it always true.

[UPDATE] I set this to false and images were still indexed.

Describe alternatives you've considered

In PageController.php::results():

...
$index = Injector::inst()->create(MyIndex::class);
$query = SearchQuery::create()
    ->setClasses([
        [
            'class' => File::class,
            'config' => ['include_subclasses' => false]
        ]
    ])
    ->setStart($start)
    ->setFields($params[0])
    ->setTerms($params)
    ->setRows(self::$results_per_page)
    ->setExclude($exclusions);
...

...but with the above option, I never liked it TBH. Could never tell whether it was overriding what you had built in your SolrIndex subclass defs...or not

phptek commented 4 years ago

In the meantime I have simply made it an exclusion which seems to work OK, but feels a little bit like a workaround:

...
Catalyst\MyProject\Search\MyIndex:
  MyIndex:
    Classes:
      - SilverStripe\CMS\Model\SiteTree
      - SilverStripe\Assets\File:
      - DNADesign\Elemental\Models\BaseElement
    FulltextFields:
      - Title
    FilterFields:
      - ClassName
...

Then in PageController.php

$exclusions = [
    'SiteTree_ShowInSearch' => 0,
    'File_ClassName' => Image::class,
];

$query = SearchQuery::create()
...
        ->setExclude($exclusions);
...
Firesphere commented 4 years ago

Do I understand the feature request correctly if I say:

Or

This could be more easily solved at indexing time than at query time, e.g. the latter. Thoughts @marczhermo ?

phptek commented 4 years ago

As you say @Firesphere - there's no point indexing and storing data that I know my project won't use. Sure, I can exclude subclasses at runtime, but then why is my indedx full up with stuff I'm never going to use.

So sure: Some way of declaring in YML config that I want to use baseClass X, but not its subclasses.

Firesphere commented 4 years ago

So... Option 2?

Firesphere commented 4 years ago

@marczhermo I'm thinking to index everything, including subclasses, but at query time, apply the filter at query time.

But, explicitly ignored classes, in this case, subclasses that are excluded from the index, if possible, don't index them, but index them if it's an easier approach.

It would make the implementation a lot easier and flexible. Thoughts?

TL;DR: Index if it is easier, exclude at query time where possible.

Firesphere commented 4 years ago

Instead of using the classes, I feel it would be useful to have a method and variable, that would exclude the extended classes.

Although I understand @phptek his argument, I feel it's better to require explicit exclusion. A global application might have unwanted side effects. My suggestion is a YML option saying ExcludeSubclassesFor together with a method that goes setExcludeSubclasses($classThatShouldNotIncludeSubclasses).

That gives a finer level of control, I think.

This is a next-release issue though, to me. The "workaround" posted above is an easier implementation right now. And we can document that.

marczhermo commented 4 years ago

Hi @Firesphere, If I understand it correctly, @phptek is asking to re-use existing method, FieldResolver::getHierarchy($class, true) to work with yml config

- SilverStripe\Assets\File:
          # Index `File` classes, but not `Image` classes
          include_subclasses: false

If that is the case I have no strong opinion against it because it might be a little bit of rewiring in order to connect the configuration to existing method. A PR is welcome here :)

Also agree about a somewhat a global variable might have unwanted side effects. However, if this is properly documented and with an example edge case below in excluding child classes as a whole: For example SiteTree has the column, ShowInSearch in which end-users might tick this option knowing that this particular child (eg. ContactPage extends Page) class will be searchable but will not work because the page class is excluded.

We also have ShowInSearch column both on SiteTree and File which should help in checking whether it is going to the index or not and is most granular approach in my opinion both indexing and searching.

@Firesphere Does having a data extension for Image with ShowInSearch or getShowInSearch will not achieve the same thing?

Firesphere commented 4 years ago

@marczhermo No, because the check for ShowInSearch is on the database field, not a method...

I just looked at the old module, and it has a todo (probably from about 20 years ago), to remove the feature

Firesphere commented 4 years ago

I gave this a thought and I honestly feel, it should be dealt with within the search query excludes and not as part of the indexing. For multiple reasons:

  1. An exclude based on a class is more verbose
  2. The original module maintainers already don't like it
  3. It adds unnecessary overhead, for something that is easily avoided at query time
  4. There are multiple cases, where an index should have subclasses for one query, but not for the other. Enforcing to create two Indexes, just for that, makes things worse.
Firesphere commented 4 years ago

I'm re-opening this issue, because I do feel it has added value as a submodule.

My thoughts for the submodule:

This would mean an extension to both BaseQuery and BaseIndex to actively exclude things, but it would make things easier for the developers.

Firesphere commented 4 years ago

@marczhermo Thoughs? ^^

Pushed to the NtH pipeline, low priority, as the workaround does it's job for now