EHRI / ehri-frontend

The EHRI project's portal interface.
https://portal.ehri-project.eu
European Union Public License 1.2

Guides: fulltext search #524

Closed michalfrankl closed 9 years ago

michalfrankl commented 9 years ago

Just a reminder that fulltext search should become available for the guides.

mikesname commented 9 years ago

Just to clarify, do you mean:

  1. searching just the units contained within the virtual collection associated with a particular guide
  2. searching the content of guide pages, e.g. the descriptions, free text, HTML

The reason 1) is not trivial is that the search engine doesn't (and can't) keep track of which item is in which virtual collection, because it's a many-to-many relationship. If you made one small change by adding or removing a top-level item to or from a VC, you would have to re-index potentially thousands of items (and this is not a theoretical concern).
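To make the re-indexing cost concrete, here is a minimal Python sketch. It assumes a hypothetical denormalised index in which every document would store the IDs of the virtual collections it belongs to. The `children` map, the item IDs, and the `docs_to_reindex` helper are invented for illustration and do not reflect the portal's actual schema or code.

```python
from collections import deque

# Hypothetical hierarchy: parent item ID -> child item IDs.
children = {
    "fonds-a": ["series-a1", "series-a2"],
    "series-a1": ["file-a1-1", "file-a1-2"],
    "series-a2": ["file-a2-1"],
}

def docs_to_reindex(top_level_id, children):
    """Return every document whose stored VC membership would change
    if `top_level_id` were added to or removed from a virtual collection."""
    affected = []
    seen = {top_level_id}
    queue = deque([top_level_id])
    while queue:
        item = queue.popleft()
        affected.append(item)
        for child in children.get(item, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return affected

affected = docs_to_reindex("fonds-a", children)
print(len(affected))  # 6: the fonds itself plus its five descendants
```

With a real archival hierarchy the same walk could easily touch thousands of documents for a single membership change.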

The only way we could really do it is a brute-force approach that first finds every item in or referenced by a VC, plus all their children, and then feeds those (potentially thousands of) IDs to the search engine as a filter.
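The brute-force approach could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `children` map, the `collect_ids` and `build_id_filter` helpers, the `id` field name, and the Lucene-style filter syntax are hypothetical and are not taken from the portal's code.

```python
def collect_ids(vc_items, children):
    """Gather every item in (or referenced by) a VC plus all descendants."""
    ids = set()
    stack = list(vc_items)
    while stack:
        item = stack.pop()
        if item in ids:
            continue  # already visited, e.g. referenced from two places
        ids.add(item)
        stack.extend(children.get(item, []))
    return ids

def build_id_filter(ids):
    """Render the IDs as one filter clause (assumed Lucene-style syntax)."""
    quoted = " OR ".join(f'"{i}"' for i in sorted(ids))
    return f"id:({quoted})"

# Hypothetical data: one fonds with two children, plus one referenced file.
children = {"fonds-a": ["file-a1", "file-a2"]}
ids = collect_ids(["fonds-a", "file-b7"], children)
print(build_id_filter(ids))
# id:("file-a1" OR "file-a2" OR "file-b7" OR "fonds-a")
```

The filter grows linearly with the size of the VC, so a large collection may produce a very long query, which is part of why this counts as brute force.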

I know you probably don't care about these details, but just so you know: if this were trivial, it would already have been done. And I am strongly against hacks that will make this stuff even less maintainable.

On the other hand, 2) I can probably manage quite easily.

michalfrankl commented 9 years ago

Dear Mike,

Thanks. I did mean 1, which Thibault promised as a feature coming later. I understand the problem, but I still think it is a feature users will expect. I'm not familiar enough with how the data is indexed to respond to the details. It just seems to me that if we work more with virtual collections in the future, and users might be creating their own, then such indexing would come in handy.

Having 2 as well would be nice, especially since we have decided to create some pages by hand that I had expected to be generated automatically. But 1 has priority for me.

Michal

On 18 February 2015 at 18:00, Mike Bryant wrote: (quoted copy of the comment above)

mikesname commented 9 years ago

If you have a moment could you test the search on these VCs and see if it works how you expect:

http://portal.aehri.dans.knaw.nl/virtual/michal-frankl-ehri-terezin-research-guide
http://portal.aehri.dans.knaw.nl/virtual/michal-frankl-ehri-terezin-research-guide-vc-tm

There are many problems still, some of which I don't think can realistically be solved in EHRI 1.

In theory I should be able to re-use this for the guide pages, though it may be difficult to integrate with the existing relational browsing, and I've already spent way more time on this than we can really afford.

mikesname commented 9 years ago

I've added a search box to the browse page. It doesn't work very well because it's not really connected to the pseudo facets, since they're relational and the search engine is not. However, that's a fundamental problem with the conception and there's not much I can do about it.

michalfrankl commented 9 years ago

Thanks. I understand the core problem and I think we have to live with this solution for the time being. I'm not sure if there's any easy way to explain this to users.

Is there a way to adjust the sorting of the results as well as the default listing? In the default listing I'd like to see the data more mixed, from different institutions, to highlight the purpose of the guide and of EHRI more generally.
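One way a "more mixed" listing could be approximated is a round-robin interleave by holding institution. The Python sketch below is only an illustration of that idea: the `institution` field, the sample records, and the `interleave_by_institution` helper are hypothetical, and this is not how the portal sorts results.

```python
from itertools import zip_longest

def interleave_by_institution(results):
    """Reorder results so consecutive entries come from different
    institutions where possible (round-robin over institutions)."""
    groups = {}
    for record in results:
        groups.setdefault(record["institution"], []).append(record)
    mixed = []
    # Take the 1st item of each institution, then the 2nd, and so on.
    for row in zip_longest(*groups.values()):
        mixed.extend(record for record in row if record is not None)
    return mixed

# Hypothetical result list, initially grouped by institution.
results = [
    {"id": "a1", "institution": "inst-a"},
    {"id": "a2", "institution": "inst-a"},
    {"id": "a3", "institution": "inst-a"},
    {"id": "b1", "institution": "inst-b"},
    {"id": "c1", "institution": "inst-c"},
    {"id": "c2", "institution": "inst-c"},
]
mixed_ids = [r["id"] for r in interleave_by_institution(results)]
print(mixed_ids)  # ['a1', 'b1', 'c1', 'a2', 'c2', 'a3']
```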

I can still see the identifier for the JMP Terezín/Theresienstadt collection in the results (cz-002279-collection-jmp-shoah-t) instead of the collection name.

The guides interface doesn't show child items like the portal interface does. It should be enabled here too. This also means that the links to child items in the result set don't work right now.

mikesname commented 9 years ago

Closing since we won't be doing more development on this.