cwrc / DEPRECATED--CWRC-Dialogs

Entity search functionality: partial matching #80

Open jefferya opened 9 years ago

jefferya commented 9 years ago

Not sure how to invoke partial matching (the new API seems to require the full term be typed before matches are listed in the results pane while the stand-alone will match partial names). Not sure which is best.

SusanBrown commented 9 years ago

I think partial matches are preferable on the whole but not essential if what is meant is on-the-fly partial matches.

How is the full name indicated: with a return or a click? If you put in "liz" and click/return, will it still return Elizabeth as well as Liz? If so, I think that's good enough for now.

jefferya commented 9 years ago

Within the integrated CWRC-Writer, if I enter "bun" I get zero results. Within the standalone CWRC-Writer, if I enter "bun" I get "Bunny, Crazy" and "Bunny, Bugs NEMO".

Extrapolating, since we don't yet have ingested entities for the integrated CWRC-Writer on beta: a search for "liz" would not include "Lizars", "Elizabeth", "liz". The standalone version would include the first and third but not the second, because its partial match assumes the start of the term is correct. This assumes there is no variant name "liz" recorded for "elizabeth".
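To make the difference concrete, here is a minimal Python sketch of the two kinds of partial matching being compared: prefix matching (what the standalone version appears to do, per the description above) and substring matching (which would also return "Elizabeth" for "liz"). The function names are hypothetical and case-insensitive comparison is an assumption; this is not the CWRC search code.

```python
def prefix_match(query, names):
    """Return names that start with the query (assumed case-insensitive)."""
    q = query.lower()
    return [n for n in names if n.lower().startswith(q)]


def substring_match(query, names):
    """Return names that contain the query anywhere (assumed case-insensitive)."""
    q = query.lower()
    return [n for n in names if q in n.lower()]


# Sample names taken from the discussion above.
names = ["Lizars", "Elizabeth", "liz"]

starts_with_liz = prefix_match("liz", names)    # first and third, not the second
contains_liz = substring_match("liz", names)    # all three, including "Elizabeth"
```

Prefix matching is cheaper to support in an index, while substring matching is the only one of the two that finds "Elizabeth" without a variant name on record.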

ghost commented 9 years ago

Right, so standalone behaviour is preferable, but we can live with integrated behaviour for now.

jefferya commented 9 years ago

See the note "2015-05-04 CWRC-Writer module eval" for discussion of DisMax and Edge text. Possible compromise: http://stackoverflow.com/questions/4824954/solr-partial-and-full-string-match
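As a rough illustration of that kind of compromise, the sketch below indexes every leading fragment of each name token (edge n-grams) so that partial input matches, and then ranks names containing the query as a whole token ahead of partial-only matches. It assumes "Edge text" refers to edge n-gram indexing, which the thread does not confirm. All function names are hypothetical, "Bun, Honey" is invented sample data, and this is plain Python rather than Solr configuration.

```python
def tokens(name):
    """Lower-cased tokens of a name, with commas treated as separators."""
    return name.lower().replace(",", " ").split()


def edge_ngrams(term, min_len=2):
    """All leading fragments of a term, e.g. 'bunny' -> bu, bun, bunn, bunny."""
    return {term[:i] for i in range(min_len, len(term) + 1)}


def build_index(names):
    """Map each edge n-gram of each token to the names that contain it."""
    index = {}
    for name in names:
        for token in tokens(name):
            for gram in edge_ngrams(token):
                index.setdefault(gram, set()).add(name)
    return index


def search(query, index):
    """Partial matches, with full-token matches ranked first, then alphabetical."""
    q = query.lower()
    hits = index.get(q, set())
    return sorted(hits, key=lambda n: (q not in tokens(n), n.lower()))


index = build_index(["Bunny, Crazy", "Bunny, Bugs NEMO", "Bun, Honey"])
results = search("bun", index)
```

Because the fragments are anchored at the start of each token, this still would not return "Elizabeth" for "liz"; that would need a variant name or a different tokenisation.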

ilovan commented 9 years ago

Not sure if this is the right place to post this, but sorting in VIAF searches seems to be rather wonky: for example, the first hit for "Bertrand" is "Aeschilus", and the only mention of a Bertrand in the VIAF record for Aeschilus is in a 500--1 field.

ghost commented 9 years ago

Yes, the VIAF results seem extremely off. I think Jeff posted something a while back about this having to do with the names being broken into parts.

jefferya commented 9 years ago

The VIAF backend is very limited in terms of functionality (at least the endpoint currently utilised), perhaps too limiting.

Here are the examples:

(1) limit of 10 search results max per page (whereas 100 is the default for CWRC entities) - #82

(2) sort order is one of two choices, both of limited utility in my opinion: a) by WorldCat holdings count or b) reverse order of loading (whatever this means). There is no alphabetical sort and no ordering by weighting with respect to the query terms

https://www.oclc.org/developer/develop/web-services/viaf/authority-cluster.en.html

(3) query matching (e.g. #70) options allow restricting the returned results, but the options are very granular. The "exact" match seems to search for the exact string including punctuation, which might lead to zero results if the user types a name in inverted order instead of direct order or vice versa. Note this is supposition with limited exploration.

(4) unclear to me as to whether or not it is possible to limit the fields queried while restricting to "person" and allowing pseudonyms and other alternative names to be also queried.
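One possible client-side mitigation for points (2) and (3), sketched in Python under stated assumptions: compare names by their set of tokens so that inverted and direct order are treated as equal, and re-sort whatever page of results comes back by how well each name matches the query. The function names and sample data are hypothetical; nothing here calls the VIAF API, and re-sorting a page of at most 10 results cannot surface better hits that the backend never returned.

```python
def name_key(name):
    """Order-insensitive key: lower-cased tokens with punctuation stripped."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in name)
    return frozenset(cleaned.lower().split())


def same_name(a, b):
    """True if two names have the same tokens regardless of order or punctuation."""
    return name_key(a) == name_key(b)


def rerank(query, results):
    """Sort results by whole-token matches, then prefix matches, then alphabetically."""
    q_tokens = name_key(query)

    def score(name):
        name_tokens = name_key(name)
        exact = len(q_tokens & name_tokens)
        prefix = sum(
            1 for q in q_tokens if any(t.startswith(q) for t in name_tokens)
        )
        # Negated so that more matches sort earlier.
        return (-exact, -prefix, name.lower())

    return sorted(results, key=score)


# Hypothetical page of results for the query "Bertrand".
page = ["Aeschilus", "Bertrandon, Jean", "Russell, Bertrand"]
reordered = rerank("Bertrand", page)
```

With this ordering, a heading that actually contains "Bertrand" as a whole word comes first and a record with no matching token in its heading drops to the end of the page.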

Michael, am I misrepresenting our past discussion in any way? Item (3) might be the only one that can be tweaked, and I'm unsure whether the exact match would be worse than the current behaviour.

Any thoughts on how to proceed?

What we currently have might be the best we can do without building and hosting our own VIAF endpoint.

Cheers, ~Jeff
