[Docu] Semantics of contextKeywords->text not clear

pstoehr commented 9 years ago

A recommendation with the contextKeywords-text entry "assdwefwnff" returns an empty result list, which is absolutely reasonable.

Changing the contextKeywords-text entry to "assdwefwnff halloween" gives the following response:

{
  "provider": "federated",
  "totalResults": 10,
  "partnerResponseState": [
    { "systemID": "RijksMuseum", "success": true },
    { "systemID": "Deutsche Digitale Bibliothek", "success": true },
    { "systemID": "Europeana", "success": false, "errorMessage": "Waited too long for partner system 'Europeana' to respond 5023 ms " },
    { "systemID": "Kierling", "success": true },
    { "systemID": "KIMPortal", "success": true },
    { "systemID": "Mendeley", "success": true },
    { "systemID": "ZBW", "success": true }
  ],
  "queryID": "1468329700",
  "result": [
{
  "resultGroup": [

  ],
  "documentBadge": {
    "id": "1be48aaa-7f1a-30a4-9161-3c86ccb57073",
    "uri": "http:\/\/www.mendeley.com\/research\/stock-markets-really-so-inefficient-case-halloween-indicator",
    "provider": "Mendeley"
  },
  "mediaType": "text",
  "title": "Are stock markets really so inefficient? The case of the \u201cHalloween Indicator\u201d",
  "date": "2014-01-01",
  "language": "unknown",
  "licence": "https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode",
  "generatingQuery": "(assdwefwnff AND halloween)"
},
{
  "resultGroup": [

  ],
  "documentBadge": {
    "id": "777a392e-a381-328d-833b-5d3f1ef2abae",
    "uri": "http:\/\/www.mendeley.com\/research\/routine-screening-halloween-candy-helpful-hazardous",
    "provider": "Mendeley"
  },
  "mediaType": "text",
  "title": "Routine screening of Halloween candy: helpful or hazardous?",
  "date": "1993-01-01",
  "language": "unknown",
  "licence": "https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode",
  "generatingQuery": "(assdwefwnff AND halloween)"
},
{
  "resultGroup": [

  ],
  "documentBadge": {
    "id": "80ca5529-8ce6-3cf7-825b-ee8ea3bdf204",
    "uri": "http:\/\/www.mendeley.com\/research\/halloween-effect-japanese-equity-prices-myth-exploitable-anomaly",
    "provider": "Mendeley"
  },
  "mediaType": "text",
  "title": "The Halloween Effect and Japanese Equity Prices: Myth or Exploitable Anomaly",
  "date": "2005-01-01",
  "language": "unknown",
  "licence": "https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode",
  "generatingQuery": "(assdwefwnff AND halloween)"
},
{
  "resultGroup": [

  ],
  "documentBadge": {
    "id": "c38226e4-fd82-365a-9578-9211762c2266",
    "uri": "http:\/\/www.mendeley.com\/research\/celtic-origins-halloween-transcend-fear",
    "provider": "Mendeley"
  },
  "mediaType": "text",
  "title": "The Celtic Origins of Halloween Transcend Fear.",
  "date": "2010-01-01",
  "language": "unknown",
  "licence": "https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode",
  "generatingQuery": "(assdwefwnff AND halloween)"
},
{
  "resultGroup": [

  ],
  "documentBadge": {
    "id": "65f5c3b7-72a9-34e9-8fa5-410dbc72621b",
    "uri": "http:\/\/www.mendeley.com\/research\/halloween-effect-trick-treat",
    "provider": "Mendeley"
  },
  "mediaType": "text",
  "title": "The Halloween effect: Trick or treat?",
  "date": "2010-01-01",
  "language": "unknown",
  "licence": "https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode",
  "generatingQuery": "(assdwefwnff AND halloween)"
},
{
  "resultGroup": [

  ],
  "documentBadge": {
    "id": "16d29f6e-3009-358b-87ec-4e99f57af53d",
    "uri": "http:\/\/www.mendeley.com\/research\/halloween-puzzle-selected-asian-stock-markets",
    "provider": "Mendeley"
  },
  "mediaType": "text",
  "title": "The Halloween puzzle in selected Asian stock markets",
  "date": "2011-01-01",
  "language": "unknown",
  "licence": "https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode",
  "generatingQuery": "(assdwefwnff AND halloween)"
},
{
  "resultGroup": [

  ],
  "documentBadge": {
    "id": "8461ef24-5d64-37fc-950d-dc0e38b6abd9",
    "uri": "http:\/\/www.mendeley.com\/research\/influence-valentines-day-halloween-birth-timing",
    "provider": "Mendeley"
  },
  "mediaType": "text",
  "title": "Influence of Valentine's Day and Halloween on birth timing.",
  "date": "2011-01-01",
  "language": "unknown",
  "licence": "https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode",
  "generatingQuery": "(assdwefwnff AND halloween)"
},
{
  "resultGroup": [

  ],
  "documentBadge": {
    "id": "b307de8d-181d-353f-b867-b9326deb2f5b",
    "uri": "http:\/\/www.mendeley.com\/research\/wave-transformation-near-virginia-coast-halloween-northeaster",
    "provider": "Mendeley"
  },
  "mediaType": "text",
  "title": "WAVE TRANSFORMATION NEAR VIRGINIA COAST - THE HALLOWEEN NORTHEASTER",
  "date": "1995-01-01",
  "language": "unknown",
  "licence": "https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode",
  "generatingQuery": "(assdwefwnff AND halloween)"
},
{
  "resultGroup": [

  ],
  "documentBadge": {
    "id": "de365aa6-ba09-367d-a573-efd14ca0a6d7",
    "uri": "http:\/\/www.mendeley.com\/research\/halloween-effect-sectors",
    "provider": "Mendeley"
  },
  "mediaType": "text",
  "title": "The Halloween Effect in U.S. Sectors",
  "date": "2009-01-01",
  "language": "unknown",
  "licence": "https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode",
  "generatingQuery": "(assdwefwnff AND halloween)"
},
{
  "resultGroup": [

  ],
  "documentBadge": {
    "id": "d1204aa0-b057-3378-af23-6d9941880346",
    "uri": "http:\/\/www.mendeley.com\/research\/halloween-effect-everywhere-time",
    "provider": "Mendeley"
  },
  "mediaType": "text",
  "title": "The Halloween Effect: Everywhere and All the Time",
  "date": "2012-01-01",
  "language": "unknown",
  "licence": "https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode",
  "generatingQuery": "(assdwefwnff AND halloween)"
}

] }

It is not totally clear why the second query returns more results. Looking at the last result, the "generatingQuery" was "(assdwefwnff AND halloween)". As the recommendation for "assdwefwnff" returns an empty result list, it is unclear why the more specific recommendation for "(assdwefwnff AND halloween)" now returns some elements.

hziak commented 9 years ago

The system relies on how the partners interpret the queries. So even if we formulate the query explicitly, as in the example above, a partner might behave unexpectedly. In this case Mendeley seems to interpret the query as a logical disjunction (OR).
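A minimal sketch of the difference, for illustration only: it uses a toy substring matcher on two of the titles from the response above and is not how Mendeley actually parses or ranks queries. The helper names `matches_all` and `matches_any` are made up for this example.

```python
# Toy illustration only: NOT Mendeley's real query handling.
# It shows why "(assdwefwnff AND halloween)" can still yield hits
# when a partner effectively treats the terms as a disjunction.

def matches_all(title, terms):
    """Strict AND: every term must occur in the title."""
    lowered = title.lower()
    return all(term.lower() in lowered for term in terms)


def matches_any(title, terms):
    """Lenient OR: one matching term is enough."""
    lowered = title.lower()
    return any(term.lower() in lowered for term in terms)


# Two titles taken from the response above.
titles = [
    "The Halloween effect: Trick or treat?",
    "Routine screening of Halloween candy: helpful or hazardous?",
]
terms = ["assdwefwnff", "halloween"]

strict = [t for t in titles if matches_all(t, terms)]
lenient = [t for t in titles if matches_any(t, terms)]

print(len(strict), len(lenient))  # prints: 0 2
```

With strict AND semantics the nonsense term rules out every document; with OR semantics the term "halloween" alone is enough, which matches the behaviour observed above.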

pstoehr commented 9 years ago

Thanks for this clarification!

Is this the reason why neither this query

{ "partnerList": [ { "systemId": "KIMPortal" } ], "numResults": 20, "contextKeywords": [ { "text": "Europa K\u00f6nig" } ], "origin": { "clientType": "Swift-Test-Client", "clientVersion": "0.21", "module": "OS X Prototype", "userID": "PDPS-WS2015" } }

nor this one

{ "partnerList": [ { "systemId": "KIMPortal" } ], "numResults": 20, "contextKeywords": [ { "text": "K\u00f6nig", "isMainTopic": true }, { "text": "Europa", "isMainTopic": false } ], "origin": { "clientType": "Swift-Test-Client", "clientVersion": "0.21", "module": "OS X Prototype", "userID": "PDPS-WS2015" } }

returns any result? By the way, we have not been able to find any combination of two search phrases that returns a result.

Nevertheless, the web front end (https://www.kgportal.bl.ch/sammlungen) returns results for the keywords "König Europa".

hziak commented 9 years ago

With this query you can retrieve results:

{ "partnerList": [ { "systemId": "KIMPortal" } ], "numResults": 20, "contextKeywords": [ { "text": "König", "isMainTopic": false }, { "text": "Europa", "isMainTopic": false } ], "origin": { "clientType": "Swift-Test-Client", "clientVersion": "0.21", "module": "OS X Prototype", "userID": "PDPS-WS2015" } }

(The query in that case would be (König OR Europa).)

If you put two keywords into a single text entry, as in your first example, the terms are joined with AND in the case of KimPortal, since the entry is treated as a phrase. So the final query will be (König AND Europa). In your second example you mark König as the main topic and Europa as a subtopic, which leads to the query (König) AND (Europa). KimPortal is more restrictive with the AND here than other partners, e.g. Mendeley.

In general, one of the main ideas behind the isMainTopic field is to gain higher precision, since we can normally fill a result list with 10 or 20 results easily when all partners are used. As with the example above, or the web frontend of KimPortal, the system returns results, but they are quite unspecific and often off topic (like https://www.kgportal.bl.ch/sammlungen#c45b060e-4761-aafb-5c88-88951b5d89ff or https://www.kgportal.bl.ch/sammlungen#b82e6ed7-4ef9-4a1a-36a0-96ab77d631a8).

hziak commented 9 years ago

Just as an example of where the main topic makes sense:

"contextKeywords": [ { "text": "Napoleon", "isMainTopic": true }, { "text": "Frankreich", "isMainTopic": false }, { "text": "Schweiz", "isMainTopic": false }, { "text": "Helena", "isMainTopic": false } ]

Query: (Napoleon) AND (Frankreich OR Schweiz OR Helena)

If you turn off the main topic for the keyword Napoleon, you will get a lot of results that are not related to the rest of the keywords.
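The query strings quoted in this thread can be reproduced with a small sketch like the one below. It is only a reconstruction inferred from the examples here, not the federated recommender's actual code; the function names `entry_to_term`, `group_to_query` and `build_query` are hypothetical, and the handling of mixed cases (several multi-word entries) is an assumption.

```python
# Reconstruction inferred from the examples in this thread; NOT the
# recommender's real implementation. All function names are hypothetical.

def entry_to_term(text):
    """A multi-word text entry is treated as a phrase: words joined with AND."""
    words = text.split()
    if len(words) > 1:
        return "(" + " AND ".join(words) + ")"
    return words[0] if words else ""


def group_to_query(texts):
    """Entries within one group (main topics or the rest) are joined with OR."""
    terms = [term for term in (entry_to_term(t) for t in texts) if term]
    if not terms:
        return ""
    if len(terms) == 1:
        only = terms[0]
        return only if only.startswith("(") else "(" + only + ")"
    return "(" + " OR ".join(terms) + ")"


def build_query(context_keywords):
    """Main topics and remaining keywords form two groups joined with AND."""
    main = [k["text"] for k in context_keywords if k.get("isMainTopic", False)]
    rest = [k["text"] for k in context_keywords if not k.get("isMainTopic", False)]
    parts = [q for q in (group_to_query(main), group_to_query(rest)) if q]
    return " AND ".join(parts)


napoleon = [
    {"text": "Napoleon", "isMainTopic": True},
    {"text": "Frankreich", "isMainTopic": False},
    {"text": "Schweiz", "isMainTopic": False},
    {"text": "Helena", "isMainTopic": False},
]
print(build_query(napoleon))
# prints: (Napoleon) AND (Frankreich OR Schweiz OR Helena)
```

The same sketch yields "(assdwefwnff AND halloween)" for the single entry "assdwefwnff halloween" and "(König OR Europa)" when neither keyword is a main topic. How each partner then interprets that string is a separate matter.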

pstoehr commented 9 years ago

Thanks for this information.

As far as I can see, the programmer of a client has to take into account which functionality is provided by which data provider.

chseifert commented 9 years ago

Peter, definitely not, because: 1) Data providers might join or leave the federated recommender dynamically. 2) A data provider's search backend, and thus its query capabilities, might change (e.g. Elasticsearch instead of Apache Solr).

I know that the current solution is not optimal, but working around and optimizing for single data providers is of no use - one then might as well connect to the data provider directly.

hziak commented 9 years ago

Thank you @chseifert. Your answer is more or less the same as mine, but I was already in the process of writing it, so I am posting it anyway.

Actually, the intention of the system is to provide the most helpful items to users. In a usual use case the client programmer should not care which partner the results are returned from. Furthermore, once the final source selection approaches are deployed, the system might remove providers from its pool for certain queries. Selecting specific partners is only an additional feature that can be used if needed in special cases; it is not the main intention of the framework that partners are queried on their own.
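As a rough sketch of what this means on the client side: the request body can be built without any partner-specific logic, and "partnerList" only added in special cases. The field names are the ones used in the requests in this thread; the helper `build_request` is hypothetical, and whether the recommender really queries all partners when "partnerList" is omitted is an assumption here.

```python
import json

# Hypothetical helper. Field names follow the requests shown in this thread.
# ASSUMPTION: leaving out "partnerList" lets the recommender pick the partners.

def build_request(keywords, num_results=20, partners=None, origin=None):
    """keywords is a list of (text, is_main_topic) pairs."""
    request = {
        "numResults": num_results,
        "contextKeywords": [
            {"text": text, "isMainTopic": is_main} for text, is_main in keywords
        ],
    }
    if partners:  # optional: only for special cases
        request["partnerList"] = [{"systemId": name} for name in partners]
    if origin:
        request["origin"] = origin
    return request


# Usual case: no partner selection at all.
usual = build_request([("Napoleon", True), ("Frankreich", False)])

# Special case: restrict the query to a single partner.
special = build_request([("Napoleon", True)], partners=["KIMPortal"])

print(json.dumps(usual, ensure_ascii=False, indent=2))
```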

The main topic itself is a possibility for frontends to increase precision.

pstoehr commented 9 years ago

In the best of all possible worlds both of you are right, but ...

If you generate a query with the "Napoleon-data" Mendeley would return a result-set that seems to be something like "Napoleon OR Frankreich OR Schweiz OR Helena" and KimPortal one for "(Napoleon) AND (Frankreich OR Schweiz OR Helena)" (I hope I got it right)

Thus, when a user provides more than one keyword, the data from the KimPortal result-set seems to be "more common" (or "less surprising") for an "average user" than the data from the Mendeley result-set. Based on that, our client might rank result-sets of a "multi-keyword-recommendation" generated by KimPortal higher than the results of the same "multi-keyword-recommendation" from Mendeley (as the results from KimPortal are likely to be more specific).

And again: Yes, both of you are right when you mention that this ranking is based on a momentary survey, uses internal knowledge and might not be valid in "42 days". But currently such a heuristic helps us to "improve" the overall result-set.
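A minimal sketch of such a heuristic, under stated assumptions: the weights in `PROVIDER_WEIGHT` are invented for illustration, the function `rerank` is hypothetical, and the sketch assumes that the "provider" value inside "documentBadge" uses the same names as "systemID" (e.g. "KIMPortal").

```python
# Hypothetical client-side heuristic; the weights below are made up and
# reflect only the snapshot discussed in this thread.
PROVIDER_WEIGHT = {"KIMPortal": 2.0, "Mendeley": 1.0}


def rerank(results, keyword_count):
    """Prefer stricter providers, but only for multi-keyword recommendations."""
    if keyword_count < 2:
        return list(results)  # single keyword: keep the recommender's order

    def weight(item):
        provider = item["documentBadge"]["provider"]
        return PROVIDER_WEIGHT.get(provider, 1.0)  # unknown providers: neutral

    # sorted() is stable, so items with equal weight keep their original order.
    return sorted(results, key=weight, reverse=True)


results = [
    {"title": "a", "documentBadge": {"provider": "Mendeley"}},
    {"title": "b", "documentBadge": {"provider": "KIMPortal"}},
    {"title": "c", "documentBadge": {"provider": "Mendeley"}},
]
print([r["title"] for r in rerank(results, keyword_count=2)])
# prints: ['b', 'a', 'c']
```

Keeping the weights in one table makes it easy to drop the heuristic again once it stops matching the partners' behaviour.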

And yes for the third time, both of you are right again! It would be brain-damaging-stupid not to use the EEXCESS-PP for fetching those data!

chseifert commented 9 years ago

Mendeley query: right. KimPortal query: right.
It is just that the former is not capable of handling something like a main topic (but it already does a very good job with keywords, for instance by taking the nearness of keywords in the text into account).

What you basically suggest for your client is a client-based result-set aggregation. It would be interesting to see how this turns out (especially in terms of user satisfaction). Conceptually, this would be the job of the federated recommender, with the constraint that nothing is known about specific partners, so all the analysis that you have been doing manually to find things out would have to be done automatically, without human intervention.

hziak commented 9 years ago

Just to correct something here: I meant that in the specific case above Mendeley is interpreting the query like an OR query, but that is only in that case, where one search term has no matches. In general Mendeley is highly sophisticated in its interpretation of the query and its ranking. If you take, for example, the query below and set the isMainTopic field on the individual contextKeywords randomly, you will always get a different result list returned.

{ "partnerList": [ { "systemId": "Mendeley" } ], "numResults": 20, "contextKeywords": [ { "text": "Napoleon", "isMainTopic": false }, { "text": "Frankreich", "isMainTopic": false }, { "text": "Schweiz", "isMainTopic": false }, { "text": "Helena", "isMainTopic": false }

] }
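To try this systematically rather than randomly, the isMainTopic combinations can be enumerated with a short sketch like the following. The generator name `main_topic_variants` is made up for this example; sending each variant to the recommender and comparing the result lists is left out.

```python
from itertools import product

# Hypothetical helper: enumerate every isMainTopic assignment for a list
# of keywords, so each variant can be sent as "contextKeywords".

def main_topic_variants(texts):
    for flags in product([False, True], repeat=len(texts)):
        yield [
            {"text": text, "isMainTopic": flag}
            for text, flag in zip(texts, flags)
        ]


variants = list(
    main_topic_variants(["Napoleon", "Frankreich", "Schweiz", "Helena"])
)
print(len(variants))  # prints: 16
```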

EEXCESS / eexcess

[Docu] Semantics of contextKeywords->text not clear #20