jure opened this issue 10 years ago (status: Open)
Some progress has been made on this, with the addition of ignoring documents that are not found: https://github.com/ScholarNinja/extension/commit/ae883ae5598afac14562091ae9178bbf1a43fadf#diff-b880c77d0f382525de5100984f260cebR336
This means a query sometimes says '16 results found' while displaying only 4, because the other 12 could not be found on the network.
It's a start.
Related to this issue: https://github.com/tsujio/webrtc-chord/issues/5
I'm storing keywords and documents in the DHT, so if you search for "cancer research", it will first retrieve the key "cancer", then the key "research" from the network. These keys contain document id arrays, e.g.:
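Something along these lines (the ids and the exact stored shape are invented for illustration, this isn't the extension's actual format):

```javascript
// Hypothetical DHT contents after indexing two documents: each keyword key
// maps to the array of document ids whose text contains that keyword.
const keywordIndex = {
  "cancer":   ["10.1039/cancer.research.1", "10.1039/cancer.research.2"],
  "research": ["10.1039/cancer.research.1", "10.1039/cancer.research.2"],
};
```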
It then performs an intersection of these keys, and retrieves the documents from the DHT network. So for the above example, roughly:
So for this search, 4 requests are made to the DHT: cancer, research, 10.1039/cancer.research.1 and 10.1039/cancer.research.2. (In practice there are even more requests, because each keyword is queried once per document field — title, abstract, authors, journal, etc. — using keys of the form "[fieldname]keyword". With 5 fields per document, that's 10 requests for just two keywords, plus 2 more to fetch the actual documents.)
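The key construction would look something like this (the exact field list, and the fifth field in particular, is my guess — the issue only names four):

```javascript
// Each keyword is queried once per document field, as "[fieldname]keyword" keys.
const fields = ["title", "abstract", "authors", "journal", "year"]; // 5 fields assumed
const keywords = ["cancer", "research"];

const keys = fields.flatMap(field => keywords.map(keyword => field + keyword));
// 10 keyword requests ("titlecancer", "titleresearch", "abstractcancer", ...),
// plus 2 more requests to fetch the two matching documents themselves.
```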
If any of these fails, the whole search fails. I cache the document id lookups, as these are static, but even so, the failure rate for searches is quite high.
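The caching is conceptually just memoization of the id-array lookups — a sketch under my own naming, not the extension's actual code:

```javascript
// Cache keyword -> document id array lookups; since the arrays are static,
// a cached answer never goes stale and saves a DHT round trip on repeat searches.
const idCache = new Map();

function cachedLookup(keyword, dhtGet) {
  if (!idCache.has(keyword)) {
    idCache.set(keyword, dhtGet(keyword)); // only hit the network on a miss
  }
  return idCache.get(keyword);
}
```

One caveat with this pattern: a failed lookup should probably not be cached, or a temporarily unreachable node would poison the cache.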
I guess document lookup could fall back to dx.doi.org when the id is a DOI (not all ids are), but even so, there should be a way to make this more resilient — either by failing partially in a smart way, or by contacting replicas for keywords whose main node can't be reached.
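A minimal sketch of that fallback chain (synchronous for brevity — real DHT and DOI lookups are async network calls, and all names here are assumptions, not the extension's API):

```javascript
// Very loose DOI shape check: "10.<registrant>/<suffix>".
function isDoi(id) {
  return /^10\.\d{4,}\//.test(id);
}

// Try each resolver in order (e.g. [dhtGet, doiGet]); an exception or a miss
// falls through to the next one. Returning null lets the search drop just
// this one document instead of failing entirely.
function getDocument(id, resolvers) {
  for (const resolve of resolvers) {
    try {
      const doc = resolve(id);
      if (doc != null) return doc;
    } catch (e) {
      // node unreachable: try the next resolver
    }
  }
  return null;
}
```

Usage would be along the lines of `getDocument(id, isDoi(id) ? [dhtGet, doiGet] : [dhtGet])`, with replica lookups slotting in as additional resolvers in the same chain.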