fberrizbeitia / wikidataEnricher

Enriches documents with Wikidata entities
Creative Commons Zero v1.0 Universal

dereferencing DBpedia JSON fails sometimes? #2

Open photomedia opened 3 years ago

photomedia commented 3 years ago

https://github.com/fberrizbeitia/wikidataEnricher/blob/4a3b8aae92253896acb9b542cd52903f388ba0ec/get-concepts.py#L40

This line fails to retrieve anything from DBpedia for this datapoint:

{'@name': 'A/B testing', '@offset': '1329', 'resource': {'@label': 'B testing', '@uri': 'B_testing', '@contextualScore': '0.5000000000000275', '@percentageOfSecondRank': '0.0', '@support': '112', '@priorScore': '5.243730292908641E-7', '@finalScore': '1.0', '@types': ''}}

It results in fetching http://dbpedia.org/data/B_testing.json, which returns an empty JSON string.
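For reference, a minimal sketch of the failure, assuming the dereference is a plain HTTP GET against http://dbpedia.org/data/<uri>.json as line 40 suggests (the variable names and the guard are illustrative, not the repository's actual code):

```python
import requests

# URI as returned by DBpedia Spotlight for the "A/B testing" annotation --
# the slash-containing label has been truncated to "B_testing".
uri = "B_testing"

resp = requests.get(f"http://dbpedia.org/data/{uri}.json", timeout=30)

# As reported above, the body comes back effectively empty, so guard
# before parsing instead of assuming usable data.
body = resp.text.strip()
if not body or body in ("{}", "{ }"):
    print(f"No DBpedia data for {uri!r}")
else:
    print(resp.json())
```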

You can test with the following abstract: "We present a medical crowdsourcing visual analytics platform called C2A to visualize, classify and filter crowdsourced clinical data. More specifically, C2A is used to build consensus on a clinical diagnosis by visualizing crowd responses and filtering out anomalous activity. Crowdsourcing medical applications have recently shown promise where the non-expert users (the crowd) were able to achieve accuracy similar to the medical experts. This has the potential to reduce interpretation/reading time and possibly improve accuracy by building a consensus on the findings beforehand and letting the medical experts make the final diagnosis. In this paper, we focus on a virtual colonoscopy (VC) application with the clinical technicians as our target users, and the radiologists acting as consultants and classifying segments as benign or malignant. In particular, C2A is used to analyze and explore crowd responses on video segments, created from fly-throughs in the virtual colon. C2A provides several interactive visualization components to build crowd consensus on video segments, to detect anomalies in the crowd data and in the VC video segments, and finally, to improve the non-expert user's work quality and performance by A/B testing for the optimal crowdsourcing platform and application-specific parameters. Case studies and domain experts feedback demonstrate the effectiveness of our framework in improving workers' output quality, the potential to reduce the radiologists' interpretation time, and hence, the potential to improve the traditional clinical workflow by marking the majority of the video segments as benign based on the crowd consensus."

photomedia commented 3 years ago

I believe this is a bug in DBpedia Spotlight. I have posted about it on their forum here: https://forum.dbpedia.org/t/dbpedia-uri-incorrect-when-label-includes-a-character/1204
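Until that is fixed upstream, a possible defensive workaround on the enricher side might be to fall back to a URI rebuilt from the annotation's surface form (@name) when the dereferenced JSON comes back empty. This is only a sketch: the helper name, the fallback logic, and the assumption that the /data/ endpoint accepts a slash-containing resource name are mine, not anything in get-concepts.py.

```python
import requests

def fetch_dbpedia_json(annotation, timeout=30):
    """Dereference DBpedia JSON for a Spotlight annotation, falling back to a
    URI rebuilt from the surface form (@name) when the reported @uri yields
    an empty document. Hypothetical helper, not code from get-concepts.py."""
    candidates = [
        annotation["resource"]["@uri"],         # e.g. "B_testing" (truncated by Spotlight)
        annotation["@name"].replace(" ", "_"),  # e.g. "A/B_testing" (rebuilt from surface form)
    ]
    for uri in candidates:
        resp = requests.get(f"http://dbpedia.org/data/{uri}.json", timeout=timeout)
        body = resp.text.strip()
        if body and body not in ("{}", "{ }"):
            return resp.json()
    return None  # nothing retrievable for this annotation
```

With the datapoint shown above, the second candidate ("A/B_testing") may dereference where the truncated "B_testing" does not, though whether the /data/ endpoint resolves the unencoded slash is something to verify.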