dice-group / gerbil

GERBIL - General Entity annotatoR Benchmark

Tagme #144

Open chabmed opened 8 years ago

chabmed commented 8 years ago

Hi, TagMe does not run on GERBIL. I don't know if you have been informed, but the URL of their web service has changed, as well as the key required to run it. Thanks

RicardoUsbeck commented 8 years ago

Hi, we are aware of the change to the TAGME API (it requires a new API key). We will investigate and solve it as soon as possible (probably next week).

Greetings Ricardo

MichaelRoeder commented 8 years ago

Hi, the new API is implemented and we are using a new API key. To use the new API we have to connect to the TagMe service via HTTPS, and the CA certificate is not in our standard set of certificates. I fixed this for the TagMe certificate in June. However, the certificate has expired and I might have to fix it again.
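
For anyone running into the same certificate problem with their own client, a minimal Java sketch of trusting an additional CA certificate at runtime could look like the following (this is not GERBIL's actual code, and the PEM path is hypothetical; note the resulting truststore trusts *only* that CA, so importing the certificate into the JRE's `cacerts` via `keytool` is the more common fix):

```java
import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class TagmeTrustStore {
    public static void trustCertificate(String pemPath) throws Exception {
        // Load the CA certificate (PEM/DER) that signed the service's chain.
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        X509Certificate ca;
        try (FileInputStream in = new FileInputStream(pemPath)) {
            ca = (X509Certificate) cf.generateCertificate(in);
        }
        // Put it into an empty in-memory key store...
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null);
        ks.setCertificateEntry("tagme-ca", ca);
        // ...and build an SSLContext whose trust manager accepts it.
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        // All subsequent HttpsURLConnections will use this socket factory.
        HttpsURLConnection.setDefaultSSLSocketFactory(ctx.getSocketFactory());
    }
}
```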

Cheers Michael

chabmed commented 8 years ago

Hi, thanks for your reply. But I have a question about how GERBIL calculates precision and recall for A2KB. As an example, for the mention "United States Census Bureau", TagMe returns two annotations, one for "United States" and another for "Census Bureau". These cases are correctly evaluated for the entity linking subtask, but how does GERBIL calculate this for the A2KB task? I noticed that the recall for A2KB can be lower than the recall for entity linking. Thanks Mohamed

MichaelRoeder commented 8 years ago

Hi,

first of all, TagMe 2 seems to work on my local machine. I thought I could reproduce the certificate issue described above, but it does not seem to be a problem. Thus, I will check why it isn't working on the server as soon as there are not that many experiments waiting in the queue.

Ricardo created an example based on your description (@Ricardo: thanks :smile:): "The United States Census Bureau defines four statistical regions, with nine divisions."

TagMe 2 returned 4 annotations (each line: start offset, length, URI and, where reported, a confidence score):

4, 27, [http://dbpedia.org/resource/United_States_Census_Bureau]
45, 19, [http://dbpedia.org/resource/Statistical_regions_of_Slovenia], 0.03846
71, 4, [http://dbpedia.org/resource/Ninth_grade], 0.04781
76, 9, [http://dbpedia.org/resource/Division_(country_subdivision)], 0.0258

The last three annotations are false positives. However, they are filtered out because GERBIL optimizes the confidence score threshold with regard to the F1-score. Thus, TagMe 2 gets an F1-score of 1.0 in this example.
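
To make the threshold optimization concrete, here is a minimal sketch of the idea (not GERBIL's actual evaluation code; it assumes the unscored first annotation counts as confidence 1.0 and that the gold standard contains exactly one annotation for this sentence):

```java
public class ThresholdSweep {
    /**
     * Returns the best F1-score reachable by choosing a confidence
     * threshold. scores[i] is the confidence of the i-th returned
     * annotation, correct[i] says whether it matches a gold annotation,
     * goldCount is the number of gold annotations.
     */
    static double bestF1(double[] scores, boolean[] correct, int goldCount) {
        double best = 0.0;
        // Every returned confidence value is a candidate threshold.
        for (double threshold : scores) {
            int tp = 0, returned = 0;
            for (int i = 0; i < scores.length; i++) {
                if (scores[i] >= threshold) {
                    returned++;
                    if (correct[i]) tp++;
                }
            }
            double p = returned == 0 ? 0 : (double) tp / returned;
            double r = goldCount == 0 ? 0 : (double) tp / goldCount;
            double f1 = (p + r) == 0 ? 0 : 2 * p * r / (p + r);
            best = Math.max(best, f1);
        }
        return best;
    }

    public static void main(String[] args) {
        // The example above: one true positive (assumed confidence 1.0,
        // since TagMe reported no score for it) and three low-confidence
        // false positives.
        double[] scores = {1.0, 0.03846, 0.04781, 0.0258};
        boolean[] correct = {true, false, false, false};
        System.out.println(bestF1(scores, correct, 1)); // prints 1.0
    }
}
```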

However, if it marked United States and Census Bureau separately, this would result in at least one false positive. If Census Bureau had the correct URI, it would be identified as correct if the weak annotation match is used. Based on the strong annotation match, both annotations would be false positives and the complete gold annotation would be counted as a false negative.
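
The following sketch illustrates the difference between the two matchings on this example (a minimal annotation type of my own, not GERBIL's implementation; the offsets refer to the example sentence above):

```java
public class AnnotationMatch {
    // Hypothetical minimal annotation: character offset, length, linked URI.
    record Annotation(int start, int length, String uri) {
        int end() { return start + length; }
    }

    // Strong matching: positions and URI must be exactly the same.
    static boolean strongMatch(Annotation sys, Annotation gold) {
        return sys.start() == gold.start()
                && sys.length() == gold.length()
                && sys.uri().equals(gold.uri());
    }

    // Weak matching: the spans only need to overlap, the URI must match.
    static boolean weakMatch(Annotation sys, Annotation gold) {
        boolean overlap = sys.start() < gold.end() && gold.start() < sys.end();
        return overlap && sys.uri().equals(gold.uri());
    }

    public static void main(String[] args) {
        Annotation gold = new Annotation(4, 27,
                "http://dbpedia.org/resource/United_States_Census_Bureau");
        // "Census Bureau" lies inside the gold span and has the correct URI:
        Annotation sys = new Annotation(18, 13,
                "http://dbpedia.org/resource/United_States_Census_Bureau");
        System.out.println(strongMatch(sys, gold)); // false -> FP + FN
        System.out.println(weakMatch(sys, gold));   // true  -> TP
    }
}
```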

Can you give an example for the recall difference between D2KB and A2KB? Which matching did you use for A2KB?

chabmed commented 8 years ago

Hi, I'm using the strong annotation match, and the recall for A2KB is less than the recall for entity linking on the AIDA-CoNLL and DBpedia Spotlight datasets. Thanks Mohamed

RicardoUsbeck commented 8 years ago

What is the status here?

chabmed commented 8 years ago

I did not receive an answer to my question. It's the same case using Babelfy: the recall for A2KB is lower than the one for entity linking, testing on KORE50 for example (strong annotation match). Thanks

MichaelRoeder commented 8 years ago

Hi,

sorry, for some reason I lost track of this issue and forgot to investigate the problem :worried:

Can you please link two recent experiments where you saw the difference (one A2KB and one D2KB)?

Cheers, Micha

chabmed commented 8 years ago

Hi, this is an example of testing Babelfy on KORE50: http://gerbil.aksw.org/gerbil/experiment?id=201611040019. The micro recall for A2KB is 0.5556 while the recall for the D2KB subtask is 0.5625, but I think they should be the same. Thanks