Closed MichaelRoeder closed 6 years ago
The first attempt at solving the problem seems to work for small examples. However, its complexity is too high: O(n*m), where n and m are the lengths of the original and the new text.
Further changes committed in https://github.com/dice-group/gerbil/commit/14b4ebf6304877bc5919724b28a6831d288a5ca0. Another adaptation that should be implemented is described as: "We can adapt the algorithm to use less space, O(m) instead of O(mn), since it only requires that the previous row and current row be stored at any one time."
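For reference, the two-row adaptation from the quote could look roughly like the sketch below. It is not the code from the commit above; the class name `TwoRowEditDistance` and its method are made up for illustration. It assumes a plain Levenshtein distance with unit costs and only returns the distance, so recovering the actual alignment (which we need for the position mapping) would still require extra bookkeeping.

```java
/** Hypothetical helper, not part of GERBIL: edit distance using only two rows. */
public final class TwoRowEditDistance {

    private TwoRowEditDistance() {
    }

    public static int distance(String original, String changed) {
        // Put the shorter text on the row axis so each row has min(n, m) + 1 cells.
        if (changed.length() > original.length()) {
            String tmp = original;
            original = changed;
            changed = tmp;
        }
        int[] previous = new int[changed.length() + 1];
        int[] current = new int[changed.length() + 1];
        // Distance between the empty prefix of original and each prefix of changed.
        for (int j = 0; j <= changed.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= original.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= changed.length(); j++) {
                int cost = original.charAt(i - 1) == changed.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(current[j - 1] + 1, previous[j] + 1);
                current[j] = Math.min(insertOrDelete, previous[j - 1] + cost);
            }
            // The current row becomes the previous row of the next iteration.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[changed.length()];
    }
}
```

This keeps the O(n*m) running time and only reduces the memory to O(min(n, m)).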
Fixed with https://github.com/dice-group/gerbil/commit/566763685d6dcb5bcee1879ba4d8597c8abf8387
@TortugaAttack FYI: with the line at https://github.com/dice-group/gerbil/blob/hotfix-issue208/src/main/java/org/aksw/gerbil/annotator/impl/fox/FOXAnnotator.java#L93 changes to the text made by the annotator can be revoked while keeping the correct NE position.
Problem
The FOX annotator changes the input text and adds the annotations to the cleaned text. An example from ACE2004 (with a leading whitespace) and, directly below, the text returned by FOX:
Since GERBIL is built on the assumption that annotators use the given text, the results of FOX might be misleading (using STRONG_ANNOTATION_MATCH).
Solution
The FOX adapter should correct the annotation positions by comparing the received document with the original document.
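A minimal sketch of what such a comparison could look like, under the (unconfirmed) assumption that FOX only removes whitespace characters and changes nothing else. The class `PositionMapper` and its methods are hypothetical names, not existing GERBIL code. With this assumption, the two texts can be walked in parallel in linear time:

```java
/** Hypothetical helper: maps positions in the cleaned text back to the original text. */
public final class PositionMapper {

    private PositionMapper() {
    }

    /** Returns, for every character index of cleaned, its index in original. */
    public static int[] mapOffsets(String original, String cleaned) {
        int[] map = new int[cleaned.length()];
        int o = 0;
        for (int c = 0; c < cleaned.length(); c++) {
            // Skip characters that exist only in the original; they must be whitespace.
            while (o < original.length() && original.charAt(o) != cleaned.charAt(c)) {
                if (!Character.isWhitespace(original.charAt(o))) {
                    throw new IllegalArgumentException(
                            "Texts differ by more than removed whitespace at original index " + o);
                }
                o++;
            }
            if (o >= original.length()) {
                throw new IllegalArgumentException("Cleaned text does not fit into the original text");
            }
            map[c] = o;
            o++;
        }
        return map;
    }

    /** Translates a (start, length) pair with length > 0 into {start, length} in the original. */
    public static int[] remap(int[] map, int start, int length) {
        int newStart = map[start];
        // Map the last character of the annotation, then make the end exclusive again.
        int newEnd = map[start + length - 1] + 1;
        return new int[] { newStart, newEnd - newStart };
    }
}
```

For an original text `" Hello  world"` and a cleaned text `"Hello world"`, an annotation at start 6 with length 5 in the cleaned text would be moved to start 8 with length 5 in the original. If FOX turns out to apply other changes as well (e.g. replacing linebreaks), this sketch throws an exception instead of guessing, and an edit-distance based alignment would be needed.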
@renespeck Please list changes that FOX can apply to a document. Is it only removing unnecessary whitespace characters? What about linebreaks?
Check
A small class that goes through all registered annotators (for ERec) and uses an example document like the one above to check whether the same document is returned by the services. Problem: the ERec method does not support this check. @TortugaAttack any ideas how to do that in an easy way? Or does it even make sense to have such a class?