Fati-Hei closed this issue 3 years ago
But as far as I can see, this doesn't seem to be a problem with skweak, but with the functions standards_detector and st_detector that you implemented.
For instance, the st_detector function relies on having separate tokens for the "NS-EN" and the numbers that come after it, which means it won't work on phrases such as "NS-EN12845". Your function is also limited to handling two tokens (since you only check whether the current token starts with a digit), so it's not surprising that it doesn't recognise the full phrase NS-EN 12845 2020.
Well, it's simply that the loop you have written in your function does not properly handle two consecutive tokens with numerical values.
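To make the point about the loop concrete, here is a minimal sketch of a detector that keeps consuming numeric tokens after the prefix and also accepts a fused form such as NS-EN12845. It is not the code from this issue: it works on a plain list of token strings rather than a spaCy Doc, and the regular expression, the "STANDARD" label and the assumption that "NS-EN" arrives as a single token are all illustrative choices.

```python
import re

# Hypothetical pattern: the "NS-EN" prefix, optionally fused with digits ("NS-EN12845").
PREFIX = re.compile(r"^NS-EN(\d+)?$")


def st_detector(tokens):
    """Yield (start, end, label) spans over a list of token strings."""
    i = 0
    while i < len(tokens):
        match = PREFIX.match(tokens[i])
        if match is None:
            i += 1
            continue
        j = i + 1
        # Consume every following token that starts with a digit,
        # not just the first one (e.g. "12845" and then "2020").
        while j < len(tokens) and tokens[j][:1].isdigit():
            j += 1
        # Emit a span only if a number was found, either fused into
        # the prefix token or as one or more separate tokens.
        if j > i + 1 or match.group(1):
            yield i, j, "STANDARD"  # label name is an assumption
        i = j


print(list(st_detector("Se NS-EN 12845 2020 for detaljer".split())))
print(list(st_detector(["NS-EN12845"])))
```

With skweak, the same loop would presumably read the text of each token in the document instead of plain strings, and the end index is exclusive, as in the sketch.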
Could you give a concrete example with some simple sentences we could try out? The code you provide looks a priori fine (apart from the fact that you also need to create a FunctionAnnotator for your st_detector function and run it on your documents).