Added pandas to the environment; it is needed to generate eiopa_register.ttl
Moved the generation notebook to a single Python script
Name objects added to identifyingName are now modified using the strip_item() function
strip_item() has been adapted to erase only the brackets, square brackets and commas themselves, instead of deleting the content inside brackets and after a comma.
Only double quotes are erased from the names
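For reference, a minimal sketch of the behaviour described above. This is not the code from the commit: the character set, the whitespace collapsing at the end and the example name are my assumptions, only the function name strip_item comes from this PR.

```python
# Assumed set of characters to erase: round brackets, square brackets,
# commas and double quotes. Single quotes are deliberately left alone.
_CHARS_TO_ERASE = '()[],"'
_ERASE_TABLE = str.maketrans("", "", _CHARS_TO_ERASE)


def strip_item(name: str) -> str:
    """Erase the delimiter characters but keep the text around them."""
    cleaned = name.translate(_ERASE_TABLE)
    # Assumption: collapse whitespace runs left behind by erased characters.
    return " ".join(cleaned.split())


# Hypothetical register entry, for illustration only.
print(strip_item('Example Insurance (Nederland) N.V., "Branch Office"'))
```

The text inside the brackets and after the comma is kept, which is the difference from the old behaviour.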
@wjwillemse two questions:
[x] Is it okay that I moved the notebook to a Python script? I think it is easier to use when just re-generating the eiopa_register.ttl file.
[x] Could you check the comments I added in the commit? I had some questions about the code.
We can also discuss these two points on Thursday, or at some other time once you have had a chance to look at them.
Adding the strip_item() function improved the accuracy by 0.05, to 0.94, on test set 1. The gain might be even larger once the models are re-trained using the corrected names in the training process.
Closes #45