Closed mthuurne closed 2 years ago
Maybe I have unrealistic expectations of the model, expecting it to deliver similar words, when it is only intended to compare messages? Feel free to close this issue if the fused words are not a problem for the model's intended use.
I presume that the behavior you noticed is an artifact of our preprocessing. It seems that all of your examples would have had a slash between them in the original training text: 'hij/zij', 'hem/haar', etc. I will try to fix this when building a new model so that these cases are handled better. Thanks for the report!
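To illustrate how a preprocessing step could produce such fused words, here is a minimal sketch. It is not the project's actual preprocessing code: the function names `strip_punctuation` and `split_on_slash` and the sample sentence are hypothetical, and the sketch assumes the fusion comes from deleting punctuation outright rather than replacing it with whitespace.

```python
import re


def strip_punctuation(text: str) -> str:
    # Hypothetical preprocessing: delete every character that is neither a
    # word character nor whitespace. The slash vanishes, fusing its neighbours.
    return re.sub(r"[^\w\s]", "", text)


def split_on_slash(text: str) -> str:
    # Possible fix (an assumption, not the maintainers' method): turn a slash
    # between two word characters into a space before stripping the rest.
    text = re.sub(r"(?<=\w)/(?=\w)", " ", text)
    return re.sub(r"[^\w\s]", "", text)


sample = "hij/zij gaf het aan hem/haar"  # made-up example sentence

fused = strip_punctuation(sample).split()
separated = split_on_slash(sample).split()

print(fused)
print(separated)
```

With the first function the alternatives end up as single tokens such as "hijzij"; with the second they remain separate words.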
Another Semantle list. This time, the problem is that words that I expect were in the original text as alternatives ("hij/zij" etc.) were fused into single words:
I verified using the demo from the README that these fused words indeed occur in the model; it's not an artifact of Semantle's code.