ahmetaa / zemberek-nlp

NLP tools for Turkish.
Other
1.14k stars 208 forks source link

StemTransitionsMapBased.getMultiStems is absent #235

Closed ghost closed 4 years ago

ghost commented 4 years ago

Is there a special reason for not having a public getter of multiStems field? If i add one to project can you merge it? I need it because I try to extract graph of states from Zemberek to make Zemberek be able to be used with languages other than Java.

mdakin commented 4 years ago

@temizerm
I see, so you want to export the graph including stems in to a format that can be read from other languages, should be possible. as far as I remember StemTransitionsMapBased is a bit hacky, we started with a specific trie based one (in the same directory) but I never implemented the trie walk with asciified version of nodes. anyway, I don't think there is a specific reason that getMultiStems does not have a public getter, probably because we did not need to expose it before. Feel free to send a request. @ahmet-aaa wdyt?

ahmetaa commented 4 years ago

@temizerm as @mdakin said there was no special reason for that. So you can expose it. However, I would caution extracting Zemberek's internal graph mechanism to an intermediate format or such. Even though it was our initial intention, great refactoring of spring 2018 made it quite an unattainable task IMO. Current mechanism contains conditions and some hacks in it. It may become a very hard task.

mdakin commented 4 years ago

@ahmetaa It is probably still possible and could be interesting, it would be a big problem if zemberek morphology package was still changing wildly so that exporting logic would be broken, but that is not the case. So if @temizerm wants to explore it, I encourage the project.

ghost commented 4 years ago

I explore most of the TurkishMorphotactics class and found out that it is written using design patterns like builder pattern, it makes the code very readable and relatively easy to understand what is going on.

I also have suggestions like moving the StemTransitionsMapBased class out of the analysis package, because I think, I should not need to run any code which is in analysis package, while generating the graph to be exported.

@mdakin thanks for support and @ahmetaa thanks for attention. I am going to expose multiStems with a public getter first.

ghost commented 4 years ago

It is decided not to make a pull request, instead Reflection API is used when it is necessary. The progress of the graph exporting project is here: progress Generating sub-classes was a way to export MorphemeStates but I had to use Reflection API to get MorphemeStates which are defined with default accessbility in Java. Thanks for encouragement again, @mdakin , @ahmetaa . Also it would be appreciated if it is permitted to use Reflection API.

ghost commented 4 years ago

I am closing the issue because i exported a graph using Reflection API. There are an example of how to use Reflection API is on morphotactics-graph repository I am going to add new issue about considerations.