Open sushain97 opened 6 years ago
I was thinking Class Analyze and Class Generate in analysis/__init.py
and generation/__init.py
which has the function definitions presently in __init.py__
?
Why not the names I mentioned in the ticket?
Sure. The names you mentioned as classes then !
Yeah. Foo objects means a class named Foo.
@sushain97
Two implementation options:
g = Generator(in_text, lang)
g.generate(formatting='txt')
or
g = Generator()
g.generate(in_text, lang, formatting)
Also, All of this code to be pushed in the same PR? Because It has a lot of different issues that are being worked on simultaneously
Also,
This is how the modules look now
class Analyzer:
def __init__(self):
self.analyzer_cmds = {} # type: Dict[str, List[List[str]]]
def _get_commands(self, lang): # type: (str) -> List[List[str]]
if lang not in self.analyzer_cmds:
mode_path, mode = apertium.analyzers[lang]
self.analyzer_cmds[lang] = parse_mode_file(mode_path+'/modes/'+mode+'.mode')
return self.analyzer_cmds[lang]
def _postproc_text(self, result): # type: (str) -> List[LexicalUnit]
"""
postprocesses the input
"""
lexical_units = list(parse(result))
return lexical_units
def analyze(self, lang, in_text, formatting='txt'): # type: (str, str, str) -> List[LexicalUnit]
"""
runs apertium to analyze the input
"""
lang = to_alpha3_code(lang)
if lang in apertium.analyzers:
commands = list(self._get_commands(lang))
result = execute(in_text, commands)
return self._postproc_text(result)
else:
raise apertium.ModeNotInstalled(lang)
The second option is silly and doesn't serve a purpose besides forcing an object. Note that we should also preserve the top level generate
function and have it create a Generator
and then call it.
Also, All of this code to be pushed in the same PR?
It's fine as a separate PR. Probably better, honestly. The changes for this particularly issue aren't too large so I'm happy to review today.
IMHO, the first option is even sillier, as you get an object that you can barely reuse.
I think the generator could get as part of the constructor language and/or format, and get the text as part of the generate method. By doing that, you could for example use the same object across multiple "generation requests".
What do you think?
@xavivars
Yes, makes sense :) We can use a generator object for a particular language across multiple requests. Sounds a lot more cleaner
I'm not very sure about getting the format though @xavivars? Changed formats over multiple requests for the same language data seems very desirable, don't you think?
Oh, I completely misread the first option and thought it was exactly what @xavivars wants. My bad.
@arghyatiger if you check our email chain, I proposed the APIs for all this stuff back when we started (and it's exactly what @xavivars wants and what I thought you said).
I think the formatting argument can stay with the method call for now.
Oops, wrong button.
Copied since I don't think Xavi has seen it:
import apertium apertium.translate('eng', 'spa', 'hello') hola
(also accepting 2-letter codes, of course)
or equivalently:
import apertium eng_spa = apertium.translator('eng', 'spa') eng_spa.run('hello') hola eng_spa.translate('hello') hola
or even more generally:
import apertium eng_spa = apertium.translator('eng', 'spa') eng_spa_mode = apertium.mode('eng-spa') eng_spa == eng_spa_mode True
Underneath we need to support multiple implementations, one via SWIG but that's for later (as discussed). The constructor of the object will be able to read in some of the data files so it does serve a purpose. Bare function calls would have to load the data in (unless we cache it and I don't like that much because it would unexpectedly use lots of memory).
@sushain97 I'll take a look at them now.
also, it seems I need some git help :p
I've added the changes to this issue with the translation change in the translation branch. Is it okay if I keep working on this branch? (I'd really love to change, but can't figure out a smooth way of doing it) Making a new branch, pulling master and making the same changes there again seems like an overkill
Use git cherry pick.
On Sun, Jun 10, 2018, 2:37 AM Arghya Bhatttacharya notifications@github.com wrote:
@sushain97 https://github.com/sushain97 I'll take a look at them now.
also, it seems I need some git help :p
I've added the changes to this issue with the translation change in the translation branch. Is it okay if I keep working on this branch? (I'd really love to change, but can't figure out a smooth way of doing it) Making a new branch, pulling master and making the same changes there again seems like an overkill
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apertium/apertium-python/issues/17#issuecomment-396034957, or mute the thread https://github.com/notifications/unsubscribe-auth/AEBEfnoVm25l3iMvxVh3dHlM9iCEiVQ2ks5t7OjggaJpZM4UXg6v .
@arghyatiger what do apertium.analyzers
, apertium.generators
and apertium.translators
look like now? I think those should be part of the public API and have a list of objects that can translate.
@sushain97 All modules have access options from apertium and through making respective objects.
That doesn't answer my question :)
Also, I should be able to do Mode('en-morph') and get an Analzyer.
@sushain97 Public API as in? the main branch? Also, I'm not sure what you mean by having a list of objects that can translate?
The public API is the part that's documented and not munged. Things that are not part of the public API ought to have a leading underscore.
a list of objects that can translate
A list of Translator
objects. This isn't really a super important ask right now, though.
apertium.analyzers
andapertium.generators
need to haveAnalyzer
andGenerator
objects that I can runanalyze
andgenerate
on.