apertium / apertium-python

now you can even use apertium from python
GNU General Public License v3.0
31 stars 27 forks source link

Better mode introspection #17

Open sushain97 opened 6 years ago

sushain97 commented 6 years ago

apertium.analyzers and apertium.generators need to have Analyzer and Generator objects that I can run analyze and generate on.

orgh0 commented 6 years ago

I was thinking Class Analyze and Class Generate in analysis/__init.py and generation/__init.py which has the function definitions presently in __init.py__ ?

sushain97 commented 6 years ago

Why not the names I mentioned in the ticket?

orgh0 commented 6 years ago

Sure. The names you mentioned as classes then !

sushain97 commented 6 years ago

Yeah. Foo objects means a class named Foo.

orgh0 commented 6 years ago

@sushain97

Two implementation options:

g = Generator(in_text, lang)
g.generate(formatting='txt')

or

g = Generator()
g.generate(in_text, lang, formatting)

Also, All of this code to be pushed in the same PR? Because It has a lot of different issues that are being worked on simultaneously

orgh0 commented 6 years ago

Also,

This is how the modules look now

class Analyzer:

    def __init__(self):
        self.analyzer_cmds = {}  # type: Dict[str, List[List[str]]]

    def _get_commands(self, lang):  # type: (str) -> List[List[str]]
        if lang not in self.analyzer_cmds:
            mode_path, mode = apertium.analyzers[lang]
            self.analyzer_cmds[lang] = parse_mode_file(mode_path+'/modes/'+mode+'.mode')
        return self.analyzer_cmds[lang]

    def _postproc_text(self, result):  # type: (str) -> List[LexicalUnit]
        """
        postprocesses the input
        """
        lexical_units = list(parse(result))
        return lexical_units

    def analyze(self, lang, in_text, formatting='txt'):  # type: (str, str, str) -> List[LexicalUnit]
        """
        runs apertium to analyze the input
        """
        lang = to_alpha3_code(lang)

        if lang in apertium.analyzers:
            commands = list(self._get_commands(lang))
            result = execute(in_text, commands)
            return self._postproc_text(result)
        else:
            raise apertium.ModeNotInstalled(lang)
sushain97 commented 6 years ago

The second option is silly and doesn't serve a purpose besides forcing an object. Note that we should also preserve the top level generate function and have it create a Generator and then call it.

Also, All of this code to be pushed in the same PR?

It's fine as a separate PR. Probably better, honestly. The changes for this particularly issue aren't too large so I'm happy to review today.

xavivars commented 6 years ago

IMHO, the first option is even sillier, as you get an object that you can barely reuse.

I think the generator could get as part of the constructor language and/or format, and get the text as part of the generate method. By doing that, you could for example use the same object across multiple "generation requests".

What do you think?

orgh0 commented 6 years ago

@xavivars

Yes, makes sense :) We can use a generator object for a particular language across multiple requests. Sounds a lot more cleaner

orgh0 commented 6 years ago

I'm not very sure about getting the format though @xavivars? Changed formats over multiple requests for the same language data seems very desirable, don't you think?

sushain97 commented 6 years ago

Oh, I completely misread the first option and thought it was exactly what @xavivars wants. My bad.

@arghyatiger if you check our email chain, I proposed the APIs for all this stuff back when we started (and it's exactly what @xavivars wants and what I thought you said).

I think the formatting argument can stay with the method call for now.

sushain97 commented 6 years ago

Oops, wrong button.

sushain97 commented 6 years ago

Copied since I don't think Xavi has seen it:

import apertium apertium.translate('eng', 'spa', 'hello') hola

(also accepting 2-letter codes, of course)

or equivalently:

import apertium eng_spa = apertium.translator('eng', 'spa') eng_spa.run('hello') hola eng_spa.translate('hello') hola

or even more generally:

import apertium eng_spa = apertium.translator('eng', 'spa') eng_spa_mode = apertium.mode('eng-spa') eng_spa == eng_spa_mode True

Underneath we need to support multiple implementations, one via SWIG but that's for later (as discussed). The constructor of the object will be able to read in some of the data files so it does serve a purpose. Bare function calls would have to load the data in (unless we cache it and I don't like that much because it would unexpectedly use lots of memory).

orgh0 commented 6 years ago

@sushain97 I'll take a look at them now.

also, it seems I need some git help :p

I've added the changes to this issue with the translation change in the translation branch. Is it okay if I keep working on this branch? (I'd really love to change, but can't figure out a smooth way of doing it) Making a new branch, pulling master and making the same changes there again seems like an overkill

sushain97 commented 6 years ago

Use git cherry pick.

On Sun, Jun 10, 2018, 2:37 AM Arghya Bhatttacharya notifications@github.com wrote:

@sushain97 https://github.com/sushain97 I'll take a look at them now.

also, it seems I need some git help :p

I've added the changes to this issue with the translation change in the translation branch. Is it okay if I keep working on this branch? (I'd really love to change, but can't figure out a smooth way of doing it) Making a new branch, pulling master and making the same changes there again seems like an overkill

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apertium/apertium-python/issues/17#issuecomment-396034957, or mute the thread https://github.com/notifications/unsubscribe-auth/AEBEfnoVm25l3iMvxVh3dHlM9iCEiVQ2ks5t7OjggaJpZM4UXg6v .

sushain97 commented 6 years ago

@arghyatiger what do apertium.analyzers, apertium.generators and apertium.translators look like now? I think those should be part of the public API and have a list of objects that can translate.

orgh0 commented 6 years ago

@sushain97 All modules have access options from apertium and through making respective objects.

sushain97 commented 6 years ago

That doesn't answer my question :)

Also, I should be able to do Mode('en-morph') and get an Analzyer.

orgh0 commented 6 years ago

@sushain97 Public API as in? the main branch? Also, I'm not sure what you mean by having a list of objects that can translate?

sushain97 commented 6 years ago

The public API is the part that's documented and not munged. Things that are not part of the public API ought to have a leading underscore.

a list of objects that can translate

A list of Translator objects. This isn't really a super important ask right now, though.