apertium / apertium-apy

📦 Apertium HTTP Server in Python
https://wiki.apertium.org/wiki/Apertium-apy
GNU General Public License v3.0
32 stars 42 forks source link

Major refactoring #91

Closed sushain97 closed 6 years ago

sushain97 commented 6 years ago

Some of these are wishful thinking but most of them are possible in one fell swoop. The ones that are not will be converted to issues upon merge of this PR. This will, of course, require a minor version bump.

to do (feel free to add):

create issues for (if not already exists), ticked when issue created:

TinoDidriksen commented 6 years ago

For packaging purposes, PyPi must be optional. Everything has to work with whatever dependencies can be had from apt-get on the target distros (currently all supported and upcoming editions of Debian and Ubuntu).

xavivars commented 6 years ago

@TinoDidriksen: what's the benefit of packaging it as a deb/rpm if it can be just installed as a python package than can be used in any python virtual environment in an easy way? (I'm asking because I'm sure there are benefits that I'm not aware of)

TinoDidriksen commented 6 years ago

Ping @kartikm - does Wikimedia critically want the Debian package for APy, or can you live with PyPi?

sushain97 commented 6 years ago

Everything has to work with whatever dependencies can be had from apt-get on the target distros (currently all supported and upcoming editions of Debian and Ubuntu).

Presumably you can apt-get specific versions of deps, yes? For example, apt-get Tornado 4.5.x instead of 5.x.

sushain97 commented 6 years ago

If streamparser is installable via pip, what would we want to make it optional? Wouldn't it be easier to just include it with the other dependencies and have it install at the same time than, for example, tornado? If someone doesn't have internet access and can't do a pip install streamparser, they won't be able to do pip install requirements.txt either...

Well, I think the philosophy is to make the only hard requirement Tornado. That's why chardet and cld2 are also soft dependencies that will simply disable functionality if not installed.

sushain97 commented 6 years ago

Regarding requirements.txt... If the goal is to have APy itself installable via pip, wouldn't it be better to use something like setuptools to make the different dependencies explicit (runtime, development/test) in the setup.py so the package that gets published automatically retrieves all dependencies?

Indeed. That shall be done in the setup.py :) Right now, requirements.txt is the union of all meaningful deps.

sushain97 commented 6 years ago

In any case, I personally like a lot the refactor (I'll look into it with more deeply to try to provide more useful feedback), and I'd even go further (moving all Handlers to an apy.handlers module, etc). In any case, great improvement!

Thanks! :) I agree, some even more in depth code reorganization would be nice. That will come after the big changes are taken care of (mostly done).

ftyers commented 6 years ago

One of the best things about apy is that it doesn't require all the python setup.py stuff. The original idea was to have a simple standalone server (servlet.py), I appreciate some kind of feature creep and I think that apy is a nice piece of software, but would like to avoid the typical python installation bloat.

Incidentally would it be possible to do whatever you want regarding build/configuration with autotools ?

sushain97 commented 6 years ago

There will be no functional change for backwards compatibility. If you don't want to use standard installation steps, you can clone the repo and run servlet.py. The current state of this code is really awful and it makes me not want to work on fixing bugs or improving it, hence this massive refactor. There are also lots of bugs and undocumented behavior that a refactor like this uncovers...

Incidentally would it be possible to do whatever you want regarding build/configuration with autotools ?

There isn't any new build or configuration.

sushain97 commented 6 years ago

The original idea was to have a simple standalone server (servlet.py),

Simple is small and fine if we don't care about things like performance or stability.

$ cat apertium_apy/*.py | wc -l
    3134

a ton of that isn't just feature creep but also required for holding pipelines open, locking them properly, dropping them when required, restarting as required, etc. And it's insanely annoying right now to find where you want to be in the code... :(

TinoDidriksen commented 6 years ago

Presumably you can apt-get specific versions of deps, yes?

Generally no. It is very rare for there to be multiple versions of a package - it is actively discouraged, in fact.

E.g., python3-tornado only exists as a single version per supported distro - which means the required supported versions would be 3.1, 3.2, 4.2, 4.4, 4.5, and 5.0.

xavivars commented 6 years ago

@ftyers "python installation bloat" (single file setup.py so then anyone can do pip install apertium-apy) and you want to use autotools :stuck_out_tongue_closed_eyes: I never really ~liked~ understood autotools.

Now seriously, I agree APy was at the very beginning a single servlet.py file (even more when compared to the "monster" ScaleMT was), but thats no longer true. The complexity of the tool has increased, slowly but consistently, in the past few years, with new features being added (multiple handlers, new features support things like chained translation, db storing of unknown words,...) and reordering the code base doesn't per se make it more complex, but simply better structured. If it has too many files/classes/features, that's something orthogonal to how the code is structured. But just keeping the code how it is is just lying to ourselves saying that the code is still extremely simple, when it's not.

Well, I think the philosophy is to make the only hard requirement Tornado. That's why chardet and cld2 are also soft dependencies that will simply disable functionality if not installed.

So is you current idea to have a "requirements" (or in setup) that includes chardet and cld2 too, but then when someone just "cloning" the repo being able to use it with the only dependencie of tornado?

Also, something else that could be done is to add the "non-required" dependencies into an extra

sushain97 commented 6 years ago

So is you current idea to have a "requirements" (or in setup) that includes chardet and cld2 too, but then when someone just "cloning" the repo being able to use it with the only dependencie of tornado?

Indeed. Check out how this PR has 'two' requirements.txts.

Also, something else that could be done is to add the "non-required" dependencies into an extra

Yep, that would be perfect.

sushain97 commented 6 years ago

Ah, we've actually been advertising 3.4 compat despite having broken it! TravisCI is lying to us since installing mypy on 3.4 brings in the 3.4 backport of the typing module. cc @unhammer

edit: fixed in 4b74075

sushain97 commented 6 years ago

Ok, almost all the todo are complete. This diff is already really massive so before more fine grained refactoring in the code, I'd like to merge this into master.

Thoughts? (would appreciate a :+1: if you have no objections)

sushain97 commented 6 years ago
$ pip install apertium-apy
$ apertium-apy

will now Just Work to run APy (if you already have Apertium, of course).

sushain97 commented 6 years ago

Caching /tmp/languages has improved build time by ~30% \o/

sushain97 commented 6 years ago

OK, I think everything I originally planned for this particular PR is now complete. Will merge tomorrow afternoon/evening after any concerns raised are resolved.

unhammer commented 6 years ago

Seems to work just fine for me without having to use virtualenv/pip etc. for anything; this looks great =D

sushain97 commented 6 years ago

Awesome! This has been merged into master, tagged with v0.11.0.rc1 and released to PyPi.

Issues have been created for the remaining 4 items and I will release v0.11.0 soon (day-scale) if there are no problems reported.

kartikm commented 6 years ago

@TinoDidriksen Pypi isn't good for Debian packages or WMF repository. Same as, https://github.com/apertium/apertium-apy/pull/91#issuecomment-375860679

LuccoJ commented 6 years ago

I find it is in general a bad idea to have hardcoded "ceiling" version requirements. Pip, Virtualenv, Docker all seem to encourage this by saying "hey, who cares if there's several separate versions of this library installed? They can all coexist, and even if some are old enough to have serious security issues, they're running isolated!".

But Virtualenv isn't meant to provide security, and Docker provides it to some extent thanks to the kernel infrastructure it employs, but it's not its primary goal, nor is it sensible in the long run to make software insecure in the knowledge it will (perhaps?) be run under an isolated sandbox.

So in my opinion, maintained software meant for servers should generally keep doing what it has always done: interface correctly with all the versions of dependencies it requires from the latest to the oldest one that is still maintained, distributed and not shown to be critically insecure. This allows packaging it for major distributions of Unix, as well as using the PyPi infrastructure... just, not as a shortcut to depend on specific (and quite possibly non-ideal) versions of things.

sushain97 commented 6 years ago

I find it is in general a bad idea to have hardcoded "ceiling" version requirements.

To my knowledge, APy behaves very badly on Tornado 5.0 so the setup.py doesn't ask for it here. When that issue is fixed, the ceiling will be lifted. @TinoDidriksen correct me if I'm wrong regarding its behavior.

They can all coexist, and even if some are old enough to have serious security issues, they're running isolated!".

Well, as far as APy is concerned, Tornado 4.X is still well supported and maintained so there shouldn't be any security issues that could be fixed by lifting the current ceiling. As far as I can tell, there's no dependence on non-ideal versions; the ceiling is the latest version of 4.5 (released less than 2 months ago).

So in my opinion, maintained software meant for servers should generally keep doing what it has always done: interface correctly with all the versions of dependencies it requires from the latest to the oldest one that is still maintained, distributed and not shown to be critically insecure.

I don't oppose a PR that makes APy compatible with Tornado version X as long as it doesn't clutter the code significantly and someone actually uses APy with Tornado version X. The last release to Tornado v3 was almost 4 years ago so I think if someone wants to use APy with it, fine. They can use APy v0.10.0 :)

TinoDidriksen commented 6 years ago

APy doesn't run at all with Tornado 5.0, which I guess counts as behaving badly.

Continue this discussion in ticket https://github.com/apertium/apertium-apy/issues/99