Closed: sushain97 closed this 6 years ago
For packaging purposes, PyPi must be optional. Everything has to work with whatever dependencies can be had from apt-get on the target distros (currently all supported and upcoming editions of Debian and Ubuntu).
@TinoDidriksen: what's the benefit of packaging it as a deb/rpm if it can be just installed as a python package than can be used in any python virtual environment in an easy way? (I'm asking because I'm sure there are benefits that I'm not aware of)
Ping @kartikm - does Wikimedia critically want the Debian package for APy, or can you live with PyPi?
> Everything has to work with whatever dependencies can be had from apt-get on the target distros (currently all supported and upcoming editions of Debian and Ubuntu).
Presumably you can apt-get specific versions of deps, yes? For example, apt-get Tornado 4.5.x instead of 5.x.
If streamparser is installable via pip, why would we want to make it optional? Wouldn't it be easier to just include it with the other dependencies and have it install at the same time as, for example, tornado? If someone doesn't have internet access and can't do a `pip install streamparser`, they won't be able to do `pip install -r requirements.txt` either...
Well, I think the philosophy is to make the only hard requirement Tornado. That's why `chardet` and `cld2` are also soft dependencies that will simply disable functionality if not installed.
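The soft-dependency pattern described here can be sketched as a guarded import that degrades gracefully; `detect_encoding` is a hypothetical helper for illustration, not APy's actual code:

```python
# Sketch of the soft-dependency pattern: try to import chardet and
# disable the feature (fall back to UTF-8) when it is missing.
try:
    import chardet
except ImportError:
    chardet = None

def detect_encoding(raw):
    """Guess the charset of raw bytes, falling back to UTF-8."""
    if chardet is None:
        return 'utf-8'
    return chardet.detect(raw).get('encoding') or 'utf-8'
```

The same guard works for `cld2`: features check for `None` and simply skip the functionality instead of crashing at import time.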
Regarding `requirements.txt`... If the goal is to have APy itself installable via pip, wouldn't it be better to use something like setuptools to make the different dependencies explicit (runtime, development/test) in the `setup.py`, so the package that gets published automatically retrieves all dependencies?
Indeed. That shall be done in the `setup.py` :) Right now, `requirements.txt` is the union of all meaningful deps.
In any case, I personally like the refactor a lot (I'll look into it more deeply to try to provide more useful feedback), and I'd even go further (moving all Handlers to an apy.handlers module, etc). Great improvement!
Thanks! :) I agree, even more in-depth code reorganization would be nice. That will come after the big changes are taken care of (mostly done).
One of the best things about apy is that it doesn't require all the python `setup.py` stuff. The original idea was to have a simple standalone server (`servlet.py`). I appreciate some kind of feature creep, and I think that apy is a nice piece of software, but I would like to avoid the typical python installation bloat.
Incidentally, would it be possible to do whatever you want regarding build/configuration with autotools?
There will be no functional change, for backwards compatibility. If you don't want to use standard installation steps, you can clone the repo and run `servlet.py`. The current state of this code is really awful and it makes me not want to work on fixing bugs or improving it, hence this massive refactor. There are also lots of bugs and undocumented behavior that a refactor like this uncovers...
> Incidentally, would it be possible to do whatever you want regarding build/configuration with autotools?
There isn't any new build or configuration.
> The original idea was to have a simple standalone server (`servlet.py`).
Simple is small and fine if we don't care about things like performance or stability.
```
$ cat apertium_apy/*.py | wc -l
3134
```

A ton of that isn't just feature creep but is also required for holding pipelines open, locking them properly, dropping them when required, restarting as required, etc. And it's insanely annoying right now to find where you want to be in the code... :(
> Presumably you can apt-get specific versions of deps, yes?
Generally no. It is very rare for there to be multiple versions of a package - it is actively discouraged, in fact.
E.g., `python3-tornado` only exists as a single version per supported distro - which means the required supported versions would be 3.1, 3.2, 4.2, 4.4, 4.5, and 5.0.
@ftyers "python installation bloat" (a single-file `setup.py` so then anyone can do `pip install apertium-apy`) and you want to use autotools :stuck_out_tongue_closed_eyes: I never really ~~liked~~ understood autotools.
Now seriously, I agree APy was at the very beginning a single `servlet.py` file (even more so when compared to the "monster" that ScaleMT was), but that's no longer true. The complexity of the tool has increased, slowly but consistently, over the past few years, with new features being added (multiple handlers, support for things like chained translation, db storing of unknown words, ...), and reordering the code base doesn't per se make it more complex, but simply better structured. If it has too many files/classes/features, that's orthogonal to how the code is structured. Keeping the code as it is would just be lying to ourselves, saying that the code is still extremely simple when it's not.
> Well, I think the philosophy is to make the only hard requirement Tornado. That's why chardet and cld2 are also soft dependencies that will simply disable functionality if not installed.
So is your current idea to have a "requirements" (or in setup) that includes chardet and cld2 too, but then someone just cloning the repo would be able to use it with the only dependency being tornado?
Also, something else that could be done is to add the "non-required" dependencies as an extra.
> So is your current idea to have a "requirements" (or in setup) that includes chardet and cld2 too, but then someone just cloning the repo would be able to use it with the only dependency being tornado?
Indeed. Check out how this PR has 'two' `requirements.txt`s.
> Also, something else that could be done is to add the "non-required" dependencies as an extra.
Yep, that would be perfect.
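For illustration, a setuptools "extra" splits the dependencies like this; the extra name and package lists below are assumptions, not the PR's actual contents:

```python
# Illustrative dependency split for setuptools.setup(); the extra name
# 'full' and these package lists are hypothetical examples.
INSTALL_REQUIRES = ['tornado']        # the one hard dependency
EXTRAS_REQUIRE = {
    # `pip install apertium-apy[full]` would pull these in as well
    'full': ['chardet', 'cld2-cffi'],
}
# Passed as setup(..., install_requires=INSTALL_REQUIRES,
#                 extras_require=EXTRAS_REQUIRE)
```

With that in place, a plain `pip install` gets only Tornado, while users who want the optional functionality opt in via the extra.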
Ah, we've actually been advertising 3.4 compat despite having broken it! TravisCI is lying to us since installing `mypy` on 3.4 brings in the 3.4 backport of the `typing` module. cc @unhammer
edit: fixed in 4b74075
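For reference, pip and setuptools can express a backport dependency like this with a PEP 508 environment marker, so the `typing` backport is only pulled in on interpreters that lack the stdlib module (a sketch of the general technique, not necessarily what the fix in that commit does):

```
typing ; python_version < "3.5"
```

A line like this works both in `requirements.txt` and inside `install_requires` in `setup.py`.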
Ok, almost all the todos are complete. This diff is already really massive, so before more fine-grained refactoring in the code, I'd like to merge this into master.
Thoughts? (would appreciate a :+1: if you have no objections)
```
$ pip install apertium-apy
$ apertium-apy
```

will now Just Work to run APy (if you already have Apertium, of course).
Caching `/tmp/languages` has improved build time by ~30% \o/
OK, I think everything I originally planned for this particular PR is now complete. Will merge tomorrow afternoon/evening after any concerns raised are resolved.
Seems to work just fine for me without having to use virtualenv/pip etc. for anything; this looks great =D
Awesome! This has been merged into master, tagged with `v0.11.0.rc1`, and released to PyPi.
Issues have been created for the remaining 4 items, and I will release `v0.11.0` soon (day-scale) if there are no problems reported.
@TinoDidriksen PyPi isn't good for Debian packages or the WMF repository. Same as https://github.com/apertium/apertium-apy/pull/91#issuecomment-375860679
I find it is in general a bad idea to have hardcoded "ceiling" version requirements. Pip, Virtualenv, Docker all seem to encourage this by saying "hey, who cares if there's several separate versions of this library installed? They can all coexist, and even if some are old enough to have serious security issues, they're running isolated!".
But Virtualenv isn't meant to provide security, and Docker provides it to some extent thanks to the kernel infrastructure it employs, but it's not its primary goal, nor is it sensible in the long run to make software insecure in the knowledge it will (perhaps?) be run under an isolated sandbox.
So in my opinion, maintained software meant for servers should generally keep doing what it has always done: interface correctly with all the versions of dependencies it requires from the latest to the oldest one that is still maintained, distributed and not shown to be critically insecure. This allows packaging it for major distributions of Unix, as well as using the PyPi infrastructure... just, not as a shortcut to depend on specific (and quite possibly non-ideal) versions of things.
> I find it is in general a bad idea to have hardcoded "ceiling" version requirements.
To my knowledge, APy behaves very badly on Tornado 5.0, so the `setup.py` doesn't ask for it here. When that issue is fixed, the ceiling will be lifted. @TinoDidriksen correct me if I'm wrong regarding its behavior.
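Expressed as a pip requirement specifier, such a ceiling looks like the following (the lower bound here is an assumption for illustration, not the PR's actual pin):

```
tornado >= 4.3, < 5
```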
> They can all coexist, and even if some are old enough to have serious security issues, they're running isolated!
Well, as far as APy is concerned, Tornado 4.X is still well supported and maintained so there shouldn't be any security issues that could be fixed by lifting the current ceiling. As far as I can tell, there's no dependence on non-ideal versions; the ceiling is the latest version of 4.5 (released less than 2 months ago).
> So in my opinion, maintained software meant for servers should generally keep doing what it has always done: interface correctly with all the versions of dependencies it requires from the latest to the oldest one that is still maintained, distributed and not shown to be critically insecure.
I don't oppose a PR that makes APy compatible with Tornado version X, as long as it doesn't clutter the code significantly and someone actually uses APy with Tornado version X. The last release of Tornado v3 was almost 4 years ago, so if someone wants to use APy with it, fine. They can use APy `v0.10.0` :)
APy doesn't run at all with Tornado 5.0, which I guess counts as behaving badly.
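A startup guard for this kind of version window can be sketched as below; the supported range (>= 4.3, < 5.0) is an assumption drawn from this discussion, not a documented bound:

```python
# Hedged sketch of a Tornado version guard; the bounds are assumptions.
def version_tuple(version):
    """'4.5.3' -> (4, 5): coarse major.minor comparison."""
    return tuple(int(part) for part in version.split('.')[:2])

def tornado_supported(version):
    """True when the version falls inside the assumed supported window."""
    return (4, 3) <= version_tuple(version) < (5, 0)
```

In practice such a check could run against `tornado.version` at startup and exit with a clear message instead of failing obscurely under 5.0.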
Continue this discussion in ticket https://github.com/apertium/apertium-apy/issues/99
Some of these are wishful thinking but most of them are possible in one fell swoop. The ones that are not will be converted to issues upon merge of this PR. This will, of course, require a minor version bump.
to do (feel free to add):

- [ ] `apertium-apy` script in package

create issues for (if not already exists), ticked when issue created:

- [x] `apy.py` into folders/multiple files (#95)