Open tom-de-smedt opened 10 years ago
Hi Tom,
I'm a graduate in computational linguistics and would like to contribute to Pattern. Can you be more explicit about how Pattern should support Python 3? That is, do you want to maintain two different branches in parallel, one for Python 2 and one for 3? Or do you want to have a single code base that works both with 2 and 3? In the latter case, a library such as six would be useful.
Let me know what you think.
Cheers, Peter
Hi Peter,
My goal would be to have a single code base that works with 2 and 3, but I have little experience with Python 3 so I don't know how feasible it is. In any case, the task is becoming more urgent so I will start looking into it more. I took a look at six, which seems very useful. It's MIT-licensed so it could be included in Pattern.
Any help is appreciated! Let me know what you think.
Best, Tom
:+1: on a single codebase.
I think the first stage is to add Travis for testing (it looks like you're missing a requirements.txt file, so I'm unsure what deps it's missing (?)). Travis will really help with conversion (and ensuring it continues to work on multiple platforms).
requirements.txt), fix them up if need be.
Happy to help if you can pass a requirements.txt.
Got through the first two steps outlined by @hayd in this fork (repo has a requirements.txt and .travis.yml).
Some of the tests need to be excluded.
from test.py
# pattern.db tests require a valid username and password for MySQL.
# pattern.web tests require a working internet connection
# and API license keys (see pattern.web.api.py) for Google and Yahoo API's.
Travis is just running python -m unittest discover -s test right now.
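One way to exclude the tests mentioned above without deleting them is to skip them when their prerequisites are missing. The following is only a sketch: the environment variable names and test classes are made up for illustration and are not part of Pattern's test suite, and unittest.skipUnless requires Python 2.7 or later.

```python
import os
import unittest

# Assumption: these environment variable names are invented for this sketch;
# Pattern's test suite does not define them.
HAS_MYSQL = bool(os.environ.get("PATTERN_TEST_MYSQL_USER"))
HAS_NETWORK = os.environ.get("PATTERN_TEST_NETWORK") == "1"


class TestDatabase(unittest.TestCase):

    @unittest.skipUnless(HAS_MYSQL, "needs MySQL credentials")
    def test_connect(self):
        self.assertTrue(HAS_MYSQL)


class TestWeb(unittest.TestCase):

    @unittest.skipUnless(HAS_NETWORK, "needs an internet connection and API keys")
    def test_search(self):
        self.assertTrue(HAS_NETWORK)


def run_suite():
    """Run both test cases quietly and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestDatabase))
    suite.addTests(loader.loadTestsFromTestCase(TestWeb))
    with open(os.devnull, "w") as stream:
        return unittest.TextTestRunner(stream=stream).run(suite)
```

Skipped tests are reported as skipped rather than failed, so the Travis run can stay green while still showing what was not exercised.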
Ran futurize on the codebase. Here are some preliminary findings:
from __future__ import and from future import statements). Now that we have pip and virtualenv, does it make sense to unbundle these?
from __future__ import unicode_literals in several places (mostly for raw string literals), but this should probably be handled more carefully in the long term.
web.json.encoder with the standard library module? There was a section starting with the comment ## HACK: hand-optimized bytecode; turn globals into locals that I wasn't sure how to deal with and had to comment out.
I'm a bit new to python, so any feedback is appreciated. This is a beautiful library and I'd love to see it get the unicode love from python 3.
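For readers unfamiliar with the imports discussed here, this is a minimal sketch of what a module looks like with the __future__ imports in place and with the standard library json module used instead of a bundled copy. The describe helper is hypothetical and exists only for this example; it is not taken from Pattern.

```python
from __future__ import division, print_function, unicode_literals

import json  # standard-library JSON, instead of a bundled copy


def describe(value):
    """Return the type name and the JSON encoding of a value."""
    return type(value).__name__, json.dumps(value)


# With unicode_literals, a bare literal is text on both Python 2 and 3
# ("unicode" on 2, "str" on 3); bytes must be marked explicitly with b"".
text = "caf\u00e9"
data = text.encode("utf-8")

print(describe(text))
print(len(text), len(data))  # 4 characters, 5 bytes
print(3 / 2)                 # true division on both versions: 1.5
```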
My 2cents:
from __future__ and from future are perfectly fine.
Not sure what to do about API keys, was wondering what other modules e.g. pandas did for those parts... IIRC there may be keys you can use for testing of clipped results...
Perhaps it makes sense to create a PR for this and comment there, then you can comment on specific bits of code :) ... first pass tests then make pretty
There's an "official" fork of Pattern with the specific aim of making it compatible with Python 3: https://github.com/pattern3
The wiki has some more information: https://github.com/pattern3/pattern/wiki
The compatibility update is supported by a grant from the Python Software Foundation. This money is to be divided among contributors. You can read the grant proposal here: http://www.clips.ua.ac.be/media/Pattern-3-grant-proposal.pdf
The fork is initiated by myself, Waylon Flinn and David Branner. Everyone (Peter & hayd?) is welcome to join as admin of the project. As admin, you'll be able to edit anything so feel free to take initiative! (we do encourage pull requests, so we can keep track of who did what)
Happy to help with this, however when I tried (and trying again just now) running the tests I get a load of exceptions (python 2.7). I suspect this is just initial set up on my machine...
What do I need installed / setup to run the test suite (locally)?
Assuming fresh python install (or env) the following is failing:
git clone ...
cd pattern
python setup.py install # this *ought* to install dependencies, but I don't think it does
nosetests # this should sniff out and run all the tests, and does.
See the Travis run in the above fork: https://travis-ci.org/pinleague/pattern/builds/32799385 (this is the kind of thing that's failing, though that's a couple of months old).
Hi Andy,
My knowledge of Travis is zero, but different people including yourself have suggested it as a first step so I will examine it more closely. Looking at the output of the link you provided, these look like typical Python 2 vs 3 errors, e.g., using print stuff instead of print(stuff) and except Exception, e instead of except Exception as e. These are easy to fix; I previously used regular expressions to update them in the source code, but not yet in the unit tests. I'll look at updating the unit tests and push it to pattern3.
Best, Tom
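As an illustration of the regular-expression approach described above, here is a rough sketch for the except syntax only. The helper name is hypothetical and this is not the script that was actually used on Pattern. It handles the simple single-exception form and deliberately leaves the tuple form (except (A, B), e:) alone; a parser-based tool such as futurize is the safer choice for anything more involved, including print statements.

```python
import re

# Matches "except SomeError, name:" at the start of a (possibly indented) line.
_EXCEPT_COMMA = re.compile(r"^(\s*except\s+[\w.]+)\s*,\s*(\w+)\s*:", re.MULTILINE)


def fix_except_syntax(source):
    """Rewrite "except X, e:" as "except X as e:" (hypothetical helper)."""
    return _EXCEPT_COMMA.sub(r"\1 as \2:", source)


old = "try:\n    risky()\nexcept Exception, e:\n    print(e)\n"
print(fix_except_syntax(old))
```

The "as" form is accepted by Python 2.6 and later as well as Python 3, so the rewritten code stays valid on both.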
@tom-de-smedt Lots of stuff needs to be migrated to python 3, but this can really only be done with confidence once the tests pass (and at the moment I can't get them passing either locally or on travis on python 2.7!!!).
At the moment they (the python 2.7 tests) fail with errors from the bottom of this page: https://travis-ci.org/pinleague/pattern/jobs/32799386. Any ideas why?
Hi Tom,
as I wrote at the beginning of this year, I'm still interested in contributing to pattern. However, I have not started yet because I didn't really know where to start. But now there exists a concrete plan and I would like to be part of it. I haven't written Python code for more than a year now but it should be easy for me to get into it again (I wrote a lot of Python code during my studies and I like the language very much). Last but not least, I have been out of the computational linguistics area since I started my current job a year ago, but it would be great to deal with that stuff again.
Some things are not yet clear to me:
I cannot tell you yet which module I would prefer to work on. First, I need to take a look at the code again. I'm not sure though whether it's a good idea to have a lot of admins for the fork. Working with pull requests is much better anyway due to the reasons you mentioned.
This was partially my misunderstanding (!): just running nosetests ran the abstract test methods, which fail (at least that's part of it). Cleaning these classes is probably a good thing to do anyway (they are in an "interesting" style... e.g. IMO the suite functions should go); I've cleaned up a little...
I had to capture a few actual test failures and some HTTP403Forbidden and HTTP404NotFound errors. There are also a couple of proper errors (in python 2); for now I'm skipping those tests, but they really need looking at. I've labelled them FIXME in my branch (should I PR to pattern3 or here once passing?)...
As I said above, it's worth making sure that these tests pass reliably in python 2 before even attempting to migrate to python 3 (otherwise it's shooting in the dark). That said, I think the issues I've found (and labelled FIXME) are minor (or at least I'm hopeful that's the case if someone can look at them who understands the codebase!).
See https://github.com/hayd/pattern/compare/c5d9c2358...ce1fe8103ccb (and on travis https://travis-ci.org/hayd/pattern/builds/39245044, unfortunately not quite passing python 2.6 and 2.7, I may have to skip/fix a couple more? Some tests seem flaky - especially those that compare e.g. to 0.771!).
Note1: This allows the test suite to be run by simply calling nosetests (or py.test).
Note2: I'm skipping the mysql tests atm, but that's no biggie to fix just an install in the yml (our objective is for no tests to be skipped on travis), the others are more important, but I'm afraid I need a patterns expert to look at the FIXMEs!!
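On the tests that compare against a value such as 0.771: if the flakiness comes from small floating-point differences (an assumption; the thread does not establish the cause), comparing with a tolerance is one option. The f1_score function below is a hypothetical stand-in written for this sketch, not one of Pattern's metrics.

```python
import unittest


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (hypothetical helper for this sketch)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


class TestScore(unittest.TestCase):

    def test_f1(self):
        score = f1_score(0.75, 0.8)
        # Compare to 2 decimal places instead of requiring an exact float.
        self.assertAlmostEqual(score, 0.774, places=2)
```

A tolerance does not help if the result itself varies between runs (for example through random sampling); in that case the source of randomness has to be fixed first.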
Just to clarify the objectives here:
To answer @pemistahl: I don't think going fully py3 (and dropping support for py27) is a good option for a library... for the next decade! (Edit: added the missing NOT.) I would like to see a shared code base and drop support for python <= 2.5 (nearly every library is dropping python 2.5 support).
I'd really like to see pattern3 (once ready) merge upstream into pattern.
@hayd OK, I get your point. I'm okay with that. It just reminds me again of how unhappy I am about the Python 3.* transition in general across the Python community.
Another question @tom-de-smedt : If working with pull requests is the preferred way for contribution, then why did you create the pattern3 fork? Anyone who wants to contribute would create their own fork anyway. Wouldn't it be sufficient to simply create a branch here in the main repo for this purpose?
I've submitted a couple of PRs to the pattern3 branch. I think it makes sense to fix that up and then merge back here (it's going to be easier to keep track of things if they are in separate repos, separate issues/PRs etc). I would strongly recommend downing tools for a short while (here on clips/pattern), hopefully only for a few weeks, and concentrating on the pattern3 branch/repo.
I'm "somewhat hopeful" it's not a massive job (famous last words). Once the python3 imports are working it should be clearer where the hit list is going to be (I suspect the toughest are the str/bytes handling).
Just to update those following at home, last night I got python 3 running all tests without syntax or import errors (of course, half those tests are failing), python 2 is still passing all the tests (except those tests which failed before migration which are skipped).
https://github.com/pattern3/pattern/pull/6
(It did require ripping out the bundled (vendorized) packages and making them dependencies - I think this is a good idea anyway... so, more "home-testing" in python 2 may be a good idea before this update is merged back into clips/pattern? esp. where there is poor coverage.)
This means there is a more obvious hitlist of things to do. For those who want to help I recommend (once this is merged), attempting to make all the tests pass on specific testing files you're interested in (e.g. for database):
$ nosetests test/test_db.py
$ nosetests test/test_db.py:TestClass
$ nosetests test/test_db.py:TestClass.test_method
$ nosetests test/test_db.py --pdb --pdb-fail # drop in when there's a failure/exception
A more complete todo list issue: https://github.com/pattern3/pattern/issues/5
I haven't really thought about how six fits here, IMO if it makes fixing a test easier then use it ?
Hello,
I'm looking forward to using Pattern with Python 3, because my work is written in it. I'm kind of confused about the current state of Python 3 support. This package is not installable (at least, not through pip - I'm getting Python 2 errors) and pattern3 doesn't contain all the code base (at first sight).
By the way, Python 3 is getting more and more focus today and it's a very good idea to follow this trend. You use a lot of packages that are somehow embedded, which is definitely not a good idea for the future (e.g. BeautifulSoup_v3.2.1 has not been supported for years).
@kotrfa pattern3/pattern isn't on pip yet (so not installable), the tests aren't passing for python 3 either so it's not ready for release yet - though quite a bit of work has been done. I think the plan is for this fork to become the pattern on pip (at least that's my understanding), and it'll support both python 2 and 3.
In pattern3/pattern I've ripped out a load of the vendorised deps (which is perhaps why it looks like the code base is so different), for example beautiful soup. The tests from clips/pattern are still all there and all pass (in python 2), so nothing was removed in this process (I claim).
If you'd like to help out, which would be fantastic, please clone pattern3/pattern and see if you can help with anything in the todo list (maybe pick a test file and get it passing in both python 2 and 3, perhaps the section you need in your work?). I have a few of the areas of the codebase passing already (in both python 2 and 3), IMO it's not a huge amount of work to go :) mostly fiddly unicode stuff, then we can get it out on pip...
Hello,
yeah - I was speaking about installing this fork, not Pattern3, which is, as you said, not available on pip.
I don't really need any part of pattern currently - my work is almost done and I found Pattern too late, unfortunately. Nevertheless, maybe I could replace some parts of my current code using Pattern and simplify it. In that case, I would definitely like to help. But it doesn't seem likely I'll do it in the following weeks, since the end of the semester is coming.
You have done an amazing job by the way, thank you!
FYI all, I did a little the last couple of days, now test_db and test_web are the only remaining py3 failing test files (also test_examples, but that's IMO a special case). I don't think they should be too bad to fix... e.g. main things
Surprisingly these are py3 only failures (the py2 still passes)...
That said, there are some hacks - especially the unicode workflow - which could be cleaned up.
Edit: Too hasty in victory, I've nearly got vector working https://travis-ci.org/hayd/pattern/jobs/43751620
Thanks for the information! It is really promising. :+1:
@tom-de-smedt actually the vector thing is a little weird: it looks like the vector test fails about 50% of the time on python 3 although it passes all the time on python 2 (from running the test 10 times on both). In a way it's good that I think we're in a place where expertise is needed! :) see https://github.com/pattern3/pattern/pull/17
+1 for Python 3 support.
I realize the need to support a mature, powerful, and loyal community of legacy Python users, but Python 3 is only going to get more relevant with time, not less.
More importantly, Python 3 is just better. Its standard library organization is much cleaner, its syntax is more readable, and in many common cases it performs significantly better than Python 2 (speed and/or memory footprint).
That said, it’s often trickier to port to Python 3 than it “feels” like it should be. For a while, six has helped make this a little easier, but it only went so far.
To make the transition as painless as possible, I strongly recommend the Python-Future package. It is way more powerful than six; it has tools focused on automating as much of the transition as possible; and it has truly excellent documentation.
I believe it was mentioned earlier in this thread, but I just wanted to reiterate its awesomeness for anyone that might have missed it. Seriously—just browsing its documentation can evoke the inspiration to transition to a 2-3 compatible codebase.
I haven’t used Pattern yet, but it also has excellent documentation (great job!). Unfortunately, my current research is in Python 3. That’s how I found my way to this page. I hope Pattern gets to Python 3 soon!
Keep up the excellent work, and May The Source™ Be With You!
@Zearin I used future to do the majority of the heavy lifting in the python 3 port, see the pattern3 repo. Please do try it out.
How could you define the "state" of the project for porting Pattern into Python 3?
I used it two years ago with Python 2.7 and it was awesome; now I'm going to work with Python 3 and I would love to use it (Pattern) again!
Thanks!
Greetings, we came across this from here, and I just noticed that while a lot of the build looks stable, support for Python 3.3 seems not to be working? At least that is how I would interpret the Travis CI page. Thanks.
I just quickly tested it on Python 3.4, by creating a conda virtual environment with python 3.4 (using conda create -n python3 python=3.4 anaconda) and running the following:
git clone https://github.com/pattern3/pattern.git
cd pattern
python setup.py install
However, unfortunately, upon testing, text parsing functions at least for the web module do not seem to work... In the test folder I ran python test_web.py, which is what we are using, and the following is a sample of what I got back...
======================================================================
FAIL: test_plaintext (__main__.TestPlaintext)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_web.py", line 455, in test_plaintext
u"<a href=\"http://www.domain.com\">link</a>\n\n* item1 xxx\n* item2")
AssertionError: 'tags amp; things\n\ntitle1\n\ntitle2\n\nparagr[93 chars]tem2' != 'tags & things\n\ntitle1\n\ntitle2\n\nparagraph[76 chars]tem2'
- tags amp; things
======================================================================
FAIL: test_encode_utf8 (__main__.TestUnicode)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_web.py", line 53, in test_encode_utf8
self.assertTrue(isinstance(web.encode_utf8(s), str))
AssertionError: False is not true
----------------------------------------------------------------------
Ran 91 tests in 0.500s
FAILED (failures=4, errors=40, skipped=1) 357d ⍉
(python3)
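The test_encode_utf8 failure in the output above is what one would expect on Python 3: str.encode returns bytes there, and bytes is not str, so isinstance(..., str) is False. The sketch below uses hypothetical stand-ins for encode_utf8 and decode_utf8 (they are not Pattern's actual implementations) to show an assertion that holds on both versions, since bytes is an alias of str on Python 2.

```python
import sys

PY2 = sys.version_info[0] == 2
# "unicode" only exists on Python 2; the conditional avoids evaluating it on 3.
text_type = unicode if PY2 else str  # noqa: F821


def encode_utf8(value):
    """Return UTF-8 bytes for text; pass other values through unchanged."""
    if isinstance(value, text_type):
        return value.encode("utf-8")
    return value


def decode_utf8(value):
    """Return text for UTF-8 bytes; pass other values through unchanged."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


s = u"\u00fcn\u00eec\u00f8d\u00e9"
assert isinstance(encode_utf8(s), bytes)      # true on Python 2 and 3
assert isinstance(decode_utf8(encode_utf8(s)), text_type)
```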
I used pattern with Python 2 before and I loved it, but now I switched to Python 3. What is the status of porting Pattern to Python 3?
It is astonishing to me that someone hasn't completed a full update to get Python 3 version of pattern working. I guess I will fork pattern3 and try to finish it myself.
never mind. too many recursion errors, encoding errors, etc. someone who knows the actual codebase should really update it.
@jamesacampbell it's not so far off https://github.com/pattern3/pattern/issues/5
@hayd yeah i was getting bogged down in all of the failures in test_web and test_db but did notice that the others were passing. commenting in that thread now. thanks
Tips to write Python 2-3 compatible code: http://python-future.org/compatible_idioms.html https://docs.python.org/3/howto/pyporting.html
This library is very good, and it can't stop in time...
Support for Python 3 is a long and ongoing discussion.
@hayd has done a lot of work on this -- see also https://github.com/pattern3/pattern
@hayd and I agreed that we should merge his work back into the main branch and take it step by step in a single branch, but I never get round to it due to time constraints. It's frustrating. Students at the university are now taught Python 3 and we can't offer Pattern to tell them about Natural Language Processing.
I will give it another go by submitting the task to Google Summer of Code.
But really Pattern needs more people that manage pull requests, that have push and admin rights, and that can take it into their own hands. Being the sole admin has worked well in the past to keep the source code clean and the focus tight, but we need to rethink this strategy.
Contact me at tom@organisms.be if you feel like stepping up.
@tom-de-smedt I would be interested in taking this on as a part of GSoC.
Possibly relevant in this discussion: http://www.python3statement.org/ , major scientific Python projects are phasing out 2.7 support. You might even want to consider dropping 2.7 entirely and switch to 3 at a next major version release, if that makes the transition easier.
Hello everyone, I've gone through the issues and it's very interesting. So many questions asked about pattern's compatibility with Python 3. I'm currently working on a Python 3 project and I need pattern, but I can't install it due to compatibility issues.
@hayd I went through your pattern3 repo and noticed the Travis build was failing. Do you need help with that?
@andela-bolajide can't speak for @hayd but please if you have time fork it and get the travis builds passing and then do a merge request or I would use your fork as is in meantime if you get it fully working in Python 3. I don't have time myself to do it.
Okay @jamesacampbell, i'll get to work tomorrow and i'll be in touch. Cheers man
Hi guys, I found that textblob is using the pattern library and they provide python 3 support. So if anyone is in a hurry, just go there and check the docs.
Hi, I'm using python 3 and I can't install pattern with anaconda. I tried 'pip install pattern', but it doesn't work. This is the result:
Pattern isn't compatible with Python3 yet. @afsun
My friend installed it for python3, but she can't remember how she did that! I edited the file 'setup.py' and inserted parentheses for 'print', but I still don't know how to install pattern. Where should I copy this file?
@tom-de-smedt I would be interested in taking this on as a part of GSoC.
I did not take this project.
I will work on this issue as part of this year's GSoC, so there will definitely be some substantial progress over the summer. We will probably track most of the development in the preliminary pattern3 repository for now, since parts of the code are already ported. We'll see what the status quo is – what works and doesn't work – over the next weeks.
So if you want to be part of all that (which would be great!), bring in your ideas or thoughts on the process and make sure to follow the above mentioned repository.
@markus-beuckelmann very excited about this, thanks
Update: As part of Google Summer of Code 2017, Markus Beuckelmann (@markus-beuckelmann) will be working on the future of Pattern (porting it to Python 3 is first on our list). Markus is admin of the repo now and can handle pull requests and invite collaborators. Be sure to reach out to him and include him in discussions about the port. About the pattern3 fork: a lot of work was done here by Andy Hayden (@hayd). Andy & I agreed that a fork, which was my idea, was not the best idea. All work on porting the toolkit should happen here. So we will take what we can use from the pattern3 repo, put it in here, and continue here, eventually discarding the pattern3 fork. It is less confusing for everyone if we work on 1 repo instead of 2 forks. Hopefully we can make some progress over the summer.
Thanks @tom-de-smedt, I hope it's going to be a productive summer! Here is how I plan to proceed...
feedparser, json, BeautifulSoup, ... in pattern.web) from the code base and make them external dependencies. (Done!)
test_web.py fail because either APIs are discontinued or deprecated; we can just skip these for now and deal with it later. Others are more important, e.g. in test_en.py, test_text.py and it's not immediately clear (at least to me) why they fail now when they used to pass. I'll have to look into this...
master branch untouched for now and keep all the commits working towards a stable Python 2.7 version in the development branch. Forked from development we will have a python3 branch where we can work on porting to Python 3.
python3 back to development later down the line and eventually merge everything back into master. If the changes are significant enough, we should at some point consider releasing a new major release.
Finally, and I think I speak in the name of many Pattern users/developers, special thanks to @hayd for all the valuable work done in the pattern3 fork. We will make use of it wherever reasonable.
Just checked pattern3 and it seems it also uses sgmllib inside pattern/web, would be good someone is familiar how it works or what it does (maybe code for that lib?) as due to that one I can't even start running tests :) I think it can be changed to lxml. Every other test of pattern3 seems doesn't have such dependencies and can be worked out.
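For context on the sgmllib question: that module was removed from the standard library in Python 3, so anything importing it cannot even be loaded there. Besides lxml, one standard-library option is the HTMLParser class. The sketch below is only an illustration of that alternative; TextExtractor and strip_tags are hypothetical names and do not describe how pattern.web is or will be implemented.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        # Explicit base-class call: HTMLParser is an old-style class on Python 2.
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return the text of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


print(strip_tags("<p>hello <b>world</b></p>"))
```

Entity handling (for example &amp;amp;) differs between the Python 2 and 3 versions of this class, so a real port would need tests for that on both.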
I realize I'm a bit behind on keeping people following this issue up to date with the latest progress! Google Summer of Code has been over for a couple of weeks now, and it has brought substantial progress (see full list of commits). We are now in a position where we have a version on the development branch that supports all modules except for pattern.server on both Python 2.7 and Python 3.5+. For people who want to find out more about the specifics and intermediate steps, go ahead and read my detailed GSoC reports on the Newsaudit blog (#1, #2, #3).
So now the plan is to smooth out the rough edges and release a new major version, Pattern 3.0, within the next months. There is really only one known bug at the moment that is solely related to Python 3 and it only affects the information gain tree classifier IGTree in pattern.vector. Then there are a couple of issues like deprecated web APIs in pattern.web that should be addressed before the next release.
In the meantime, everybody feel free to check out the development branch and report any issues that may come along!
Pattern should start supporting Python 3. Looking at the amount of code, it is a non-trivial task and any help is much appreciated.