clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License
8.75k stars 1.58k forks

Python 3 support #62

Open tom-de-smedt opened 10 years ago

tom-de-smedt commented 10 years ago

Pattern should start supporting Python 3. Looking at the amount of code, it is a non-trivial task and any help is much appreciated.

transfluxus commented 7 years ago

Not sure how long this is going to take, but as soon as the page is up again, maybe you can add some info saying that the development branch works with Python 3, what the restrictions are (what does not work yet), and how to install it. E.g., go into your Python 3 environment and run:

git clone https://github.com/clips/pattern
cd pattern
git fetch
git checkout development
python setup.py install

might make a lot of people happy

markus-beuckelmann commented 7 years ago

Sure, I will update the README.md on the master branch with some more information in the next few days.

jpfairbanks commented 6 years ago

@markus-beuckelmann Is there any update on this? Is the current advice to build the development branch if we need python 3 support?

markus-beuckelmann commented 6 years ago

@jpfairbanks, yes, if you need Python 3 support right now you can check out the development branch: git clone -b development https://github.com/clips/pattern

derNarr commented 6 years ago

On Debian 9, mysql_init was missing when I tried to install the development branch of pattern under Python 3. After I installed the MariaDB drop-in replacement for mysql_init with sudo apt-get install libmariadbclient-dev, everything I tried worked as expected with the pattern module.

masaguaro commented 6 years ago

I have installed pattern3 with pip in a conda virtual environment. I am working in Windows 8.1 64-bit. When I try to execute from pattern3.en import tag I get the following error:

Traceback (most recent call last):
  File "test_pattern.py", line 5, in <module>
    from pattern3.en import tag
  File "C:\Users\Rodolfo\Anaconda3\envs\flasky\lib\site-packages\pattern3\text\en\__init__.py", line 22, in <module>
    from pattern3.text import (
  File "C:\Users\Rodolfo\Anaconda3\envs\flasky\lib\site-packages\pattern3\text\__init__.py", line 28, in <module>
    from pattern3.text.tree import Tree, Text, Sentence, Slice, Chunk, PNPChunk, Chink, Word, table
  File "C:\Users\Rodolfo\Anaconda3\envs\flasky\lib\site-packages\pattern3\text\tree.py", line 37
    except:
         ^
IndentationError: expected an indented block

Any idea or help will be appreciated :)

JanmajaySingh commented 6 years ago

@masaguaro I have the same issue while trying to use gensim lemmatization. Maybe a recent push went wrong because of a mix of tabs and spaces?

I'll open a new issue about this.

Update: the last commit to pattern3.text.tree.py seems to be 3 years ago. Issue #217

masaguaro commented 6 years ago

Thank you @JanmajaySingh. I posted the same question on Stack Overflow but got no answer. It seems that there is not much Python 3 support. Perhaps you are right, and it's just a mix of tabs and spaces, which shouldn't be difficult to fix (using Sublime, for example). Right now, I am doing some work with NLTK, but I will keep your idea for future use.

markus-beuckelmann commented 6 years ago

@masaguaro @JanmajaySingh (#217), you are using the deprecated pattern3 repository, which contains a completely different code base that is not maintained anymore. There is a development branch here on clips/pattern with Python 3 support. You can clone it with git clone -b development https://github.com/clips/pattern and install it with pip or conda. Let us know if there are any issues with the development branch...
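
Since pattern3 and the development branch of pattern are separate code bases, it can help to check which of the two is actually importable in the active environment before debugging further. The following is a minimal sketch using only the standard library; the helper name installed_pattern_packages is made up for illustration and is not part of either package.

```python
import importlib.util


def installed_pattern_packages(candidates=("pattern", "pattern3")):
    """Return the candidate top-level package names importable in this environment.

    Hypothetical helper for illustration; not part of pattern or pattern3.
    """
    found = []
    for name in candidates:
        # find_spec returns None when no top-level module of that name is found.
        if importlib.util.find_spec(name) is not None:
            found.append(name)
    return found


if __name__ == "__main__":
    print(installed_pattern_packages())
```

If pattern3 shows up in the result, imports such as from pattern3.en import tag are being served by the deprecated package rather than by the development branch.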

JanmajaySingh commented 6 years ago

@markus-beuckelmann Thanks! Issue #217 was closed.

masaguaro commented 6 years ago

Hello @markus-beuckelmann @JanmajaySingh
I am working in Windows 8.1 64-bit, in a conda virtual environment. I was trying to run pattern/examples/01-web/04-twitter.py and I had the following error:

Traceback (most recent call last):
  File "C:\Users\Desktop\pattern\examples\01-web\04-twitter.py", line 12, in <module>
    from pattern.db import Datasheet, pprint, pd
  File "C:\Users\Desktop\pattern\examples\01-web\..\..\pattern\db\__init__.py", line 1879, in <module>
    csvlib.field_size_limit(sys.maxsize)
OverflowError: Python int too large to convert to C long
[Finished in 1.5s]

Any idea? Thank you in advance.

ash-williams commented 6 years ago

Hey,

Is there any time frame on an official release of the python3 support? Or an idea of how close it is to being ready?

Thanks, Ash

tuxayo commented 6 years ago

If enough people are interested, especially if it helps them with their work, maybe we can consider putting a bounty on this task? It would be nice if such work were paid :) https://www.bountysource.com/issues/1685084-python-3-support

JanmajaySingh commented 6 years ago

@zedrem @tuxayo Considering that the development branch was last updated 9 months ago, I guess the primary contributors have been busy.

The dev branch in its current form works without issues (at least for me). You can refer to @markus-beuckelmann's comment (March 6).

JanmajaySingh commented 6 years ago

@masaguaro I dunno if you could find a workaround to your issue, but I guess modifying the C source code to something like long long int might help. But it may break other modules in unexpected ways. I don't know any other details about your project though, you're better off asking on S/O.
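
For context on the OverflowError reported above: on 64-bit Windows, sys.maxsize is 2**63 - 1 while a C long is only 32 bits wide, so csv.field_size_limit(sys.maxsize) cannot convert the value. A workaround often used for this error, which stays in Python and is not confirmed by the maintainers in this thread, is to lower the limit until the call succeeds. A minimal sketch, assuming the standard library csv module (imported as csvlib in the traceback); the helper name set_max_csv_field_size is hypothetical:

```python
import csv
import sys


def set_max_csv_field_size():
    """Set the csv field size limit to the largest value the platform accepts.

    Hypothetical helper for illustration; not part of pattern.
    Returns the limit that was finally applied.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The value does not fit in a C long on this platform; halve it and retry.
            limit //= 2


if __name__ == "__main__":
    print(set_max_csv_field_size())
```

On platforms where a C long is 64 bits wide the first call succeeds and the limit stays at sys.maxsize; where it is 32 bits wide the loop settles on a smaller value that the call accepts.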

ash-williams commented 6 years ago

Yea thanks, I've read the thread and understand that you can use the tool from the development branch. For what I needed, I was happy to even use the pattern3 side-project that was set up initially and then discontinued.

However, I really like pattern's article extraction tool and want to incorporate it into another tool that I'm building. As far as I'm aware (?), there is no way to do that with pattern in its current condition. I'm guessing that you can't specify specific git branches in your requirements.txt, for example?

If anyone is aware of any similar article extraction tools, please let me know (but I'm conscious that it is off topic for this thread).

septian-putra commented 6 years ago

@zedrem For me, I can install it by running this

sudo apt-get install libmysqlclient-dev
git clone -b development https://github.com/clips/pattern
cd pattern/
sudo python3 setup.py install

fabianhoward commented 6 years ago

@zedrem You can certainly specify commits in requirements.txt such as git+https://github.com/clips/pattern@ec95f97b2e34c2232e7c43ef1e34e3f0dea6654b

As @septiangilang says, on Ubuntu you will need libmysqlclient-dev as a requirement.
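
On the earlier question about branches: pip accepts a branch name, tag, or commit hash after the @ in a git+ URL, so the same form works for the development branch as for a pinned commit. The sketch below only builds such requirement lines as strings. The helper name vcs_requirement is hypothetical, and whether a given ref installs cleanly depends on the state of the repository.

```python
def vcs_requirement(repo_url, ref):
    """Build a pip VCS requirement line of the form 'git+<url>@<ref>'.

    Hypothetical helper for illustration; ref may be a branch name, tag, or commit hash.
    """
    if not repo_url or not ref:
        raise ValueError("both repo_url and ref are required")
    return "git+{}@{}".format(repo_url, ref)


if __name__ == "__main__":
    repo = "https://github.com/clips/pattern"
    # Track the development branch (moves as the branch is updated).
    print(vcs_requirement(repo, "development"))
    # Pin the exact commit quoted in this thread (reproducible).
    print(vcs_requirement(repo, "ec95f97b2e34c2232e7c43ef1e34e3f0dea6654b"))
```

Pinning a commit hash gives reproducible installs, while a branch name follows whatever is pushed to that branch next.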

tom-de-smedt commented 6 years ago

A lot of work was done by @markus-beuckelmann during last year's GSoC. During this year's GSoC, @Xsardas1000 (Maksim Filim) is doing great work (Markus and I are mentoring). Check Max's progress here: https://github.com/clips/pattern/tree/devmodified

We should be able to get out an "overall stable" official release by the end of the month, if everything goes well.

If you notice things that don't work yet, please report them here. Better yet, if you want to help out, please let us know, we can give you some editing privileges and author credits to move things forward more quickly.

As a side note, the documentation needs to move to a new location too (e.g., www.pattern3.net). Let us know if you'd want to contribute some web development skills to this end.

Thanks for your patience, we're nearing a stable release of Pattern 3.

tales-aparecida commented 5 years ago

Hi, I'm an undergrad student at Unicamp (Brazil) and I'm interested in helping with this repo. I thought about starting with code coverage and found that there's some duplicated code in server/__init__.py, db/__init__.py and others. Is this intentional?