amauryfa / lxml

lxml-cffi is a PyPy-friendly port of lxml, based on cffi

Building on PyPy 2.6 (official docker image) breaks with CompileError #9

Open npiganeau opened 9 years ago

npiganeau commented 9 years ago

Building on PyPy 2.6 (with cffi 1.1) fails with a VerificationError (caused by a CompileError), whereas it works perfectly on PyPy 2.5. The tests used the official PyPy Docker images (tags '2-2.6' and '2-2.5').

PyPy 2.6:

npiganeau@ndp-host:~$ docker run -t -i --name pypy pypy:2-2.6 bash
root@8557c3209fe5:/# pip install -e git+git://github.com/amauryfa/lxml.git@cffi#egg=lxml-cffi
Obtaining lxml-cffi from git+git://github.com/amauryfa/lxml.git@cffi#egg=lxml-cffi
  Cloning git://github.com/amauryfa/lxml.git (to cffi) to /src/lxml-cffi
    Complete output from command python setup.py egg_info:
    src/lxml-cffi/includes/__pycache__/_cffi__gf1d3c271xb1c86544.c: In function ‘_cffi_check_struct__xmlOutputBuffer’:
    src/lxml-cffi/includes/__pycache__/_cffi__gf1d3c271xb1c86544.c:1624:24: warning: initialization from incompatible pointer type
       { xmlBuffer * *tmp = &p->buffer; (void)tmp; }
                            ^
    src/lxml-cffi/includes/__pycache__/_cffi__gf1d3c271xb1c86544.c:1625:24: warning: initialization from incompatible pointer type
       { xmlBuffer * *tmp = &p->conv; (void)tmp; }
                            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c: In function ‘_cffi_e__xmlRelaxNGValidErr’:
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7845:8: error: ‘XML_RELAXNG_OK’ undeclared (first use in this function)
       if ((XML_RELAXNG_OK) > 0 || (long)(XML_RELAXNG_OK) != 0L) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7845:8: note: each undeclared identifier is reported only once for each function it appears in
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7855:8: error: ‘XML_RELAXNG_ERR_MEMORY’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_MEMORY) <= 0 || (unsigned long)(XML_RELAXNG_ERR_MEMORY) != 1UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7865:8: error: ‘XML_RELAXNG_ERR_TYPE’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_TYPE) <= 0 || (unsigned long)(XML_RELAXNG_ERR_TYPE) != 2UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7875:8: error: ‘XML_RELAXNG_ERR_TYPEVAL’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_TYPEVAL) <= 0 || (unsigned long)(XML_RELAXNG_ERR_TYPEVAL) != 3UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7885:8: error: ‘XML_RELAXNG_ERR_DUPID’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_DUPID) <= 0 || (unsigned long)(XML_RELAXNG_ERR_DUPID) != 4UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7895:8: error: ‘XML_RELAXNG_ERR_TYPECMP’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_TYPECMP) <= 0 || (unsigned long)(XML_RELAXNG_ERR_TYPECMP) != 5UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7905:8: error: ‘XML_RELAXNG_ERR_NOSTATE’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_NOSTATE) <= 0 || (unsigned long)(XML_RELAXNG_ERR_NOSTATE) != 6UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7915:8: error: ‘XML_RELAXNG_ERR_NODEFINE’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_NODEFINE) <= 0 || (unsigned long)(XML_RELAXNG_ERR_NODEFINE) != 7UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7925:8: error: ‘XML_RELAXNG_ERR_LISTEXTRA’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_LISTEXTRA) <= 0 || (unsigned long)(XML_RELAXNG_ERR_LISTEXTRA) != 8UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7935:8: error: ‘XML_RELAXNG_ERR_LISTEMPTY’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_LISTEMPTY) <= 0 || (unsigned long)(XML_RELAXNG_ERR_LISTEMPTY) != 9UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7945:8: error: ‘XML_RELAXNG_ERR_INTERNODATA’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_INTERNODATA) <= 0 || (unsigned long)(XML_RELAXNG_ERR_INTERNODATA) != 10UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7955:8: error: ‘XML_RELAXNG_ERR_INTERSEQ’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_INTERSEQ) <= 0 || (unsigned long)(XML_RELAXNG_ERR_INTERSEQ) != 11UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7965:8: error: ‘XML_RELAXNG_ERR_INTEREXTRA’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_INTEREXTRA) <= 0 || (unsigned long)(XML_RELAXNG_ERR_INTEREXTRA) != 12UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7975:8: error: ‘XML_RELAXNG_ERR_ELEMNAME’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_ELEMNAME) <= 0 || (unsigned long)(XML_RELAXNG_ERR_ELEMNAME) != 13UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7985:8: error: ‘XML_RELAXNG_ERR_ATTRNAME’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_ATTRNAME) <= 0 || (unsigned long)(XML_RELAXNG_ERR_ATTRNAME) != 14UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:7995:8: error: ‘XML_RELAXNG_ERR_ELEMNONS’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_ELEMNONS) <= 0 || (unsigned long)(XML_RELAXNG_ERR_ELEMNONS) != 15UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8005:8: error: ‘XML_RELAXNG_ERR_ATTRNONS’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_ATTRNONS) <= 0 || (unsigned long)(XML_RELAXNG_ERR_ATTRNONS) != 16UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8015:8: error: ‘XML_RELAXNG_ERR_ELEMWRONGNS’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_ELEMWRONGNS) <= 0 || (unsigned long)(XML_RELAXNG_ERR_ELEMWRONGNS) != 17UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8025:8: error: ‘XML_RELAXNG_ERR_ATTRWRONGNS’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_ATTRWRONGNS) <= 0 || (unsigned long)(XML_RELAXNG_ERR_ATTRWRONGNS) != 18UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8035:8: error: ‘XML_RELAXNG_ERR_ELEMEXTRANS’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_ELEMEXTRANS) <= 0 || (unsigned long)(XML_RELAXNG_ERR_ELEMEXTRANS) != 19UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8045:8: error: ‘XML_RELAXNG_ERR_ATTREXTRANS’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_ATTREXTRANS) <= 0 || (unsigned long)(XML_RELAXNG_ERR_ATTREXTRANS) != 20UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8055:8: error: ‘XML_RELAXNG_ERR_ELEMNOTEMPTY’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_ELEMNOTEMPTY) <= 0 || (unsigned long)(XML_RELAXNG_ERR_ELEMNOTEMPTY) != 21UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8065:8: error: ‘XML_RELAXNG_ERR_NOELEM’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_NOELEM) <= 0 || (unsigned long)(XML_RELAXNG_ERR_NOELEM) != 22UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8075:8: error: ‘XML_RELAXNG_ERR_NOTELEM’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_NOTELEM) <= 0 || (unsigned long)(XML_RELAXNG_ERR_NOTELEM) != 23UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8085:8: error: ‘XML_RELAXNG_ERR_ATTRVALID’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_ATTRVALID) <= 0 || (unsigned long)(XML_RELAXNG_ERR_ATTRVALID) != 24UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8095:8: error: ‘XML_RELAXNG_ERR_CONTENTVALID’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_CONTENTVALID) <= 0 || (unsigned long)(XML_RELAXNG_ERR_CONTENTVALID) != 25UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8105:8: error: ‘XML_RELAXNG_ERR_EXTRACONTENT’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_EXTRACONTENT) <= 0 || (unsigned long)(XML_RELAXNG_ERR_EXTRACONTENT) != 26UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8115:8: error: ‘XML_RELAXNG_ERR_INVALIDATTR’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_INVALIDATTR) <= 0 || (unsigned long)(XML_RELAXNG_ERR_INVALIDATTR) != 27UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8125:8: error: ‘XML_RELAXNG_ERR_DATAELEM’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_DATAELEM) <= 0 || (unsigned long)(XML_RELAXNG_ERR_DATAELEM) != 28UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8135:8: error: ‘XML_RELAXNG_ERR_VALELEM’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_VALELEM) <= 0 || (unsigned long)(XML_RELAXNG_ERR_VALELEM) != 29UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8145:8: error: ‘XML_RELAXNG_ERR_LISTELEM’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_LISTELEM) <= 0 || (unsigned long)(XML_RELAXNG_ERR_LISTELEM) != 30UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8155:8: error: ‘XML_RELAXNG_ERR_DATATYPE’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_DATATYPE) <= 0 || (unsigned long)(XML_RELAXNG_ERR_DATATYPE) != 31UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8165:8: error: ‘XML_RELAXNG_ERR_VALUE’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_VALUE) <= 0 || (unsigned long)(XML_RELAXNG_ERR_VALUE) != 32UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8175:8: error: ‘XML_RELAXNG_ERR_LIST’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_LIST) <= 0 || (unsigned long)(XML_RELAXNG_ERR_LIST) != 33UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8185:8: error: ‘XML_RELAXNG_ERR_NOGRAMMAR’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_NOGRAMMAR) <= 0 || (unsigned long)(XML_RELAXNG_ERR_NOGRAMMAR) != 34UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8195:8: error: ‘XML_RELAXNG_ERR_EXTRADATA’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_EXTRADATA) <= 0 || (unsigned long)(XML_RELAXNG_ERR_EXTRADATA) != 35UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8205:8: error: ‘XML_RELAXNG_ERR_LACKDATA’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_LACKDATA) <= 0 || (unsigned long)(XML_RELAXNG_ERR_LACKDATA) != 36UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8215:8: error: ‘XML_RELAXNG_ERR_INTERNAL’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_INTERNAL) <= 0 || (unsigned long)(XML_RELAXNG_ERR_INTERNAL) != 37UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8225:8: error: ‘XML_RELAXNG_ERR_ELEMWRONG’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_ELEMWRONG) <= 0 || (unsigned long)(XML_RELAXNG_ERR_ELEMWRONG) != 38UL) {
            ^
    src/lxml-cffi/__pycache__/_cffi__g65f9f103xa680a747.c:8235:8: error: ‘XML_RELAXNG_ERR_TEXTWRONG’ undeclared (first use in this function)
       if ((XML_RELAXNG_ERR_TEXTWRONG) <= 0 || (unsigned long)(XML_RELAXNG_ERR_TEXTWRONG) != 39UL) {
            ^
    Building lxml version 3.4.0.
    Building without Cython.
    Using build configuration of libxslt 1.1.28
    Traceback (most recent call last):
      File "<builtin>/app_main.py", line 75, in run_toplevel
      File "<builtin>/app_main.py", line 601, in run_it
      File "<string>", line 20, in <module>
      File "/src/lxml-cffi/setup.py", line 232, in <module>
        **setup_extra_options()
      File "/src/lxml-cffi/setup.py", line 145, in setup_extra_options
        STATIC_CFLAGS, STATIC_BINARIES)
      File "setupinfo.py", line 167, in ext_modules
        import lxml.etree
      File "src/lxml/../lxml-cffi/etree.py", line 26, in <module>
        from .xmlerror import _initThreadLogging, clear_error_log
      File "src/lxml/../lxml-cffi/xmlerror.py", line 150, in <module>
        libraries=['xml2'])
      File "/usr/local/lib_pypy/cffi/api.py", line 373, in verify
        lib = self.verifier.load_library()
      File "/usr/local/lib_pypy/cffi/verifier.py", line 96, in load_library
        self._compile_module()
      File "/usr/local/lib_pypy/cffi/verifier.py", line 192, in _compile_module
        outputfilename = ffiplatform.compile(tmpdir, self.get_extension())
      File "/usr/local/lib_pypy/cffi/ffiplatform.py", line 38, in compile
        outputfilename = _build(tmpdir, ext)
      File "/usr/local/lib_pypy/cffi/ffiplatform.py", line 65, in _build
        raise VerificationError('%s: %s' % (e.__class__.__name__, e))
    VerificationError: CompileError: command 'cc' failed with exit status 1

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /src/lxml-cffi

PyPy 2.5:

npiganeau@ndp-host:~$ docker run -t -i --name pypy pypy:2-2.5 bash
root@6b84fa93cfd9:/# pip install -e git+git://github.com/amauryfa/lxml.git@cffi#egg=lxml-cffi
You are using pip version 7.0.1, however version 7.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Obtaining lxml-cffi from git+git://github.com/amauryfa/lxml.git@cffi#egg=lxml-cffi
  Cloning git://github.com/amauryfa/lxml.git (to cffi) to /src/lxml-cffi
Installing collected packages: lxml-cffi
  Running setup.py develop for lxml-cffi
Successfully installed lxml-cffi

aglyzov commented 9 years ago

I believe these bindings are outdated and cannot be used with today's PyPy. After much fiddling with the sources I was able to build it on OSX 10.9.5 (Mavericks) / MacPorts + PyPy 2.6. However, it was very unstable and raised RuntimeError at random places when querying an element for its children. Totally unusable IMO, so I deleted it altogether.

amauryfa commented 9 years ago

The error messages seem to indicate a different version of libxml; PyPy's version is not relevant here. Can you check which version of libxml you are using?

aglyzov commented 9 years ago

2.9.2

npiganeau commented 9 years ago

@amauryfa Actually it is the same version of libxml in both docker instances: 2.9.1.

root@6689ce5271c5:/# pypy --version
Python 2.7.9 (9c4588d731b7, Mar 23 2015, 16:30:30)
[PyPy 2.5.1 with GCC 4.6.3]
root@6689ce5271c5:/# ldconfig -v|grep xml
        libxml2.so.2 -> libxml2.so.2.9.1
root@3b5593dee3af:/# pypy --version
Python 2.7.9 (295ee98b6928, May 31 2015, 07:29:04)
[PyPy 2.6.0 with GCC 4.8.2]
root@3b5593dee3af:/# ldconfig -v |grep xml
        libxml2.so.2 -> libxml2.so.2.9.1
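
The version comparison above can be automated. This is a minimal sketch, assuming output in the `ldconfig -v` format shown in this thread; the helper name `shared_lib_versions` is hypothetical and not part of lxml or cffi.

```python
import re


def shared_lib_versions(ldconfig_output, name="libxml2"):
    """Return version strings (e.g. '2.9.1') for `name` found in `ldconfig -v` output."""
    # Matches the right-hand side of lines such as
    # "libxml2.so.2 -> libxml2.so.2.9.1" and captures "2.9.1".
    pattern = re.compile(r"->\s*" + re.escape(name) + r"\.so\.(\d+(?:\.\d+)*)\s*$")
    versions = []
    for line in ldconfig_output.splitlines():
        match = pattern.search(line)
        if match:
            versions.append(match.group(1))
    return versions


# Sample line copied from the output in this thread.
sample = "        libxml2.so.2 -> libxml2.so.2.9.1\n"
print(shared_lib_versions(sample))
```

Running the helper on the output captured in each container and comparing the two lists shows whether both images link against the same libxml2 release. It says nothing about which headers the compiler sees at build time.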

aglyzov commented 9 years ago

Then perhaps it's an OSX thing

aglyzov commented 9 years ago

And while we're at it, in my tests I noticed that the cffi+PyPy lxml took roughly 3-5 times longer than CPython lxml to parse the same HTML page. I wasn't expecting that at all; I thought cffi would be faster. Have you seen anything like it?

amauryfa commented 9 years ago

I remember doing some benchmarks, and the results were in the same range as CPython (some were 40% faster, others 40% slower). But it's very possible that some code paths are still much slower.

Also, be sure to warm up pypy's JIT by running 1000 loops of your function before doing any timings.

Amaury Forgeot d'Arc
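
The warm-up advice above can be sketched as a small timing helper. This is only an illustration: the name `timed_after_warmup` and the default loop counts are assumptions, and the workload here is a stand-in rather than an actual lxml call.

```python
from timeit import default_timer


def timed_after_warmup(func, warmup=1000, repeat=100):
    """Run `func` untimed `warmup` times, then return mean seconds per call."""
    # Untimed calls give PyPy's JIT a chance to compile the hot path.
    for _ in range(warmup):
        func()
    start = default_timer()
    for _ in range(repeat):
        func()
    return (default_timer() - start) / repeat


# Stand-in workload; in a real comparison this would be the parsing call
# under test, for example a lambda that calls lxml.html.fromstring(page).
mean_seconds = timed_after_warmup(lambda: sum(range(1000)))
print("mean seconds per call: %.6f" % mean_seconds)
```

Running the same helper under CPython and under PyPy keeps the two measurements comparable, since the untimed warm-up costs CPython little and keeps JIT compilation time out of the PyPy figure.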

nichochar commented 9 years ago

+1 on this error. What is the suggested solution here?

aldarund commented 9 years ago

+1. Same error here on Debian, without Docker. Any solutions?

aglyzov commented 9 years ago

@nichochar and @aldarund, try this fork, where I have added explicit libxml/relaxng.h includes everywhere so that it builds OK: https://github.com/aglyzov/lxml.git
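
The kind of change described above might look roughly like the following. This is a hedged sketch, not the code from the fork: the helper names, the header list, the two-constant enum declaration, and the `/usr/include/libxml2` include path are all assumptions for illustration. Only `ffi.verify(..., libraries=['xml2'])` appears in the traceback in this thread.

```python
def build_verify_source(headers):
    """Return C source for ffi.verify() that explicitly includes each header."""
    return "\n".join("#include <%s>" % header for header in headers) + "\n"


# Illustrative header list; relaxng.h is the one that declares
# XML_RELAXNG_OK and the other XML_RELAXNG_ERR_* constants.
HEADERS = ["libxml/xmlerror.h", "libxml/relaxng.h"]


def load_relaxng_constants(include_dirs=("/usr/include/libxml2",)):
    """Compile a tiny cffi module exposing two RelaxNG enum constants.

    Needs cffi, a C compiler and the libxml2 development headers, so it is
    defined here but not called. The include path is an assumption.
    """
    from cffi import FFI

    ffi = FFI()
    # Partial enum declaration: the trailing "..." asks cffi to take the
    # remaining values from the real header at compile time.
    ffi.cdef("""
        typedef enum {
            XML_RELAXNG_OK,
            XML_RELAXNG_ERR_MEMORY,
            ...
        } xmlRelaxNGValidErr;
    """)
    return ffi.verify(
        build_verify_source(HEADERS),
        libraries=["xml2"],
        include_dirs=list(include_dirs),
    )


print(build_verify_source(HEADERS))
```

If the C source passed to `ffi.verify()` does not include the header that declares a constant named in `cdef()`, the generated C file refers to an undeclared identifier, which is the form of the errors in the log at the top of this issue.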