gavodachs / docker-dachs

Docker image for GAVO DaCHS
https://hub.docker.com/r/gavodachs/dachs/
GNU General Public License v2.0

ADQL form doesn't work in v2.1 #13

Closed Laubeee closed 4 years ago

Laubeee commented 4 years ago

Hi. Some time ago I set up a Docker service for testing, and it worked quite well. The image was :latest at the time and used dachs version 1.2, but there is no existing label for that Docker image anymore. However, it's accessible through the SHA, using FROM chbrandt/dachs@sha256:d11ebc9c07d97f32c1332b5664e5a2909e9ea879470addd2531dd82dbc49a489 AS mydachs.

The same setup works OK with dachs:1.4, but somehow the gavo serve [re]start command instantly terminates, whereas gavo serve debug works as expected (and shows no errors). Weird, but so far so good.

As I would love to use Python3 and am also not interested in setting up a new service that runs on deprecated software from the very start, I tried the :latest tag, which runs dachs v2.1. Same issue here with gavo serve, but on top of that, the ADQL HTML form doesn't work. From a bit of debugging it seems the query field is not sent properly to the server, and the server somehow (instead of sending a response like "field is required" as it did in previous versions) responds with a status 400 BUT in the web.log it writes that it responded with 200, which is unsettling.

Any ideas where this comes from or how it can be fixed?

My Dockerfile:


FROM chbrandt/dachs:latest AS mydachs
# FROM chbrandt/dachs:1.4 AS mydachs
# FROM chbrandt/dachs@sha256:d11ebc9c07d97f32c1332b5664e5a2909e9ea879470addd2531dd82dbc49a489 AS mydachs

EXPOSE 80

# copy local files
COPY ./gavoetc/gavo.rc /etc/gavo.rc
COPY ./gavoetc/defaultmeta.txt /var/gavo/etc/defaultmeta.txt
COPY ./img/fhnw_logo_big.png /var/gavo/web/nv_static/img/logo_big.png
COPY ./img/fhnw_logo_medium.png /var/gavo/web/nv_static/img/logo_medium.png

# launch Postgres and install the service planets as a first example and launch DaCHS
CMD [ "/bin/bash", "-c", "\
    /etc/init.d/postgresql restart && \
    su dachsroot -c '\
        gavo imp ecallisto/q && \
        gavo pub //services && \
        gavo pub //tap && \
        gavo pub ecallisto/q \
    ' && \
    gavo serve debug && \
#    gavo serve restart && \
    bash" ]
msdemlei commented 4 years ago

On Thu, Aug 13, 2020 at 09:21:02AM -0700, Silvan Laube wrote:

version 1.2, but there is no existing label for the docker image anymore. However its accessible through the SHA, using FROM chbrandt/dachs@sha256:d11ebc9c07d97f32c1332b5664e5a2909e9ea879470addd2531dd82dbc49a489 AS mydachs.

I've not looked into the Docker aspects so far -- Carlos?

issue here with the gavo serve, but on top, the ADQL html form doesn't work. From a bit debugging it seems the query field is not sent properly to the server, and the server somehow (instead of sending a response like "field is required" as it did in prev versions) responds with a status 400 BUT in the web.log it writes it responded with 200, which is unsettling.

The 200 log probably looks like this:

2020-08-14 08:32:24+0200 [-] - - - [14/Aug/2020:06:32:23 +0000] "'(no method yet)' '(no uri yet)' '(no clientproto yet)'" 200 - "-" "-"

If true, that's close to the best I can do when twisted fails to even build a request object. But now that you mention it, I give you I shouldn't be putting in a 200, which of course is pointless when I'm not even sending a response (which is impossible given I don't even have a proper request). I'll look into it and at least make it a 500.

As to why the request is illegal in the first place: I'm pretty sure it's because whitespace sneaks into the first line of the request, such that it might look like

GET /adql foo bar HTTP/1.1

or so. To see if I'm right, could you run

sudo ngrep -i lo GET

and post the request line? Or, about as good, just copy the "Result URL" you see on the ADQL form and post it?
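To illustrate the suspicion above, here is a minimal sketch of why unescaped whitespace breaks the first line of a request. It is a simplified illustration only, not Twisted's actual parser; the function name `parse_request_line` is made up for this example.

```python
def parse_request_line(line):
    """Split an HTTP/1.x request line into (method, target, protocol).

    Simplified illustration: a well-formed request line has exactly
    three space-separated parts, so a raw space inside the target
    makes the line ambiguous.
    """
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError(
            f"malformed request line: expected 3 parts, got {len(parts)}"
        )
    method, target, protocol = parts
    return method, target, protocol


# A properly percent-encoded target parses fine.
print(parse_request_line("GET /adql?query=foo%20bar HTTP/1.1"))

# The suspected broken form from above does not.
try:
    parse_request_line("GET /adql foo bar HTTP/1.1")
except ValueError as exc:
    print(exc)
```

If the captured request line splits into exactly three parts, whitespace in the URL is probably not the culprit.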

Laubeee commented 4 years ago

No... here is a complete log of opening the form and sending a simple query

2020-08-14 09:05:40+0000 [-] Log opened.
2020-08-14 09:06:12+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:11 +0000] "GET /__system__/adql/query/form HTTP/1.1" 200 3802 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"
2020-08-14 09:06:12+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:12 +0000] "GET /static/css/formal.css HTTP/1.1" 200 765 "http://127.0.0.1/__system__/adql/query/form" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"
2020-08-14 09:06:12+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:12 +0000] "GET /static/js/jquery-gavo.js HTTP/1.1" 200 64698 "http://127.0.0.1/__system__/adql/query/form" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"
2020-08-14 09:06:12+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:12 +0000] "GET /static/js/gavo.js HTTP/1.1" 200 3292 "http://127.0.0.1/__system__/adql/query/form" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"
2020-08-14 09:06:12+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:12 +0000] "GET /static/css/gavo_dc.css HTTP/1.1" 200 4083 "http://127.0.0.1/__system__/adql/query/form" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"
2020-08-14 09:06:12+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:12 +0000] "GET /static/img/bookmark.png HTTP/1.1" 200 713 "http://127.0.0.1/__system__/adql/query/form" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"
2020-08-14 09:06:12+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:12 +0000] "GET /static/js/samp.js HTTP/1.1" 200 6311 "http://127.0.0.1/__system__/adql/query/form" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"
2020-08-14 09:06:12+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:12 +0000] "GET /static/img/logo_medium.png HTTP/1.1" 200 16551 "http://127.0.0.1/__system__/adql/query/form" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"
2020-08-14 09:06:13+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:13 +0000] "GET /static/img/grey-s.png HTTP/1.1" 200 561 "http://127.0.0.1/static/css/gavo_dc.css" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"
2020-08-14 09:06:14+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:13 +0000] "GET /favicon.ico HTTP/1.1" 404 1332 "http://127.0.0.1/__system__/adql/query/form" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"
2020-08-14 09:06:20+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:19 +0000] "POST /__system__/adql/query/form HTTP/1.1" 200 - "http://127.0.0.1/__system__/adql/query/form" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"

the action target in the form is fine, so unless javascript jumps in and does something else entirely this doesn't seem to be it.

I'm not familiar with ngrep, but your command gives an error. ngrep -i GET doesn't find much; with POST I get:

interface: eth0 (172.17.0.0/255.255.0.0)
filter: ((ip || ip6) || (vlan && (ip || ip6)))
match: POST
#
T 172.17.0.1:50364 -> 172.17.0.2:80 [AP] #1
  POST /__system__/adql/query/form HTTP/1.1..Host: 127.0.0.1..Connection: keep-alive..Content-Length: 757..Cache-Control: max-age=0..Upgrade-Insec   
  ure-Requests: 1..Origin: http://127.0.0.1..Content-Type: multipart/form-data; boundary=----WebKitFormBoundarytlcvhdXWivLtNG1l..User-Agent: Mozil   
  la/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36..Accept: text/html,application/xh   
  tml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9..Sec-Fetch-Site: same-origin..Sec-Fetch-Mod   
  e: navigate..Sec-Fetch-User: ?1..Sec-Fetch-Dest: document..Referer: http://127.0.0.1/__system__/adql/query/form..Accept-Encoding: gzip, deflate,   
   br..Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7....------WebKitFormBoundarytlcvhdXWivLtNG1l..Content-Disposition: form-data; name="_ch   
  arset_"....UTF-8..------WebKitFormBoundarytlcvhdXWivLtNG1l..Content-Disposition: form-data; name="__nevow_form__"....genForm..------WebKitFormBo   
  undarytlcvhdXWivLtNG1l..Content-Disposition: form-data; name="query"....select * from rhessi.flares..------WebKitFormBoundarytlcvhdXWivLtNG1l..C   
  ontent-Disposition: form-data; name="_TIMEOUT"....5..------WebKitFormBoundarytlcvhdXWivLtNG1l..Content-Disposition: form-data; name="MAXREC"....   
  100..------WebKitFormBoundarytlcvhdXWivLtNG1l..Content-Disposition: form-data; name="_FORMAT"....HTML..------WebKitFormBoundarytlcvhdXWivLtNG1l.   
  .Content-Disposition: form-data; name="submit"....Go..------WebKitFormBoundarytlcvhdXWivLtNG1l--..

It's funny how the query is now sent correctly. I checked again with the network tool from MS Edge (Chrome somehow screws it up) and now it's there. Maybe the form was reset and I forgot to actually enter it when I analyzed this yesterday :'D But the result is the same.
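To double-check a captured payload like the one above without relying on a browser's network tool, the multipart body can be decoded with Python's standard library. This is only a sketch for inspecting a capture: the helper name `multipart_fields` is made up, the sample body is a shortened copy of the dump above, and this is not how DaCHS or Twisted parse uploads.

```python
from email import message_from_bytes


def multipart_fields(content_type, body):
    """Return {field name: text value} for a multipart/form-data body.

    content_type is the request's Content-Type header value (str),
    body is the raw request body (bytes).
    """
    # Prepend the Content-Type header so the parser knows the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("body is not a multipart message")
    fields = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        fields[name] = part.get_payload(decode=True).decode("utf-8")
    return fields


# Shortened sample taken from the ngrep dump above.
CONTENT_TYPE = (
    "multipart/form-data; boundary=----WebKitFormBoundarytlcvhdXWivLtNG1l"
)
BODY = (
    "------WebKitFormBoundarytlcvhdXWivLtNG1l\r\n"
    'Content-Disposition: form-data; name="query"\r\n'
    "\r\n"
    "select * from rhessi.flares\r\n"
    "------WebKitFormBoundarytlcvhdXWivLtNG1l\r\n"
    'Content-Disposition: form-data; name="MAXREC"\r\n'
    "\r\n"
    "100\r\n"
    "------WebKitFormBoundarytlcvhdXWivLtNG1l--\r\n"
).encode("ascii")

print(multipart_fields(CONTENT_TYPE, BODY))
```

A missing or empty "query" entry in the result would mean the browser really did not send the field.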

msdemlei commented 4 years ago

On Fri, Aug 14, 2020 at 02:23:56AM -0700, Silvan Laube wrote:

2020-08-14 09:06:20+0000 [-] 172.17.0.1 - - [14/Aug/2020:09:06:19 +0000] "POST /__system__/adql/query/form HTTP/1.1" 200 - "http://127.0.0.1/__system__/adql/query/form" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36"

That looks fine to me.

POST /__system__/adql/query/form HTTP/1.1..Host: 127.0.0.1..Connection: keep-alive..Content-Length: 757..

That looks good to me, too.

But the result is the same.

You mean, the browser still shows a 400 bad request? That would be unsettling, indeed. Is there any payload in that 400 response?

[btw., for the case I had suspected first (where there would be a "no clientproto yet" in the log), DaCHS will log a 400 in the next release]

chbrandt commented 4 years ago

Hi Silvan

Some time ago I setup a docker service for testing, and it worked quite well. The image was :latest at the time and used dachs version 1.2, but there is no existing label for the docker image anymore.

As you noticed, latest will always refer to the last released version of DaCHS (if not, poke me through an issue!). (For the time being -- until users fully migrate to DaCHS 2.x -- the latest version of DaCHS-1 will also have a Docker representative, which could be found (as you also noticed) under dachs:1, or dachs:1.4 (for instance).)

The same setup works OK with dachs:1.4, but somehow the gavo serve [re]start command instantly terminates, whereas gavo serve debug works as expected (and shows no errors). Weird, but so far so good.

It is not clear to me what "command instantly terminates" exactly means. Notice that it is the natural behaviour to have the process going background (silently) when not in debug mode. Whereas gavo serve debug (use dachs serve start -f on v2.x) keeps it foreground writing logs to stdout for our convenience.

Check the following, please: after gavo serve start, check what comes out from ps aux. You should see a line with the gavo process like the following:

gavo        67  0.3  3.5 305104 72328 ?        S    07:26   0:00 /usr/bin/python3 /usr/bin/gavo serve start

As I would love to use Python3 and also am not interested in setting up a new service that runs on deprecated software from the very start, I tried the :latest tag which runs dachs v2.1. Same issue here with the gavo serve,

Check the above (output of # ps aux) and let me know if that solves this one.
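When the `ps aux` output has been saved to a file or pasted somewhere, the check described above can be scripted. The following is a minimal sketch with a made-up helper name (`matching_processes`); the default search string "gavo serve" is an assumption based on the example line above, and on v2.x "dachs serve" may be the better thing to look for.

```python
def matching_processes(ps_output, needle="gavo serve"):
    """Return the lines of `ps aux` output whose text contains needle."""
    lines = ps_output.splitlines()
    # Skip the header line (USER PID %CPU ...).
    return [line for line in lines[1:] if needle in line]


SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  19948  3548 pts/0    Ss+  11:22   0:00 bash
gavo        67  0.3  3.5 305104 72328 ?        S    07:26   0:00 /usr/bin/python3 /usr/bin/gavo serve start
"""

for line in matching_processes(SAMPLE):
    print(line)
```

An empty result after `gavo serve start` means the server process is not running, as opposed to running silently in the background.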

Laubeee commented 4 years ago

You mean, the browser still shows a 400 bad request? That would be unsettling, indeed. Is there any payload in that 400 response?

Yes, 400. No response body nor any response headers... can you try to reproduce it?

It is not clear to me what "command instantly terminates" exactly means. Notice that it is the natural behaviour to have the process going background (silently) when not in debug mode.

Yeah I'm not referring to bash output, but I can't access the service (e.g. the adql form). I am pretty sure I also checked with ps but will again when in the office (seems there is trouble with VPN at the moment). I can't reproduce it locally though, see below.

In the meantime I did some more testing:

Laubeee commented 4 years ago

Yup. On the server (running dachs 2.1 as docker-compose with awstats) I get the following:

root@ed9c58143c04:/# ps -aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  19948  3548 pts/0    Ss+  11:22   0:00 bash
postgres    20  0.1  0.3 287668 25916 ?        S    11:22   0:00 /usr/lib/postgresql/9.6/bin/postgres -D /var/lib/postgr
postgres    22  0.0  0.0 287668  4020 ?        Ss   11:22   0:00 postgres: 9.6/main: checkpointer process
postgres    23  0.0  0.0 287668  4020 ?        Ss   11:22   0:00 postgres: 9.6/main: writer process
postgres    24  0.0  0.1 287668  8724 ?        Ss   11:22   0:00 postgres: 9.6/main: wal writer process
postgres    25  0.0  0.0 288072  6752 ?        Ss   11:22   0:00 postgres: 9.6/main: autovacuum launcher process
postgres    26  0.0  0.0 142812  4128 ?        Ss   11:22   0:00 postgres: 9.6/main: stats collector process
root       108  0.6  0.0  19952  3624 pts/1    Ss   11:23   0:00 bash
root       113  0.0  0.0  38384  3140 pts/1    R+   11:23   0:00 ps -aux
root@ed9c58143c04:/# gavo serve start
root@ed9c58143c04:/# ps -aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  19948  3548 pts/0    Ss+  11:22   0:00 bash
postgres    20  0.0  0.3 287668 25916 ?        S    11:22   0:00 /usr/lib/postgresql/9.6/bin/postgres -D /var/lib/postgr
postgres    22  0.0  0.0 287668  4020 ?        Ss   11:22   0:00 postgres: 9.6/main: checkpointer process
postgres    23  0.0  0.0 287668  4020 ?        Ss   11:22   0:00 postgres: 9.6/main: writer process
postgres    24  0.0  0.1 287668  8724 ?        Ss   11:22   0:00 postgres: 9.6/main: wal writer process
postgres    25  0.0  0.0 288072  6752 ?        Ss   11:22   0:00 postgres: 9.6/main: autovacuum launcher process
postgres    26  0.0  0.0 142812  4128 ?        Ss   11:22   0:00 postgres: 9.6/main: stats collector process
root       108  0.0  0.0  19952  3792 pts/1    Ss   11:23   0:00 bash
root       125  0.0  0.0  38384  3156 pts/1    R+   11:23   0:00 ps -aux
root@ed9c58143c04:/# gavo serve debug
2020-08-17 11:38:03+0000 [-] Log opened.
2020-08-17 11:38:03+0000 [-] Site starting on 80
2020-08-17 11:38:03+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f6cbdf29cc0>
_
msdemlei commented 4 years ago

Ok -- I can reproduce this, and debugging it leads deep into the bowels of twisted, which chokes on something docker does to the data stream (posting from inside of the container works).

I'm sure we could figure this out with some effort, but I'm pretty sure we would just rediscover some long-fixed bug in twisted. Because, you know, I'm surprised the whole thing works at all, what with the container being based on stretch.

Since basing the container on buster is something we'll need to do anyway, let's do it now and see if the problem goes away.

Carlos: Please, for now, use the postgresql-11-pgsphere extension available from our release repo rather than postgresql-pgsphere from Debian (that's missing healpix functionality)

Laubeee commented 4 years ago

twisted is a gavo dependency, right? Can it be updated to see if it solves the problem?

msdemlei commented 4 years ago

You can, and I just did, by dist-upgrading a running container to buster. It does help here. However, I'd wager the actual underlying problem is that the container runs python 3.5 (the POST-payload parsing is horrible in there). And that's definitely too old for DaCHS, which is why I'm amazed anything at all works.

So, just upgrading twisted will not buy you much, I suppose.

I'd hope upgrading the container is as simple as replacing stretch with buster in some dockerfile, so that sounds like the way to go. Let's see what Carlos says.

Laubeee commented 4 years ago

A quick test seems to solve the problem in docker if switching to buster (python 3.7.3) 👍

I also updated twisted for the ubuntu18 container (python 3.6.9) but there it fails somewhere else:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/twisted/web/server.py", line 235, in process
    self.render(resrc)
  File "/usr/local/lib/python3.6/dist-packages/twisted/web/server.py", line 302, in render
    body = resrc.render(self)
  File "/usr/lib/python3/dist-packages/gavo/web/formrender.py", line 522, in render
    customCallback=lambda res: self._formatOutput(res, request))
  File "/usr/lib/python3/dist-packages/gavo/formal/form.py", line 576, in render
    formName = request.args.pop(FORMS_KEY, [b""])[0].decode("utf-8")
AttributeError: 'str' object has no attribute 'decode'

which is odd. Str.decode() is python2 syntax, but it seems to work on buster..?

whole thing results in a 500 internal server error, and an actual error message, which I think is at least some progress xD

Anyway, I might want to check if I can get this whole thing running on a debian target server instead..

msdemlei commented 4 years ago

I also updated twisted for the ubuntu18 container (python 3.6.9) but there it fails somewhere else:

    customCallback=lambda res: self._formatOutput(res, request))
  File "/usr/lib/python3/dist-packages/gavo/formal/form.py", line 576, in render
    formName = request.args.pop(FORMS_KEY, [b""])[0].decode("utf-8")
AttributeError: 'str' object has no attribute 'decode'

which is odd. Str.decode() is python2 syntax, but it seems work on buster..?

No, it's not actually syntax -- the question is whether request.args contains strings or bytes. Twisted's choice of returning bytes (understandable for many reasons, but still annoying) was a large nerve-wrecker when porting DaCHS.
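The distinction can be shown in a few lines of plain Python. This is only an illustration of the bytes-versus-str point, not DaCHS code: the helper `to_text` is hypothetical, and the field name and value are simply the ones visible in the ngrep dump earlier in this thread.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# What the failing line expects: keys and values as bytes.
bytes_args = {b"__nevow_form__": [b"genForm"]}
print(bytes_args[b"__nevow_form__"][0].decode("utf-8"))

# What apparently arrived in the failing container: a str value.
str_args = {"__nevow_form__": ["genForm"]}
try:
    str_args["__nevow_form__"][0].decode("utf-8")
except AttributeError as exc:
    # In Python 3, str has no decode() method; only bytes does.
    print(exc)

# A tolerant helper accepts both, at the cost of hiding the type confusion.
print(to_text(b"genForm"), to_text("genForm"))
```

So the traceback says nothing about Python 2 versus Python 3 syntax; it says that a str turned up where the code expected bytes.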

Now... if you see that error in ubuntu this would suggest that there, request.args values would suddenly be strings, which I find hard to believe (and would make me slightly unhappy, though I delude myself into thinking I'll cope better this time). Out of curiosity: What's the twisted version there?

Laubeee commented 4 years ago

What's the twisted version there?

20.3.0

btw: I also had to update service_identity package manually.. which was then 18.1.0

chbrandt commented 4 years ago

Cool. I am now uploading an image based on Debian Buster, at chbrandt/dachs:buster. Twisted there is 18.9.0-3.

@Laubeee , could you share your docker-compose, please?

Laubeee commented 4 years ago
version: "3.7"
services:
  dachs:
    build: "./dachs"
    volumes:
      - ./mydisk/rhessi:/var/gavo/inputs/rhessi
      - ./logs:/var/gavo/logs/
    ports:
      - "80:80"
    stdin_open: true
    tty: true

  awstats:
    build: "./awstats"
    volumes:
      - ./logs:/var/gavo/logs/
    ports:
      - "8080:80"
    stdin_open: true
    tty: true
Laubeee commented 4 years ago

FYI: when upgrading twisted on Ubuntu18 to 18.9.0 (instead of 20.3.0) the same error message about Str.decode() appears.. (but no upgrade of service_identity is needed) [default twisted was 17.9.0]

So it may be due to some other dependency... here's what Python packages are around in the two containers:

Ubuntu18:

asn1crypto (0.24.0)
astropy (3.0)
attrs (17.4.0)
Automat (0.6.0)
beautifulsoup4 (4.6.0)
certifi (2018.1.18)
chardet (3.0.4)
click (6.7)
colorama (0.3.7)
configobj (5.0.6)
constantly (15.1.0)
cryptography (2.1.4)
cycler (0.10.0)
docutils (0.14)
gavodachs (2.1)
healpy (1.10.3)
html5lib (0.999999999)
hyperlink (17.3.1)
idna (2.6)
incremental (16.10.1)
keyring (10.6.0)
keyrings.alt (3.0)
linecache2 (1.0.0)
lxml (4.2.1)
matplotlib (2.1.1)
numpy (1.13.3)
olefile (0.45.1)
PAM (0.4.2)
pbr (3.1.1)
Pillow (5.1.0)
pip (9.0.1)
pluggy (0.6.0)
ply (3.11)
psutil (5.4.2)
psycopg2 (2.8.5)
py (1.5.2)
pyasn1 (0.4.2)
pyasn1-modules (0.2.1)
pycrypto (2.6.1)
Pygments (2.2.0)
pygobject (3.26.1)
PyHamcrest (2.0.2)
pymoc (0.5.0)
pyOpenSSL (17.5.0)
pyparsing (2.2.0)
pyserial (3.4)
pytest (3.3.2)
pytest-arraydiff (0.2)
pytest-astropy (0.2.1)
pytest-doctestplus (0.1.2)
pytest-openfiles (0.2.0)
pytest-remotedata (0.2.0)
python-dateutil (2.6.1)
pytz (2018.3)
pyxdg (0.25)
requests (2.18.4)
rjsmin (1.0.12)
roman (2.0.0)
SecretStorage (2.3.1)
service-identity (16.0.0)
setuptools (39.0.1)
six (1.11.0)
testresources (2.0.0)
traceback2 (1.4.0)
Twisted (18.9.0)
unittest2 (1.1.0)
urllib3 (1.22)
webencodings (0.5)
wheel (0.30.0)
zope.interface (5.1.0)

Debian 10:

zope.interface  (4.3.2)
webencodings    (0.5.1)
urllib3 (1.24.1)
unittest2   (1.1.0)
Twisted (18.9.0)
traceback2  (1.4.0)
testresources   (2.0.0)
soupsieve   (1.8)
six (1.12.0)
setuptools  (40.8.0)
service-identity    (16.0.0)
scipy   (1.1.0)
roman   (2.0.0)
rjsmin  (1.0.12)
requests    (2.21.0)
python-dateutil (2.7.3)
pytest  (3.10.1)
pytest-remotedata   (0.3.1)
pytest-openfiles    (0.3.2)
pytest-doctestplus  (0.2.0)
pytest-astropy  (0.5.0)
pytest-arraydiff    (0.3)
pyparsing   (2.2.0)
pyOpenSSL   (19.0.0)
pymoc   (0.5.0)
Pygments    (2.3.1)
pyasn1  (0.4.2)
pyasn1-modules  (0.2.1)
py  (1.7.0)
psycopg2    (2.7.7)
psutil  (5.5.1)
ply (3.11)
pluggy  (0.8.0)
Pillow  (5.4.1)
pbr (4.2.0)
olefile (0.46)
numpy   (1.16.2)
more-itertools  (4.2.0)
matplotlib  (3.0.2)
lxml    (4.3.2)
linecache2  (1.0.0)
kiwisolver  (1.0.1)
incremental (16.10.1)
idna    (2.6)
hyperlink   (17.3.1)
html5lib    (1.0.1)
healpy  (1.12.8)
gavodachs   (2.1)
docutils    (0.14)
decorator   (4.3.0)
cycler  (0.10.0)
cryptography    (2.6.1)
constantly  (15.1.0)
configobj   (5.0.6)
colorama    (0.3.7)
Click   (7.0)
chardet (3.0.4)
certifi (2018.8.24)
beautifulsoup4  (4.7.1)
Automat (0.6.0)
attrs   (18.2.0)
atomicwrites    (1.1.5)
astropy (3.1.2)
asn1crypto  (0.24.0)
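
Comparing two such listings by eye is error-prone, so here is a small sketch that parses "name (version)" lines and reports packages whose versions differ or that exist on only one side. The function names are made up for this example, and the sample input is just a few lines copied from the two listings above.

```python
import re

# Matches lines such as "Twisted (18.9.0)" or "zope.interface  (4.3.2)".
LINE = re.compile(r"^\s*(\S+)\s*\((\S+)\)\s*$")


def parse_listing(text):
    """Map lower-cased package names to version strings."""
    versions = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            versions[match.group(1).lower()] = match.group(2)
    return versions


def version_differences(first, second):
    """Return {name: (version in first, version in second)} where they differ."""
    common = sorted(first.keys() & second.keys())
    return {n: (first[n], second[n]) for n in common if first[n] != second[n]}


def only_in_first(first, second):
    """Return the sorted names present in first but missing from second."""
    return sorted(first.keys() - second.keys())


UBUNTU = "psycopg2 (2.8.5)\nTwisted (18.9.0)\nzope.interface (5.1.0)\nPAM (0.4.2)"
DEBIAN = "zope.interface  (4.3.2)\nTwisted (18.9.0)\npsycopg2    (2.7.7)\nscipy   (1.1.0)"

ubuntu, debian = parse_listing(UBUNTU), parse_listing(DEBIAN)
print(version_differences(ubuntu, debian))
print(only_in_first(ubuntu, debian), only_in_first(debian, ubuntu))
```

Names are lower-cased before comparison because the two listings capitalise some packages differently (for example click versus Click).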
msdemlei commented 4 years ago

On Mon, Aug 17, 2020 at 04:02:25PM +0000, Silvan Laube wrote:

20.3.0

Odd. The changelog doesn't mention anything indicating there may be plans for changing the types of request.args. On the contrary, for 19.7.0 they explicitly state:

t.w.iweb.IRequest's "args" attribute is now correctly documented to be bytes.

Positively odd. How could a string end up there, then?

I'll set up a Debian testing chroot and see what I can do one of these days. That is, unless there's a strong reason you can't live without the latest Ubuntu, in which case I'd raise the priority there a bit.

Laubeee commented 4 years ago

phew. You mean I should check Ubuntu20? First I tried upgrading to python 3.7 on ubuntu18. Had to upgrade and re-install a couple of dependencies, only to see yet another error 😄

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/twisted/web/server.py", line 230, in process
    resrc = self.site.getResourceFor(self)
  File "/usr/local/lib/python3.7/dist-packages/twisted/web/server.py", line 899, in getResourceFor
    return resource.getChildForRequest(self.resource, request)
  File "/usr/local/lib/python3.7/dist-packages/twisted/web/resource.py", line 98, in getChildForRequest
    resource = resource.getChildWithDefault(pathElement, request)
  File "/usr/local/lib/python3.7/dist-packages/twisted/web/resource.py", line 201, in getChildWithDefault
    return self.getChild(path, request)
  File "/usr/lib/python3/dist-packages/gavo/web/root.py", line 321, in getChild
    res, postpath = self._locateResourceBasedChild(request, segments)
  File "/usr/lib/python3/dist-packages/gavo/web/root.py", line 268, in _locateResourceBasedChild
    return rendC(request, service), [
  File "/usr/lib/python3/dist-packages/gavo/web/formrender.py", line 495, in __init__
    grend.ServiceBasedPage.__init__(self, request, service)
  File "/usr/lib/python3/dist-packages/gavo/web/grend.py", line 674, in __init__
    ResourceBasedPage.__init__(self, request, service.rd)
  File "/usr/lib/python3/dist-packages/gavo/web/grend.py", line 572, in __init__
    self.queryMeta = svcs.QueryMeta.fromRequest(request)
  File "/usr/lib/python3/dist-packages/gavo/svcs/common.py", line 191, in fromRequest
    res = cls.fromRequestArgs(request.strargs, **kwargs)
  File "/usr/lib/python3/dist-packages/gavo/web/common.py", line 457, in strargs
    self._strargs = self._makeStrargs()
  File "/usr/lib/python3/dist-packages/gavo/web/common.py", line 440, in _makeStrargs
    for key, values in iterParams():
  File "/usr/lib/python3/dist-packages/gavo/web/common.py", line 437, in iterParams
    yield k, self.fields.getlist(k)
AttributeError: 'list' object has no attribute 'getlist'

HOWEVER if using the explicit version of twisted 18.9.0 instead of latest (20.3.0), it actually works 😁 So I suppose it's the combination of python3.7 and twisted 18.9.0 that works. Anything else seems to fail somewhere...

So yeah, I'm seriously wondering if there is no better, more stable library than twisted.. https://python.libhunt.com/pyzmq-alternatives points to some interesting alternatives like pyzmq, uvloop, asyncio that might be able to get the job done..

as for using ubuntu20.04 as base I run into problems when installing dachs as psycopg2 depends on python < 3.7 and ubuntu20 comes with 3.8...

msdemlei commented 4 years ago

On Tue, Aug 18, 2020 at 07:34:55AM -0700, Silvan Laube wrote:

phew. You mean I should check Ubuntu20?

Well, if you have to have Ubuntu, let me know and I'll see what I can do. However, my advice is to go for Debian stable -- it's the least work overall, also when looking into the future. And of course it's what we run everywhere. Normally, we also support oldstable, but in this case we don't because of python 3.5 (see below).

First I tried upgrading to python 3.7 on ubuntu18. Had to upgrade and re-install a couple of dependencies, only to see yet another error 😄

Traceback (most recent call last):
    for key, values in iterParams():
  File "/usr/lib/python3/dist-packages/gavo/web/common.py", line 437, in iterParams
    yield k, self.fields.getlist(k)
AttributeError: 'list' object has no attribute 'getlist'

Hm. This looks like a hack I have to make to keep uploads sane going awry, though I have to admit I can't explain off-hand why fields could be a list here.

So yeah, I'm seriously wondering if there is no better, more stable library than twisted.. https://python.libhunt.com/pyzmq-alternatives points some interesting alternatives like pyzmq, uvloop, asyncio that might be able to get the job done..

Well, twisted is well-written, and it's been really solid over the more than ten years DaCHS has been based on it while running on python2. I think in all that time there's just been a single instance in which I had to work around bugs or incompatible changes (compare this to about a gazillion times for pyfits a.k.a. astropy.io.fits).

The port to python3 has been unpleasant, though; but there's quite a bit of additional breakage/changes in python's core (the fallout of which you're seeing above); to see what I mean, check out the code around line 910 at https://github.com/twisted/twisted/blob/trunk/src/twisted/web/http.py (and I still had to monkeypatch that).

We're clearly still reeling from many people postponing the python3 migration to the last minute and then getting caught in a frenzy. I am to blame as well (but then I, as probably most everyone else, was waiting for dependencies to magically get ported).

as for using ubuntu20.04 as base I run into problems as psycopg2 depends on python < 3.7 and ubuntu20 comes with 3.8...

Don't worry. I'm sure things will shake out rather quickly as python3 finally becomes the norm and its rough edges (of which there still were surprisingly many in 3.5) will wear off.

Laubeee commented 4 years ago

as for using ubuntu20.04 as base I run into problems when installing dachs as psycopg2 depends on python < 3.7 and ubuntu20 comes with 3.8...

Let me correct this. If I install python3-psycopg2 before updating the sources list for dachs (and postgres), it gets installed perfectly fine. postgresql-11-pgsphere still fails at some dependency, but not installing anything explicitly (which gets me 9.6) works.

Anyhow: if v2.1 drops support for stretch and ubuntu18, I think this can be closed. Alternatively (if it's a valid use case) a guide could be made on how to update to python3.7 and twisted 18.9.0.. I could share my Dockerfile for ubuntu18.04 if you're interested.

msdemlei commented 4 years ago

On Thu, Aug 20, 2020 at 03:08:51AM -0700, Silvan Laube wrote:

as for using ubuntu20.04 as base I run into problems when installing dachs as psycopg2 depends on python < 3.7 and ubuntu20 comes with 3.8...

Let me correct this. If I install psycopg2 before updating the sources list for the dachs (and postgres), it gets installed perfectly fine. postgresql-11-pgsphere still fails at some dependency, but not installing anythin explicitly (which gets me 9.6) works.

OMG. Well, let's say: For the next year or so, please use Debian buster when running DaCHS on python3.

After that, the smoke should clear, in particular because DaCHS is on the way into Debian main and might then even make it to Ubuntu. Then, we can even cut down on the disreputable shoving around of binary packages across various distributions (which, however, may still work again once the sea has calmed a bit).

   -- Markus
msdemlei commented 4 years ago

HOWEVER if using the explicit version of twisted 18.9.0 instead of latest (20.3.0), it actually works

This thing has bugged me a bit -- but it turns out that twisted 20 hasn't even made it to Debian unstable (it's still in experimental), so here I won't do anything; who knows what might be broken there, and I wonder what Ubuntu is thinking.

Other than that, I've just run DaCHS in a sid container, and after fixing dependencies (that's only in the beta repo at this point) things by and large work (there's an incompatible change in pyparsing that I'll have to work around, so I don't recommend doing that, but you're not terribly likely to hit that).

Laubeee commented 4 years ago

I wonder what Ubuntu is thinking.

nothing. I updated twisted with pip, that's why it gave me the latest, 20.3. The default on 18.04 is 17.9, and on 20.04 I think it's 18.9

msdemlei commented 4 years ago

On Wed, Aug 26, 2020 at 07:55:31AM -0700, Silvan Laube wrote:

I wonder what Ubuntu is thinking.

nothing. I updated twisted with pip, that's why it gave me the latest, 20.3. the default on 18.04 is 17.9, and on 20.04 I think its also 18.9

Ah... Well, for the record: don't use pip on operational systems if you value your sanity.

I've had a look at 20.3, and it turns out it still needs a monkeypatch DaCHS is doing because chunked uploads (which TOPCAT is doing) were broken with python3.7 cgi and... aw, don't get me started.

I've extended the monkeypatch up to twisted 20.8 for now, and I'd expect that should fix these issues for now; I'll release a beta containing this (and a bunch of other fixes) soon.

However, while doing that I've noticed that pyparsing 2.4 has some very profound API changes that kill major portions of DaCHS. So, don't pip that whatever you do.

Be that as it may, I think we understand all the malfunctions reported here; so, as far as I am concerned, we ought to close this bug as soon as the default image is based on buster.

chbrandt commented 4 years ago

I'm closing this issue since it seems the issue was "solved" (at least, causes were apparently narrowed down).