LanguageMachines / PICCL

A set of workflows for corpus building through OCR, post-correction and normalisation

Autosearch forwarder gives server error #51

Closed peterdekker closed 5 years ago

peterdekker commented 5 years ago

Clicking the Autosearch forwarder (for one file) gives a 500 internal server error, caused by Python: https://pastebin.ubuntu.com/p/4HRCH4HsRM/

proycon commented 5 years ago

Seems like a bug in the forward viewer in CLAM (moving the issue there). I'll look into it!

proycon commented 5 years ago

I released the new CLAM version that should hopefully fix this (but I haven't been able to properly test it as I don't have an autosearch to forward to).

peterdekker commented 5 years ago

I re-installed, but an internal server error still occurs when clicking the autosearch link. This time Python reports a different error: https://pastebin.com/WzAermAd

proycon commented 5 years ago

Okay, a silly error in the error handling I see. I fixed that and did a new CLAM release. So now you should hit the real error message: It seems communication is established with the remote service (AutoSearch), but the remote service didn't respond with the expected HTTP 302 redirect response code.

(Instead of updating all of LaMachine, you may want to just do a pip install -U clam to update only clam and speed things up for testing. Do restart the PICCL webservice afterwards.)

peterdekker commented 5 years ago

Thanks! That would be a nice shortcut; however, I think the latest version of clam is not yet on PyPI. The latest version I see is 2.4.7 from August 25th: https://pypi.org/project/CLAM/#history

proycon commented 5 years ago

Right :) I see I mistyped my password so the upload failed. Fixed now!

peterdekker commented 5 years ago

I discussed with the AutoSearch developers: we expected a different behaviour than what is currently implemented.

The user should be directly redirected to the autosearch_forward_url in the browser. There should be no prior server-side request to this url, expecting a 302. This is not possible, since the service is behind a CLARIN login, so it can't return a 302. It has to be directly opened in the user's browser, so the CLARIN session from PICCL is used.

If I understand your code correctly, you can directly do the following in clam/common/viewers.py (https://github.com/proycon/clam/blob/d08cc6b4d729001e517f8936b22b4db3f787acd9/clam/common/viewers.py#L71): flask.redirect(self.forwarder.forwardlink) without any prior requests.get
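
A minimal sketch of the behaviour requested above, written against the standard library's WSGI interface rather than CLAM's actual code: the server answers with a 302 and a Location header and never contacts the remote service itself. The names `FORWARD_URL` and `forward_app` and the example URL are hypothetical; in Flask the same effect comes from returning `flask.redirect(url)`.

```python
# Hypothetical forward URL; a real deployment would read this from configuration.
FORWARD_URL = "https://autosearch.example.org/open?input=$BACKLINK"


def forward_app(environ, start_response):
    """WSGI app that sends the browser to FORWARD_URL without a server-side request."""
    # The browser follows the Location header itself, so the follow-up request
    # carries the user's own login session for the remote service.
    start_response("302 Found", [("Location", FORWARD_URL), ("Content-Length", "0")])
    return [b""]
```

Because no `requests.get` happens on the server, a remote service behind a login page cannot break the forward: only the user's browser ever talks to it.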

proycon commented 5 years ago

You're right indeed, the extra server-side handling is an obstacle here. I removed it (or rather, made it optional), and the default is now that CLAM itself issues a 302 to the forward URL when you click the link. I'm releasing v2.4.9 (will take a few minutes) to fix this. Better not do an entire LaMachine update; just pip install -U clam, because we still have some unfortunate breakages we are fixing at the moment.

peterdekker commented 5 years ago

Nice! The link to AutoSearch works now. However, I saw that the backlink is just the filename, not the full URL of where the file is located. Could you add the full URL, where the XML is located, as backlink?

proycon commented 5 years ago

Oops, should be fixed now in v2.4.10 !

peterdekker commented 5 years ago

I updated this, but unfortunately, it introduces a new error. The whole PICCL webinterface is not reachable and gives a server error. This is the error in the log https://pastebin.ubuntu.com/p/5X3C5cw7PS/

proycon commented 5 years ago

A bug in the version checker, it seems (it thinks 2.4.10 < 2.4.5: string instead of integer comparison). I wonder how that went unnoticed for quite a while. It's fixed now in yet another version (v2.4.11). Let's hope we don't go on like this and hit an integer version overflow next ;)
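
To illustrate the string-versus-integer comparison described above, here is a small sketch. It is not CLAM's version checker; `parse_version` and `is_older` are hypothetical helpers, and they assume purely numeric dotted versions (no suffixes such as `rc1`).

```python
def parse_version(version):
    """Turn a dotted version such as '2.4.10' into a tuple of ints: (2, 4, 10)."""
    return tuple(int(part) for part in version.split("."))


def is_older(version, minimum):
    """Return True if `version` is lower than `minimum`, comparing numerically."""
    return parse_version(version) < parse_version(minimum)


# Strings compare character by character, so '1' < '5' decides the result.
print("2.4.10" < "2.4.5")           # True: the buggy answer
print(is_older("2.4.10", "2.4.5"))  # False: 10 is greater than 5
```

Tuples of integers compare element by element, which matches how dotted version numbers are meant to be ordered.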

peterdekker commented 5 years ago

Thanks, now the interface works again. But the file URL is still not right: it points to the file location on the server file system, not the publicly available URL where the deployment is running.

The backlink now points to: /vol1/lamachine/var/www-data/piccl.clam/projects/p.dekker@umail.leidenuniv.nl/p2/output/testttt.frogged.folia.xml

For our deployment, the backlink should point to: https://portal.clarin.inl.nl/piccl/p2/output/testttt.frogged.folia.xml PICCL should either infer the server URL (that would be ideal), or there should be a variable somewhere to define it.

proycon commented 5 years ago

I see yes, it was a mistake in my earlier fix, took a wrong shortcut. Sorry it's such a hassle with you being the guinea pig for testing this feature. I did yet another release now (2.4.12).

peterdekker commented 5 years ago

Now, there is a simple Python error again, looks the same as a few versions ago. I don't mind testing on our deployment (it has become quite quick now), but can you please test if the code works on your system before?

[Mon Sep 02 15:54:03.612964 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996] Traceback (most recent call last):
[Mon Sep 02 15:54:03.612973 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]   File "/vol1/lamachine/lib/python3.6/site-packages/flask/app.py", line 2446, in wsgi_app
[Mon Sep 02 15:54:03.612979 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]     response = self.full_dispatch_request()
[Mon Sep 02 15:54:03.612985 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]   File "/vol1/lamachine/lib/python3.6/site-packages/flask/app.py", line 1951, in full_dispatch_request
[Mon Sep 02 15:54:03.612990 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]     rv = self.handle_user_exception(e)
[Mon Sep 02 15:54:03.613010 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]   File "/vol1/lamachine/lib/python3.6/site-packages/flask/app.py", line 1820, in handle_user_exception
[Mon Sep 02 15:54:03.613018 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]     reraise(exc_type, exc_value, tb)
[Mon Sep 02 15:54:03.613024 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]   File "/vol1/lamachine/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
[Mon Sep 02 15:54:03.613031 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]     raise value
[Mon Sep 02 15:54:03.613093 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]   File "/vol1/lamachine/lib/python3.6/site-packages/flask/app.py", line 1949, in full_dispatch_request
[Mon Sep 02 15:54:03.613100 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]     rv = self.dispatch_request()
[Mon Sep 02 15:54:03.613105 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]   File "/vol1/lamachine/lib/python3.6/site-packages/flask/app.py", line 1935, in dispatch_request
[Mon Sep 02 15:54:03.613113 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]     return self.view_functions[rule.endpoint](**req.view_args)
[Mon Sep 02 15:54:03.613118 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]   File "/vol1/lamachine/lib/python3.6/site-packages/clam/common/auth.py", line 287, in decorated
[Mon Sep 02 15:54:03.613125 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]     return f(*args, **kwargs)
[Mon Sep 02 15:54:03.613130 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]   File "/vol1/lamachine/lib/python3.6/site-packages/clam/clamservice.py", line 1239, in getoutputfile
[Mon Sep 02 15:54:03.613137 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]     output = viewer.view(outputfile, **flask.request.values)
[Mon Sep 02 15:54:03.613152 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]   File "/vol1/lamachine/lib/python3.6/site-packages/clam/common/viewers.py", line 68, in view
[Mon Sep 02 15:54:03.613206 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]     self.forwarder(None, None, outputfile=file) #this sets the forwardlink on the instance
[Mon Sep 02 15:54:03.613260 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]   File "/vol1/lamachine/lib/python3.6/site-packages/clam/common/data.py", line 2350, in __call__
[Mon Sep 02 15:54:03.613300 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996]     self.forwardlink =  self.url.replace("$BACKLINK", baseurl + '/' + project + '/output/' + outputfile.filename)
[Mon Sep 02 15:54:03.613346 2019] [wsgi:error] [pid 3400] [remote 172.16.10.43:50996] TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
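
The failing line in the traceback above concatenates `baseurl` with a string while `baseurl` is `None`. The sketch below shows one way such a backlink could be built defensively; it is not CLAM's implementation. The helper names, the `base_url` parameter and the example forwarder template are assumptions; only the `$BACKLINK` placeholder and the `/<project>/output/<filename>` layout come from the traceback.

```python
from urllib.parse import quote


def build_backlink(base_url, project, filename):
    """Join the public base URL, project and output filename into one URL."""
    if not base_url:
        # Fail with a clear message instead of a TypeError from None + str.
        raise ValueError("public base URL is not configured")
    return "/".join([base_url.rstrip("/"), quote(project), "output", quote(filename)])


def build_forward_link(url_template, base_url, project, filename):
    """Substitute the $BACKLINK placeholder in a forwarder URL template."""
    return url_template.replace("$BACKLINK", build_backlink(base_url, project, filename))
```

With the deployment from this thread, `build_backlink("https://portal.clarin.inl.nl/piccl", "p2", "testttt.frogged.folia.xml")` yields the public URL requested earlier rather than a path on the server file system.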

proycon commented 5 years ago

I'm going to set up a test (not with AutoSearch though), because this indeed is not working well this way.

proycon commented 5 years ago

Okay, I set up a test environment and hopefully really fixed it now (2.4.13) :)

peterdekker commented 5 years ago

Thanks! It seems that it works now :)