Closed jacotec closed 3 years ago
CODE 6.4.9.x was working happily with haproxy V2.4.0 in May 2021. V2.4.1 in July 2021. I only have one haproxy in front of both CODE & NC (in multiple versions). Also NC works with OnlyOffice in this setup.
Thank you for reporting this @jacotec and @noci2012. Much appreciated!
It would help us greatly to see what HAProxy logs on its side for the failed requests. It seems that the requests we send (to HAProxy) time out, but it's not clear why that is. If possible, please share the relevant section from the HAProxy logs.
It would also help to see the matching logs from loolwsd, at trace level. It would be particularly good to see the start-up logs from loolwsd with the ssl-related logs, where it initializes the certificates, etc.
I suspect this is an SSL issue. Perhaps HAProxy doesn't like the (client) SSL certificates that Collabora Online is giving it?
Clearly there is a regression here that we are working on fixing. Please accept our apologies for the inconvenience.
Thank you.
I have a similar configuration. I have already tried several constellations, all with the same result. My destination configuration https://HAProxy -> http://Nextcloud VM, https://HAProxy -> http://loolwsd or https://NGINX -> http://loolwsd (Docker or Package Installation with the same result ) I can create new documents, but as soon as I enter data, save them and open them again, the content looks like in the screenshot.
how did i get so far.
frontend http
acl loolwsd hdr(host) -i loolwsd.example.at
acl loolwsd hdr(upgrade) -i websocket
acl loolwsd hdr(connection) -i upgrade
acl loolwsd hdr_beg(host) -i ws
acl loolwsd hdr_beg(host) -i wss
use_backend be-xxx if loolwsd
Hi @Ashod,
I have only 'connect' logs from haproxy (on opnsense) even if I raise to debug level.
But my docker logs show SSL errors:
wsd-00007-00065 2021-09-08 06:53:39.624878 [ docbroker_004 ] WRN Waking up dead poll thread [HttpSynReqPoll], started: false, finished: false| ./net/Socket.hpp:682
wsd-00007-00039 2021-09-08 06:53:39.919855 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:53:41.923575 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:53:43.927089 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:53:45.931272 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:53:49.952803 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:53:53.967110 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00065 2021-09-08 06:54:09.651852 [ docbroker_004 ] WRN Socket #28 has timed out while requesting [GET nextcloud.leuterer.net/index.php/apps/richdocuments/wopi/files/20_ocynvx3jwbdw?access_token=Yb5tIBDdAWkfnuOsp5UqC5TEGBu4YjuY&access_token_ttl=0] after 30030ms| net/HttpRequest.hpp:1304
wsd-00007-00065 2021-09-08 06:54:09.656441 [ docbroker_004 ] WRN Waking up dead poll thread [HttpSynReqPoll], started: false, finished: false| ./net/Socket.hpp:682
wsd-00007-00039 2021-09-08 06:54:10.051669 [ websrv_poll ] ERR Socket #34 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:54:13.455237 [ websrv_poll ] WRN convert-to: Requesting address is denied: ::ffff:10.10.15.1| wsd/LOOLWSD.cpp:2460
wsd-00007-00039 2021-09-08 06:54:14.058823 [ websrv_poll ] ERR Socket #34 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:54:20.080537 [ websrv_poll ] ERR Socket #34 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:54:22.084781 [ websrv_poll ] ERR Socket #34 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:54:24.087996 [ websrv_poll ] ERR Socket #34 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:54:24.414073 [ websrv_poll ] ERR SocketPoll [docbroker_004] thread is already started.| net/Socket.cpp:250
wsd-00007-00039 2021-09-08 06:54:32.111178 [ websrv_poll ] ERR Socket #35 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:54:34.115917 [ websrv_poll ] ERR Socket #35 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:54:38.122646 [ websrv_poll ] ERR Socket #35 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00065 2021-09-08 06:54:39.680009 [ docbroker_004 ] WRN Socket #33 has timed out while requesting [GET nextcloud.leuterer.net/index.php/apps/richdocuments/wopi/files/20_ocynvx3jwbdw/contents?access_token=Yb5tIBDdAWkfnuOsp5UqC5TEGBu4YjuY&access_token_ttl=0] after 30026ms| net/HttpRequest.hpp:1304
wsd-00007-00065 2021-09-08 06:54:39.684277 [ docbroker_004 ] WRN Waking up dead poll thread [HttpSynReqPoll], started: false, finished: false| ./net/Socket.hpp:682
wsd-00007-00039 2021-09-08 06:54:42.128898 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:54:46.137156 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:54:48.150133 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:54:54.183602 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:54:58.211164 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:55:00.217983 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:55:08.247392 [ websrv_poll ] ERR Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00065 2021-09-08 06:55:09.711224 [ docbroker_004 ] WRN Socket #28 has timed out while requesting [GET nextcloud.leuterer.net/index.php/apps/richdocuments/wopi/files/20_ocynvx3jwbdw?access_token=Yb5tIBDdAWkfnuOsp5UqC5TEGBu4YjuY&access_token_ttl=0] after 30029ms| net/HttpRequest.hpp:1304
wsd-00007-00065 2021-09-08 06:55:09.712238 [ docbroker_004 ] ERR Socket #34 SSL error: SYSCALL (5). (EPIPE: Broken pipe)| ./net/SslSocket.hpp:270
wsd-00007-00065 2021-09-08 06:55:09.712267 [ docbroker_004 ] ERR #34: Socket write returned -1 (EPIPE: Broken pipe)| ./net/Socket.hpp:1284
wsd-00007-00065 2021-09-08 06:55:09.712426 [ docbroker_004 ] ERR Socket #21 SSL error: SYSCALL (5). (EPIPE: Broken pipe)| ./net/SslSocket.hpp:270
wsd-00007-00065 2021-09-08 06:55:09.712446 [ docbroker_004 ] ERR #21: Socket write returned -1 (EPIPE: Broken pipe)| ./net/Socket.hpp:1284
sh: 1: /usr/bin/loolmount: Operation not permitted
frk-00035-00035 2021-09-08 06:55:10.699407 [ forkit ] ERR Failed to unmount [/opt/lool/child-roots/7yCaSACQyfeHJ4Dc/tmp].| common/JailUtil.cpp:68
sh: 1: /usr/bin/loolmount: Operation not permitted
frk-00035-00035 2021-09-08 06:55:10.713159 [ forkit ] ERR Failed to unmount [/opt/lool/child-roots/7yCaSACQyfeHJ4Dc/lo].| common/JailUtil.cpp:68
sh: 1: /usr/bin/loolmount: Operation not permitted
frk-00035-00035 2021-09-08 06:55:10.726387 [ forkit ] ERR Failed to unmount [/opt/lool/child-roots/7yCaSACQyfeHJ4Dc].| common/JailUtil.cpp:68
wsd-00007-00039 2021-09-08 06:55:12.262178 [ websrv_poll ] ERR Socket #21 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:55:14.268351 [ websrv_poll ] ERR Socket #21 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:55:16.273018 [ websrv_poll ] ERR Socket #21 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:55:24.295291 [ websrv_poll ] ERR Socket #21 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
wsd-00007-00039 2021-09-08 06:55:26.298962 [ websrv_poll ] ERR Socket #21 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270
ok, I see it is mostly the same as https://github.com/CollaboraOnline/online/issues/3065#issue-967065874
I have a new insight In the nextcloud docker container there is a sample spreadsheet document smaller than 8000 bytes that i can open. I can enter and open data as long as it does not exceed 8000 bytes.
@snetat I can't confirm. Created a small docx with 4kB and it still does not initialize.
@Ashod I took a Wireshark dump of the outgoing connection to the HAProxy/Nextcloud on the Collabora server. There is some data exchange, the last thing is HAProxy sending application data to Collabora, Collabora acknowledges this data and then 30 seconds nothing happens.
After these 30 seconds, Collabora sends again application data towards the Nextcloud but immediately closes the TCP connection at the same time.
10.0.1.34 is the Collabora VM's IP 10.0.0.2 is the IP of the HAProxy.
@jacotec Thanks, this is helpful. So the handshake seems to be fine, which is good. The 30 seconds is the timeout. It seems we aren't getting all the data we expect (based on the HTTP headers etc.). Would be great to have the trace-level logs from loolwsd that matches one or more of these requests. At trace level we capture a lot of the HTTP activity and it would help us see if there is anything of value there.
Another interesting thing is to see what "Application Data" we get back from HAProxy/NC. Specifically, the HTTP response header would be of paramount importance here (if it is indeed provided). Seeing the complete data exchange between loolwsd and HAProxy/NC would be great, if it's not too much trouble for you. (Let me know if you'd like to send the captured packets privately.)
Thank you!
I can create new documents, but as soon as I enter data, save them and open them again, the content looks like in the screenshot.
Hi @snetat, this looks like a different issue to this particular ticket. Can you please confirm that you have Collabora Online enabled and working? Are you able to upload .odt files and open them in Collabora Online? (Is this a new setup or was previously working?)
Thanks.
I can create new documents, but as soon as I enter data, save them and open them again, the content looks like in the screenshot.
Hi @snetat, this looks like a different issue to this particular ticket. Can you please confirm that you have Collabora Online enabled and working? Are you able to upload .odt files and open them in Collabora Online? (Is this a new setup or was previously working?)
Thanks.
it is an existing installation, with 6.4.9 it works with 6.4.10 not. yes i can upload and open files. with the result see screenshot if the file is larger than 8000 bytes
https://user-images.githubusercontent.com/5556794/132507785-26a8a4d8-9d78-4329-9826-cee647475f73.mp4
the document is not broken, i can download it and open it in libreoffice without problems
Another interesting thing is to see what "Application Data" we get back from HAProxy/NC. Specifically, the HTTP response header would be of paramount importance here (if it is indeed provided). Seeing the complete data exchange between loolwsd and HAProxy/NC would be great, if it's not too much trouble for you. (Let me know if you'd like to send the captured packets privately.)
@Ashod On the HAProxy side I'm a bit limited like Lebernd as my HAProxy is also running on my pfSense gateway. But I can get a trace level debug on loolwsd for you. How can I send it privately (I don't like to put clear trace files with real URL's etc into the public internet)
@Ashod I've made a trace level log from Collabora along with a new syncronized Wireshark capture on the Collabora server. Please feel free to send me an email to mj -at- jacotec -dot - de and I'll send you a link to download the traces.
We have identified an issue and are working on a fix. Thanks @jacotec for the logs, they proved very helpful.
@snetat you may be right that the size has something to do with it. I expect the root cause is the same, so the fix should resolve your case as well.
Thanks!
We are also affected by this issue, and it would be great if this could be released soon.
same here. this is my status output:
WRN Failed to verify the certificate of [cloud.XXX.net]| ./net/SslSocket.hpp:196
ERR Socket #32 SSL BIO error: error:1420C0CF:SSL routines:ssl_write_internal:protocol is shutdown (0: Success)| ./net/SslSocket.hpp:330
ERR Error while handling poll for socket #32 in HttpSynReqPoll: error:1420C0CF:SSL routines:ssl_write_internal:protocol is shutdown| net/Socket.cpp:423
ERR WOPI::GetFile [https://cloud.XXX.net/index.php/apps/richdocuments/wopi/files/1249_ocfhv9kwyxj6/contents?access_token=&access_token_ttl=0] failed with Status >
ERR Cannot download document from WOPI storage uri [https://cloud.XXX.net/index.php/apps/richdocuments/wopi/files/1249_ocfhv9kwyxj6/contents?access_token=&access>
ERR Failed to load: file://, error: Unsupported URL <file://>: "type detection failed"| kit/Kit.cpp:1316
ERR error: cmd=load kind=faileddocloading| ./common/Session.hpp:136
WRN Document load failed: faileddocloading| wsd/ClientSession.cpp:1480
ERR Failed to get LoKitDocument instance for [file://].| kit/ChildSession.cpp:684
WRN Ignoring attempted read from 24| ./net/Socket.hpp:1021
It's already fixed in the latest Snapshot and working. @Ashod did a great job identifying the issue. It should land in the next release.
@jacotec OK, great! By the way I'm using nginx, so seems like it's not depending to the web server
@Ashod thank you for youe work! Is there an idea when the update willbe rolled out?
CODE 6.4.11 has been released with, which fixes this issue. Please report any issues whether related or otherwise. Thanks for everyone's patience while we tackled this one.
Edit I think I will open a separate thread
CODE 6.4.11 has been released with, which fixes this issue. Please report any issues whether related or otherwise. Thanks for everyone's patience while we tackled this one.
Jut updated so my confiig looks now like:
Nextcloud 22.1.1
Nginx 1.18.0-0
Linux Mint 20.1
Collabora Online 4.2.3
Collabor Code 6.4.11-3
No docker in use
loolwsd status:
WRN Failed to verify the certificate of [cloud.XXX.net]| ./net/SslSocket.hpp:196
ERR Socket #27 SSL BIO error: error:1420C0CF:SSL routines:ssl_write_internal:protocol is shutdown (0: Success)| ./net/SslSocket.hpp:330
ERR Error while handling poll for socket #27 in HttpSynReqPoll: error:1420C0CF:SSL routines:ssl_write_internal:protocol is shutdown| net/Socket.cpp:423
ERR WOPI::CheckFileInfo failed for URI [https://cloud.XXX.net/index.php/apps/richdocuments/wopi/files/1351_ocfhv9kwyxj6?access_token=NXcBvZ1wTIdAdcTSmg9xLHXAwk9S>
ERR loading document exception: WOPI::CheckFileInfo failed: | wsd/DocumentBroker.cpp:1926
ERR Failed to add session to [/index.php/apps/richdocuments/wopi/files/1351_ocfhv9kwyxj6] with URI [https://cloud.XXX.net/index.php/apps/richdocuments/wopi/files>
ERR Storage error while starting session on /index.php/apps/richdocuments/wopi/files/1351_ocfhv9kwyxj6 for socket #24. Terminating connection. Error: WOPI::CheckFileI>
WRN Ignoring attempted read from 24| ./net/Socket.hpp:1021
ERR Invalid or unknown session [008] to remove.| wsd/DocumentBroker.cpp:1969
WRN Prisoner connection disconnected but without valid socket.| wsd/LOOLWSD.cpp:2248
Nextcloud log:
Warning Host 127.0.0.1 was not connected to because it violates local access rules
Describe the bug Since the update of Collabora CODE from 6.4.9.x to 6.4.10.x, Collabora does not work with a Nextcloud instance in case the Nextcloud instance is sitting behind an HAPROXY >=2.x.
If the Nextcloud instance is sitting behind HAProxy 1.8.25, all works fine.
Also CODE 6.4.9.x works fine with the latest HAPROXY versions.
There must have been a change to CODE 6.4.10 which breaks the WOPI connection to a Nextcloud behind a HAPROXY 2.x. The issue happens when Collabora wants to download data from the Nextcloud instance (WOPI).
To Reproduce Steps to reproduce the behavior:
Expected behavior Collabora should load the document as it does in 6.4.9.x
Actual behavior Collabora stopped working in this configuration since 6.4.10.x Still not working in 6.4.10.10
Screenshots
Logs
Additional context
It does not care if CODE is behind HAProxy 2.x! Collabora is behind HAPROXY 2.2.14. Two NC instances also behind HAPROXY 2.2.14 are showing the issue, one NC instance behind HAPROXY 1.8.25 worked flawlessly. Updated the HAPROXY 1.8.25 to 2.0.14 at the working NC instance and the same issue occurs here immediately!
The issue was introduced in CODE 6.4.10.x. With 6.4.9.x all is working fine also with everything behind HAPROXY 2.2.14.