CollaboraOnline / online

Collabora Online is a collaborative online office suite based on LibreOffice technology. This is also the source for the Collabora Office apps for iOS and Android.
https://collaboraonline.com
Other
1.85k stars 702 forks source link

Collabora 6.4.10.x does not work with Nextcloud when NC is behind HAProxy 2.x #3199

Closed jacotec closed 3 years ago

jacotec commented 3 years ago

Describe the bug Since the update of Collabora CODE from 6.4.9.x to 6.4.10.x, Collabora does not work with a Nextcloud instance in case the Nextcloud instance is sitting behind an HAPROXY >=2.x.

If the Nextcloud instance is sitting behind HAProxy 1.8.25, all works fine.

Also CODE 6.4.9.x works fine with the latest HAPROXY versions.

There must have been a change to CODE 6.4.10 which breaks the WOPI connection to a Nextcloud behind a HAPROXY 2.x. The issue happens when Collabora wants to download data from the Nextcloud instance (WOPI).

To Reproduce Steps to reproduce the behavior:

  1. Have Nextcloud set up using Collabora running in a separate VM. NC's URL is managed by HAPROXY 2.x, SSL is handled also by HAPROXY.
  2. Open any Office document
  3. Collabora hangs in "Initializing" state
  4. Collabora fails with "Failed to load the document"

Expected behavior Collabora should load the document as it does in 6.4.9.x

Actual behavior Collabora stopped working in this configuration since 6.4.10.x Still not working in 6.4.10.10

Screenshots image

Logs

Sep  7 11:46:23 collabora loolwsd[1099176]: wsd-1099176-1114805 2021-09-07 11:46:23.163474 [ docbroker_004 ] WRN  Waking up dead poll thread [HttpSynReqPoll], started: false, finished: false| ./net/Socket.hpp:680
Sep  7 11:46:53 collabora loolwsd[1099176]: wsd-1099176-1114805 2021-09-07 11:46:53.191789 [ docbroker_004 ] WRN  Socket #32 has timed out while requesting [GET cloud.***.de/index.php/apps/richdocuments/wopi/files/160674_oceycd4plel7/contents?access_token=ntKHAcfmr4Wclq87vOrtrcs9URt28KEi&access_token_ttl=0] after 30029ms| net/HttpRequest.hpp:1302
Sep  7 11:46:53 collabora loolwsd[1114642]: kit-1114642-1099278 2021-09-07 11:46:53.288726 [ kitbroker_004 ] ERR  Failed to load: file:///tmp/user/docs/sjl4lOb7YS9lbYYC/LS%20Etelm%20DCTR%20Boards.docx, error: loadComponentFromURL returned an empty reference| kit/Kit.cpp:1316
Sep  7 11:46:53 collabora loolwsd[1114642]: kit-1114642-1099278 2021-09-07 11:46:53.288849 [ kitbroker_004 ] ERR  error: cmd=load kind=faileddocloading| ./common/Session.hpp:136
Sep  7 11:46:53 collabora loolwsd[1114642]: kit-1114642-1099278 2021-09-07 11:46:53.288914 [ kitbroker_004 ] ERR  Failed to get LoKitDocument instance for [file:///tmp/user/docs/sjl4lOb7YS9lbYYC/LS%20Etelm%20DCTR%20Boards.docx].| kit/ChildSession.cpp:684

Additional context

noci2012 commented 3 years ago

CODE 6.4.9.x was working happily with haproxy V2.4.0 in May 2021. V2.4.1 in July 2021. I only have one haproxy in front of both CODE & NC (in multiple versions). Also NC works with OnlyOffice in this setup.

Ashod commented 3 years ago

Thank you for reporting this @jacotec and @noci2012. Much appreciated!

It would help us greatly to see what HAProxy logs on its side for the failed requests. It seems that the requests we send (to HAProxy) time out, but it's not clear why that is. If possible, please share the relevant section from the HAProxy logs.

It would also help to see the matching logs from loolwsd, at trace level. It would be particularly good to see the start-up logs from loolwsd with the ssl-related logs, where it initializes the certificates, etc.

I suspect this is an SSL issue. Perhaps HAProxy doesn't like the (client) SSL certificates that Collabora Online is giving it?

Clearly there is a regression here that we are working on fixing. Please accept our apologies for the inconvenience.

Thank you.

snetAT commented 3 years ago

I have a similar configuration. I have already tried several constellations, all with the same result. My destination configuration https://HAProxy -> http://Nextcloud VM, https://HAProxy -> http://loolwsd or https://NGINX -> http://loolwsd (Docker or Package Installation with the same result ) I can create new documents, but as soon as I enter data, save them and open them again, the content looks like in the screenshot. Screenshot from 2021-09-07 15-58-21

how did i get so far.

frontend http

acl loolwsd hdr(host) -i loolwsd.example.at
acl loolwsd hdr(upgrade) -i websocket
acl loolwsd hdr(connection) -i upgrade
acl loolwsd hdr_beg(host) -i ws
acl loolwsd hdr_beg(host) -i wss

use_backend be-xxx if loolwsd
Lebernd commented 3 years ago

Hi @Ashod,

I have only 'connect' logs from haproxy (on opnsense) even if I raise to debug level.

But my docker logs show SSL errors:

wsd-00007-00065 2021-09-08 06:53:39.624878 [ docbroker_004 ] WRN  Waking up dead poll thread [HttpSynReqPoll], started: false, finished: false| ./net/Socket.hpp:682                                             
wsd-00007-00039 2021-09-08 06:53:39.919855 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:53:41.923575 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:53:43.927089 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:53:45.931272 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:53:49.952803 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:53:53.967110 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00065 2021-09-08 06:54:09.651852 [ docbroker_004 ] WRN  Socket #28 has timed out while requesting [GET nextcloud.leuterer.net/index.php/apps/richdocuments/wopi/files/20_ocynvx3jwbdw?access_token=Yb5tIBDdAWkfnuOsp5UqC5TEGBu4YjuY&access_token_ttl=0] after 30030ms| net/HttpRequest.hpp:1304
wsd-00007-00065 2021-09-08 06:54:09.656441 [ docbroker_004 ] WRN  Waking up dead poll thread [HttpSynReqPoll], started: false, finished: false| ./net/Socket.hpp:682                                             
wsd-00007-00039 2021-09-08 06:54:10.051669 [ websrv_poll ] ERR  Socket #34 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:54:13.455237 [ websrv_poll ] WRN  convert-to: Requesting address is denied: ::ffff:10.10.15.1| wsd/LOOLWSD.cpp:2460                                                                
wsd-00007-00039 2021-09-08 06:54:14.058823 [ websrv_poll ] ERR  Socket #34 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:54:20.080537 [ websrv_poll ] ERR  Socket #34 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:54:22.084781 [ websrv_poll ] ERR  Socket #34 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:54:24.087996 [ websrv_poll ] ERR  Socket #34 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:54:24.414073 [ websrv_poll ] ERR  SocketPoll [docbroker_004] thread is already started.| net/Socket.cpp:250                                                                        
wsd-00007-00039 2021-09-08 06:54:32.111178 [ websrv_poll ] ERR  Socket #35 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:54:34.115917 [ websrv_poll ] ERR  Socket #35 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:54:38.122646 [ websrv_poll ] ERR  Socket #35 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00065 2021-09-08 06:54:39.680009 [ docbroker_004 ] WRN  Socket #33 has timed out while requesting [GET nextcloud.leuterer.net/index.php/apps/richdocuments/wopi/files/20_ocynvx3jwbdw/contents?access_token=Yb5tIBDdAWkfnuOsp5UqC5TEGBu4YjuY&access_token_ttl=0] after 30026ms| net/HttpRequest.hpp:1304
wsd-00007-00065 2021-09-08 06:54:39.684277 [ docbroker_004 ] WRN  Waking up dead poll thread [HttpSynReqPoll], started: false, finished: false| ./net/Socket.hpp:682                                             
wsd-00007-00039 2021-09-08 06:54:42.128898 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:54:46.137156 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:54:48.150133 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:54:54.183602 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:54:58.211164 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:55:00.217983 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:55:08.247392 [ websrv_poll ] ERR  Socket #33 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00065 2021-09-08 06:55:09.711224 [ docbroker_004 ] WRN  Socket #28 has timed out while requesting [GET nextcloud.leuterer.net/index.php/apps/richdocuments/wopi/files/20_ocynvx3jwbdw?access_token=Yb5tIBDdAWkfnuOsp5UqC5TEGBu4YjuY&access_token_ttl=0] after 30029ms| net/HttpRequest.hpp:1304
wsd-00007-00065 2021-09-08 06:55:09.712238 [ docbroker_004 ] ERR  Socket #34 SSL error: SYSCALL (5). (EPIPE: Broken pipe)| ./net/SslSocket.hpp:270                                                               
wsd-00007-00065 2021-09-08 06:55:09.712267 [ docbroker_004 ] ERR  #34: Socket write returned -1 (EPIPE: Broken pipe)| ./net/Socket.hpp:1284                                                                      
wsd-00007-00065 2021-09-08 06:55:09.712426 [ docbroker_004 ] ERR  Socket #21 SSL error: SYSCALL (5). (EPIPE: Broken pipe)| ./net/SslSocket.hpp:270                                                               
wsd-00007-00065 2021-09-08 06:55:09.712446 [ docbroker_004 ] ERR  #21: Socket write returned -1 (EPIPE: Broken pipe)| ./net/Socket.hpp:1284                                                                      
sh: 1: /usr/bin/loolmount: Operation not permitted
frk-00035-00035 2021-09-08 06:55:10.699407 [ forkit ] ERR  Failed to unmount [/opt/lool/child-roots/7yCaSACQyfeHJ4Dc/tmp].| common/JailUtil.cpp:68                                                               
sh: 1: /usr/bin/loolmount: Operation not permitted
frk-00035-00035 2021-09-08 06:55:10.713159 [ forkit ] ERR  Failed to unmount [/opt/lool/child-roots/7yCaSACQyfeHJ4Dc/lo].| common/JailUtil.cpp:68                                                                
sh: 1: /usr/bin/loolmount: Operation not permitted
frk-00035-00035 2021-09-08 06:55:10.726387 [ forkit ] ERR  Failed to unmount [/opt/lool/child-roots/7yCaSACQyfeHJ4Dc].| common/JailUtil.cpp:68                                                                   
wsd-00007-00039 2021-09-08 06:55:12.262178 [ websrv_poll ] ERR  Socket #21 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:55:14.268351 [ websrv_poll ] ERR  Socket #21 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:55:16.273018 [ websrv_poll ] ERR  Socket #21 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:55:24.295291 [ websrv_poll ] ERR  Socket #21 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270                                               
wsd-00007-00039 2021-09-08 06:55:26.298962 [ websrv_poll ] ERR  Socket #21 SSL error: SYSCALL (5). (ECONNRESET: Connection reset by peer)| ./net/SslSocket.hpp:270 

ok, I see it is mostly the same as https://github.com/CollaboraOnline/online/issues/3065#issue-967065874

snetAT commented 3 years ago

I have a new insight In the nextcloud docker container there is a sample spreadsheet document smaller than 8000 bytes that i can open. I can enter and open data as long as it does not exceed 8000 bytes.

jacotec commented 3 years ago

@snetat I can't confirm. Created a small docx with 4kB and it still does not initialize.

jacotec commented 3 years ago

@Ashod I took a Wireshark dump of the outgoing connection to the HAProxy/Nextcloud on the Collabora server. There is some data exchange, the last thing is HAProxy sending application data to Collabora, Collabora acknowledges this data and then 30 seconds nothing happens.

After these 30 seconds, Collabora sends again application data towards the Nextcloud but immediately closes the TCP connection at the same time.

image

10.0.1.34 is the Collabora VM's IP 10.0.0.2 is the IP of the HAProxy.

Ashod commented 3 years ago

@jacotec Thanks, this is helpful. So the handshake seems to be fine, which is good. The 30 seconds is the timeout. It seems we aren't getting all the data we expect (based on the HTTP headers etc.). Would be great to have the trace-level logs from loolwsd that matches one or more of these requests. At trace level we capture a lot of the HTTP activity and it would help us see if there is anything of value there.

Another interesting thing is to see what "Application Data" we get back from HAProxy/NC. Specifically, the HTTP response header would be of paramount importance here (if it is indeed provided). Seeing the complete data exchange between loolwsd and HAProxy/NC would be great, if it's not too much trouble for you. (Let me know if you'd like to send the captured packets privately.)

Thank you!

Ashod commented 3 years ago

I can create new documents, but as soon as I enter data, save them and open them again, the content looks like in the screenshot.

Hi @snetat, this looks like a different issue to this particular ticket. Can you please confirm that you have Collabora Online enabled and working? Are you able to upload .odt files and open them in Collabora Online? (Is this a new setup or was previously working?)

Thanks.

snetAT commented 3 years ago

I can create new documents, but as soon as I enter data, save them and open them again, the content looks like in the screenshot.

Hi @snetat, this looks like a different issue to this particular ticket. Can you please confirm that you have Collabora Online enabled and working? Are you able to upload .odt files and open them in Collabora Online? (Is this a new setup or was previously working?)

Thanks.

it is an existing installation, with 6.4.9 it works with 6.4.10 not. yes i can upload and open files. with the result see screenshot if the file is larger than 8000 bytes

https://user-images.githubusercontent.com/5556794/132507785-26a8a4d8-9d78-4329-9826-cee647475f73.mp4

the document is not broken, i can download it and open it in libreoffice without problems

jacotec commented 3 years ago

Another interesting thing is to see what "Application Data" we get back from HAProxy/NC. Specifically, the HTTP response header would be of paramount importance here (if it is indeed provided). Seeing the complete data exchange between loolwsd and HAProxy/NC would be great, if it's not too much trouble for you. (Let me know if you'd like to send the captured packets privately.)

@Ashod On the HAProxy side I'm a bit limited like Lebernd as my HAProxy is also running on my pfSense gateway. But I can get a trace level debug on loolwsd for you. How can I send it privately (I don't like to put clear trace files with real URL's etc into the public internet)

jacotec commented 3 years ago

@Ashod I've made a trace level log from Collabora along with a new syncronized Wireshark capture on the Collabora server. Please feel free to send me an email to mj -at- jacotec -dot - de and I'll send you a link to download the traces.

Ashod commented 3 years ago

We have identified an issue and are working on a fix. Thanks @jacotec for the logs, they proved very helpful.

@snetat you may be right that the size has something to do with it. I expect the root cause is the same, so the fix should resolve your case as well.

Thanks!

brtptrs commented 3 years ago

We are also affected by this issue, and it would be great if this could be released soon.

w202mg commented 3 years ago

same here. this is my status output:

WRN  Failed to verify the certificate of [cloud.XXX.net]| ./net/SslSocket.hpp:196
ERR  Socket #32 SSL BIO error: error:1420C0CF:SSL routines:ssl_write_internal:protocol is shutdown (0: Success)| ./net/SslSocket.hpp:330
ERR  Error while handling poll for socket #32 in HttpSynReqPoll: error:1420C0CF:SSL routines:ssl_write_internal:protocol is shutdown| net/Socket.cpp:423
ERR  WOPI::GetFile [https://cloud.XXX.net/index.php/apps/richdocuments/wopi/files/1249_ocfhv9kwyxj6/contents?access_token=&access_token_ttl=0] failed with Status >
ERR  Cannot download document from WOPI storage uri [https://cloud.XXX.net/index.php/apps/richdocuments/wopi/files/1249_ocfhv9kwyxj6/contents?access_token=&access>
ERR  Failed to load: file://, error: Unsupported URL <file://>: "type detection failed"| kit/Kit.cpp:1316
ERR  error: cmd=load kind=faileddocloading| ./common/Session.hpp:136
WRN  Document load failed: faileddocloading| wsd/ClientSession.cpp:1480
ERR  Failed to get LoKitDocument instance for [file://].| kit/ChildSession.cpp:684
WRN  Ignoring attempted read from 24| ./net/Socket.hpp:1021
jacotec commented 3 years ago

It's already fixed in the latest Snapshot and working. @Ashod did a great job identifying the issue. It should land in the next release.

w202mg commented 3 years ago

@jacotec OK, great! By the way I'm using nginx, so seems like it's not depending to the web server

w202mg commented 3 years ago

@Ashod thank you for youe work! Is there an idea when the update willbe rolled out?

Ashod commented 3 years ago

CODE 6.4.11 has been released with, which fixes this issue. Please report any issues whether related or otherwise. Thanks for everyone's patience while we tackled this one.

w202mg commented 3 years ago

Edit I think I will open a separate thread

CODE 6.4.11 has been released with, which fixes this issue. Please report any issues whether related or otherwise. Thanks for everyone's patience while we tackled this one.

Jut updated so my confiig looks now like:

Nextcloud 22.1.1
Nginx 1.18.0-0
Linux Mint 20.1
Collabora Online 4.2.3
Collabor Code 6.4.11-3
No docker in use

loolwsd status:

WRN  Failed to verify the certificate of [cloud.XXX.net]| ./net/SslSocket.hpp:196
ERR  Socket #27 SSL BIO error: error:1420C0CF:SSL routines:ssl_write_internal:protocol is shutdown (0: Success)| ./net/SslSocket.hpp:330
ERR  Error while handling poll for socket #27 in HttpSynReqPoll: error:1420C0CF:SSL routines:ssl_write_internal:protocol is shutdown| net/Socket.cpp:423
ERR  WOPI::CheckFileInfo failed for URI [https://cloud.XXX.net/index.php/apps/richdocuments/wopi/files/1351_ocfhv9kwyxj6?access_token=NXcBvZ1wTIdAdcTSmg9xLHXAwk9S>
ERR  loading document exception: WOPI::CheckFileInfo failed: | wsd/DocumentBroker.cpp:1926
ERR  Failed to add session to [/index.php/apps/richdocuments/wopi/files/1351_ocfhv9kwyxj6] with URI [https://cloud.XXX.net/index.php/apps/richdocuments/wopi/files>
ERR  Storage error while starting session on /index.php/apps/richdocuments/wopi/files/1351_ocfhv9kwyxj6 for socket #24. Terminating connection. Error: WOPI::CheckFileI>
WRN  Ignoring attempted read from 24| ./net/Socket.hpp:1021
ERR  Invalid or unknown session [008] to remove.| wsd/DocumentBroker.cpp:1969
WRN  Prisoner connection disconnected but without valid socket.| wsd/LOOLWSD.cpp:2248

Nextcloud log: Warning Host 127.0.0.1 was not connected to because it violates local access rules