CollaboraOnline / online

Collabora Online is a collaborative online office suite based on LibreOffice technology. This is also the source for the Collabora Office apps for iOS and Android.
https://collaboraonline.com

Nextcloud richdocument cannot get preview of large PDF or Office docs from CODE server due to Expect: 100-Continue #6983

Open erikfdev opened 1 year ago

erikfdev commented 1 year ago

I am running Nextcloud 27.0.1 and CollaboraOnline CODE 23.05.2.2 (installed from your deb-packages repository) on the same server. My server OS is Linux Mint 20.1.

Nextcloud uses the 'richdocuments' app to produce previews of PDF and Office documents. Richdocuments uses Guzzle to send the following request (via an Apache reverse proxy):

POST /cool/convert-to/png HTTP/1.1
Host: the.name.of.my.host.net
Expect: 100-Continue
Content-Type: multipart/form-data; boundary=aae3d303568bfddc5e50234056e4475d5bc1ab77
User-Agent: Nextcloud Server Crawler
Accept-Encoding: gzip
X-Forwarded-For: 192.168.0.1
X-Forwarded-Host: office.invara-erde.freeddns.org
X-Forwarded-Server: the.name.of.my.host.net
Content-Length: 2572543
Connection: Keep-Alive

Note the 'Expect: 100-Continue' header. It is present when the document to be converted is large. Apparently in conformance with the HTTP standard, Guzzle waits for a '100 Continue' reply from the CODE server before sending the contents of the document to be converted. However, the CODE server does not always send a '100 Continue' reply. This causes the Apache process running Guzzle to hang, and no preview is sent to the browser. The same issue also prevents large PDF files stored in Nextcloud from being displayed properly in the browser.
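To make the handshake concrete, here is a minimal client-side sketch in Python. It is not how Guzzle or richdocuments implement it; the function name `send_with_expect`, the outcome strings, and the default timeout are all made up for illustration. It relies on one point from the HTTP specification: a client that sends `Expect: 100-continue` should not wait indefinitely for the interim reply and may send the body anyway after a bounded wait.

```python
import select
import socket


def send_with_expect(sock: socket.socket, head: bytes, body: bytes,
                     timeout: float = 1.0) -> str:
    """Send the request head, wait briefly for an interim reply, then the body.

    Hypothetical helper, for illustration only. Returns one of:
      "continued"  - the server answered "100 Continue" before the timeout
      "timed-out"  - nothing arrived in time; the body was sent anyway
      "rejected"   - the server sent some other status; the body was withheld
    """
    sock.sendall(head)

    # Wait at most `timeout` seconds for the server to say something.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        # No interim reply: do not hang forever, just send the body.
        sock.sendall(body)
        return "timed-out"

    # Simplification: assumes the whole status line arrives in one read.
    interim = sock.recv(4096)
    if interim.startswith(b"HTTP/1.1 100"):
        sock.sendall(body)
        return "continued"

    # A final status (for example 417) arrived instead: keep the body back.
    return "rejected"
```

With a bounded wait like this, a server that never sends '100 Continue' costs the client only the timeout rather than a hung process.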

In my humble opinion, the problem might be located in method StreamSocket::parseHeader() around line 1160. My understanding of the code is that CODE sends '100 Continue' only the first time an 'Expect: 100-Continue' is encountered. However, I wonder why not every 'Expect: 100-Continue' gets its '100 Continue' reply. My understanding of the HTTP standard is that it should be sent each time, unless the request is already complete (which isn't the case here because the document contents are still to follow).
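The per-request behaviour described above can be sketched as a decision made independently for each request, with no per-connection "already sent" flag. This is a Python illustration only, not the code in StreamSocket::parseHeader(); the names `should_send_continue` and `interim_replies` are hypothetical. The sketch also skips the interim reply once any body bytes have arrived, which the HTTP specification permits and which is slightly broader than "the request is already complete".

```python
def should_send_continue(headers: dict[str, str], body_bytes_received: int) -> bool:
    """Decide, for ONE request, whether to reply '100 Continue'.

    Hypothetical helper: the decision depends only on this request's
    headers and how much of its body has already been read.
    """
    # Header names and the expectation value are case-insensitive.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}

    if normalized.get("expect", "").lower() != "100-continue":
        return False

    # No body announced means there is nothing for the client to hold back.
    has_body = ("transfer-encoding" in normalized
                or int(normalized.get("content-length", "0")) > 0)
    if not has_body:
        return False

    # Assumption: once body bytes are arriving, the client is clearly not
    # waiting, so the interim reply can be omitted.
    return body_bytes_received == 0


def interim_replies(requests: list[tuple[dict[str, str], int]]) -> list[bool]:
    """Evaluate several requests arriving on the same keep-alive connection.

    Each request is judged on its own; earlier replies do not matter.
    """
    return [should_send_continue(headers, received) for headers, received in requests]
```

Under this rule, two consecutive large uploads on one Keep-Alive connection would each get their own '100 Continue'.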

erikfdev commented 11 months ago

Anybody looking into this?

joshtrichards commented 8 months ago

Just adding context:

https://github.com/CollaboraOnline/online/blob/c31030302c2cf7ef17a859e68815d86d454c82ae/net/Socket.cpp#L1165-L1174

Support was originally added for 100-Continue in:

https://github.com/CollaboraOnline/online/commit/4f804a48fe743ac37ee45b8a4c323cad072cdb5e

https://gerrit.libreoffice.org/c/online/+/72746/1

erikfdev commented 8 months ago

Apparently the problem does not appear anymore with Collabora Office 23.05.5.4 combined with Nextcloud 27.1.3. I don't know how that got fixed.

tcitworld commented 8 months ago

Probably with https://github.com/nextcloud/richdocuments/pull/3298