CollaboraOnline / online

Collabora Online is a collaborative online office suite based on LibreOffice technology. This is also the source for the Collabora Office apps for iOS and Android.
https://collaboraonline.com
Other
1.78k stars 681 forks source link

Failed to create DocBroker error: cmd=load kind=docunloading #9173

Closed AssiaAzzouzi closed 2 months ago

AssiaAzzouzi commented 3 months ago

Describe the Bug

I have Nextcloud 28.0.1.1 installed on a vm and configured to work with collabora. I have collabora 24.04.3.1 installed on a separate vm. Both of Nextcloud and collabora are behind an nginx reverse proxy. After creating a new file on Nextcloud, adding content to it, saving it and reopening it immediately after I lose its content.

Steps to Reproduce

1- I create file.odt on Nextcloud. 2- I write something on it. 3- I save it. 4- I close it. 5- I open it again immediately after closing it.

Expected Behavior

The file should open with the content I wrote before closing.

Actual Behavior

there are many random resulted behaviours:

1- The screen displays an error "Socket connection establishment failed or socket connection closed unexpectedly" as shown in the screenshot. After waiting a little bit the file is rendered either empty or with the content I wrote.

or

2- The screen doesn't display the error and the file is rendered either empty or with the content I wrote.

Screenshots

Screenshot 2024-05-28 at 19 40 53

Versions

Additional Context

In Collabora here are the logs I see while saving the file:

wsd-10789-11216 2024-05-28 18:39:11.771791 +0200 [ docbroker_00e ] WRN Upload is already in progress.| wsd/wopi/WopiStorage.cpp:594 wsd-10789-11216 2024-05-28 18:39:11.771981 +0200 [ docbroker_00e ] WRN Failed to upload [https%3A%2F%2Fnextcloud_url%3A443%2Findex.php%2Fapps%2Frichdocuments%2Fwopi%2Ffiles%2Fxxxxx] asynchronously. Activity::None| wsd/DocumentBroker.cpp:1941

Here are the logs I see when the error "Socket connection establishment failed or socket connection closed unexpectedly" appears in the screen:

wsd-10789-10861 2024-05-28 18:39:38.898677 +0200 [ websrv_poll ] ERR #39: Failed to create DocBroker [https%3A%2F%2Fnextcloud_url%3A443%2Findex.php%2Fapps%2Frichdocuments%2Fwopi%2Ffiles%2Fxxxxx]: error: cmd=load kind=docunloading| wsd/RequestVettingStation.cpp:333 wsd-10789-10861 2024-05-28 18:39:39.679691 +0200 [ websrv_poll ] WRN DocBroker [https%3A%2F%2Fnextcloud_url%3A443%2Findex.php%2Fapps%2Frichdocuments%2Fwopi%2Ffiles%2Fxxxxx] is unloading. Rejecting client request to load session [045]| wsd/ClientRequestDispatcher.cpp:139

It appears that because of the unloading state of the file, this one can't be rendered to the client and the request keeps trying to get it until it's opened and rendered.

Apart from these logs I have these errors when I open the Nextcloud office interface. It seems that they are related to the absence of some icons:

wsd-10789-10861 2024-05-28 18:39:20.944231 +0200 [ websrv_poll ] ERR FileServerRequestHandler: File not found: Invalid URI request: [/browser/594b605/images/lc_zoom.svg].| wsd/FileServer.cpp:747 wsd-10789-10861 2024-05-28 18:39:21.341236 +0200 [ websrv_poll ] ERR FileServerRequestHandler: File not found: Invalid URI request: [/browser/594b605/images/lc_invertbackground.svg].| wsd/FileServer.cpp:747

Ashod commented 3 months ago

Hi @AssiaAzzouzi,

Thanks for reporting this. Are you able to set the log-level to trace and reproduce? If yes, please compress and attach the log file with the reproducer.

Thank you!

AssiaAzzouzi commented 3 months ago

Hi @Ashod

Yeah of course:

Here are the traces statring from creating the file, inserting content to it, saving it and opening it immediately after. Here is the behaviour where the screen displays the error "Socket connection establishment failed or socket connection closed unexpectedly" as shown in the previous screenshot. After waiting a little bit the file is rendered with the content I wrote.

From the side of collabora: coolwsd-trace.zip

From the side of Nextcloud using admin_audit plugin: audit-nextcloud.log

Thanks!

Ashod commented 3 months ago

Thank you, @AssiaAzzouzi.

Ashod commented 3 months ago

Hi @AssiaAzzouzi,

In the attached log file, the document was saved an uploaded twice to the NextCloud server.

The first time, it took 18362ms to upload 7743 bytes. That's a rate of 7743/18.362=421.7 bytes/second! Not even 0.5 KB/s.

The second time, it took 16959ms to upload 15802 bytes. Slightly better rate of 15802/16.959=931.8 bytes/second! That's still less than 1 KB/s.

During those long upload times (totaling over 25 seconds) you attempted to re-load the document but were rejected because the document was already unloaded from memory and was waiting to finish uploading to the storage before reloading it again.

The slow NextCloud server explains the issue you experience. You either need more powerful VM/hardware, or more patience, I'm afraid.

I'm very curious about the VM specs you have. Specifically, how much RAM have you assigned the VMs and how many CPU cores. Also, what's the real hardware CPU specs (model, number of cores, stock frequency)?

Thank you.

AssiaAzzouzi commented 3 months ago

Hello @Ashod

Thanks for you feedback.

Actually at the moment of file's upload I notice, at the monitoring interface, that there is a peak in CPU usage in both of Collabora and Nextcloud but that doesn't exceed VM's resources:

At the moment of upload there is only a usage of 3% of CPU in Collabora and 30,76% of CPU Nextcloud. RAM consumption is almost unaffected.

Here are VMs specs:

Nextcloud Instance:

Collabora Instance:

CPU Specs of Both VMs:

Ashod commented 3 months ago

Thanks you, @AssiaAzzouzi. Unfortunately, this is a severely under-powered setup. You need at least 4 cores between the two VMs. It is recommended to have at least 4 cores for the Collabora instance.

The issue here is one of hardware resources. There are two solutions: upgrade the hardware or wait longer for the save and upload to complete.

Hope this helps. Thankyou.

AssiaAzzouzi commented 3 months ago

I just noticed that the antivirus app in the side of Nextcloud was slowing down the upload of file from Collabora to Nextcloud. This one analyses each file before uploading it to Nextcloud.

When I deactivate the antivirus app the rate of file's upload becomes 25854 bytes in 158ms which is approximatively ≈159.78 KB/s and faster than before.

Also with the Antivirus module activated, while doing a real time monitoring of CPU of Nextcloud I notice that, indeed, Nextcloud was reaching the limit of its CPU at the moment of file's upload. I presume that the monitoring interface was just not reflecting this reach before.

After some research I resorted to configure my antivirus to run as a daemon instead of an executable. And that made my antivirus faster and less resource consumer. With that my upload of files from Collabora to Nextcloud is faster ≈98.82 KB/s (with antivirus activated)

This made the issue happen less often. But it still happens nevertheless if I close and reopen very fast. But now the screen error quickly disappears. And file content is not lost.

In terms of logs, those ones don't show up anymore:

wsd-10789-11216 2024-05-28 18:39:11.771791 +0200 [ docbroker_00e ] WRN Upload is already in progress.| wsd/wopi/WopiStorage.cpp:594
wsd-10789-11216 2024-05-28 18:39:11.771981 +0200 [ docbroker_00e ] WRN Failed to upload [https%3A%2F%2Fnextcloud_url%3A443%2Findex.php%2Fapps%2Frichdocuments%2Fwopi%2Ffiles%2Fxxxxx] asynchronously. Activity::None| wsd/DocumentBroker.cpp:1941

but thoses ones are still present when the screen error appears:

wsd-45546-45619 2024-06-03 17:31:00.704326 +0200 [ websrv_poll ] WRN  DocBroker [https%3A%2F%2Fnextcloud_url%3A443%2Findex.php%2Fapps%2Frichdocuments%2Fwopi%2Ffiles%2Fxxx] is unloading. Rejecting client request to load session [034]| wsd/ClientRequestDispatcher.cpp:139
wsd-45546-45619 2024-06-03 17:31:00.704362 +0200 [ websrv_poll ] ERR  #33: Failed to create DocBroker [https%3A%2F%2Fnextcloud_url%3A443%2Findex.php%2Fapps%2Frichdocuments%2Fwopi%2Ffiles%2Fxxx]: error: cmd=load kind=docunloading| wsd/RequestVettingStation.cpp:333

This is a good enough fix for my case. Although gracefully handling it would be better.

For now we only have a few users, I will consider increasing resources when the number of users increases.

Thank you @Ashod.

Ashod commented 1 month ago

Thanks, @AssiaAzzouzi for the details.

The error you see with docunloading is telling the UI that unloading the document has started and we cannot recover it. The only solution is to wait until unloading finishes and try loading again, which is what the UI does. You reference to "the screen error quickly disappears" is referring to the system recovering (as gracefully as possible). First, we report that loading cannot happen because of the docunloading status, but then when we retry shortly after, we succeed.

With resources being constrained, eventually the system will not be as responsive as the user expects, and they will get to see errors that they don't expect. It is important to have at least the minimum required resources, and to scale if and when necessary. Please provision 4 cores and 4GB of RAM for the host running coolwsd, and add at least 150MB per additional concurrent user. At some point you'd need more cores, but coolwsd handles about 15 concurrent users per core, so you will have some room for growth with the minimum 4 cores and 4GB, before needing to add more.