Closed hfiguiere closed 1 month ago
~I cannot reproduce with the Node sdk example running locally.~
Strike that last, I was testing with the wrong commit (one that was marked good)
(with the original setup) I timed it at 2:00 minutes between the start of the load and the iframe content being displayed.
Seems I can't reproduce with the node sdk example.
Worked here as well with nodejs .. using https://github.com/CollaboraOnline/collabora-online-sdk-examples/tree/master/webapp/nodejs
.. adding more tests.
The python / django example seems to reproduce the issue .. .. not working .. unknown reason, see below
https://github.com/CollaboraOnline/collabora-online-sdk-examples/tree/master/webapp/python
The build-in framed doc works though .. http://localhost:9980/browser/dist/framed.doc.html?file_path=/usr/local/projects/collabora/LibreOffice/online/test/samples/writer-edit.fodt
The python / django example seems to reproduce the issue .. https://github.com/CollaboraOnline/collabora-online-sdk-examples/tree/master/webapp/python
ah .. the python example gives me a WOPISrc validation error
, the used /wopi/files/1
n/a ..
wsd-1653059-1653085 2024-09-13 09:52:28.545688 +0200 [ websrv_poll ] INF #19: parseHeader: Client HTTP Request: POST, uri `/browser/163b4b80/cool.html?WOPISrc=http://0.0.0.0:8000/wopi/files/1` HTTP/1.1, sz[header 679, content 185], offset 679, chunked false, Host: localhost:9980 / User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:130.0) Gecko/20100101 Firefox/130.0 / Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8 / Accept-Language: en-US,en;q=0.5 / Accept-Encoding: gzip, deflate, br, zstd / Content-Type: multipart/form-data; boundary=---------------------------313513991435228868652782832917 / Content-Length: 185 / Origin: null / DNT: 1 / Connection: keep-alive / Upgrade-Insecure-Requests: 1 / Sec-Fetch-Dest: iframe / Sec-Fetch-Mode: navigate / Sec-Fetch-Site: cross-site / Priority: u=4| net/Socket.cpp:1632
wsd-1653059-1653085 2024-09-13 09:52:28.545696 +0200 [ websrv_poll ] DBG #19: Handling request: /browser/163b4b80/cool.html?WOPISrc=http://0.0.0.0:8000/wopi/files/1| wsd/ClientRequestDispatcher.cpp:670
wsd-1653059-1653085 2024-09-13 09:52:28.545710 +0200 [ websrv_poll ] WRN WOPISrc validation error: unencoded WOPISrc [http://0.0.0.0:8000/wopi/files/1] in URL: /browser/163b4b80/cool.html?WOPISrc=http://0.0.0.0:80
So am I right that in master the nodejs example works?, the python didn't work anyway out of the box before the bisected commit, so what are the steps to reproduce?
Yes, I can confirm that the SDK nodejs + nodejs-esm works, python not (but also didn't before mentioned patches). The build in iframe test is also working. Current master 9f6782dafff06e6f9229bf6455ca03f9ce92384d And I used same FF 130.0 on GNU/Linux.
I can't reproduce the issue either, please advise @hfiguiere
The Python sample is possibly likely broken, I have never touched/tested it.
The drupal connector is https://github.com/CollaboraOnline/collabora-drupal/
From your log file .. ERR
:
issue-10042$ grep ERR coolwsd.log
wsd-3223553-3223553 2024-09-13 09:49:43.969691 -0400 [ coolwsd ] ERR Log level is set very high to 'trace' this will have a significant performance impact. Do not use this in production.| wsd/COOLWSD.cpp:4459
wsd-3223553-3223624 2024-09-13 09:49:43.975092 -0400 [ docbroker_001 ] ERR Failed to stat filesystem [/home/hub/Documents/Collabora/cool-dev/online/jails/3223553-04d88ac2/tmp/cool-uu24BbZsD3k7eC3V/user/docs/0pMi3kARScd5YPxP/hello-world.odp] (ENOENT: No such file or directory)| common/FileUtil.cpp:544
kit-3223619-3223619 2024-09-13 09:49:43.994278 -0400 [ kitbroker_001 ] ERR non-async dialog triggered| kit/Kit.cpp:2713
kit-3223619-3223619 2024-09-13 09:49:44.035768 -0400 [ kitbroker_001 ] ERR non-async dialog triggered| kit/Kit.cpp:2713
wsd-3223553-3223622 2024-09-13 09:51:51.982533 -0400 [ websrv_poll ] TRC Field [checkfileinfo_override] for var [%CHECK_FILE_INFO_OVERRIDE%] = []| wsd/FileServer.cpp:1085
wsd-3223553-3223622 2024-09-13 09:51:52.467312 -0400 [ websrv_poll ] ERR FileServerRequestHandler: File not found: Invalid URI request (hash): [/browser/44cbb47116/branding.css].| wsd/FileServer.cpp:751
wsd-3223553-3223622 2024-09-13 09:51:52.467350 -0400 [ websrv_poll ] ERR #37: Attempted to remove: 396 which is > size: 0 clamped to 0| net/Socket.hpp:1264
wsd-3223553-3223592 2024-09-13 09:51:54.674191 -0400 [ prisoner_poll ] ERR Forkit waitpid failed (ECHILD: No child processes)| wsd/COOLWSD.cpp:3396
.. so right after the timeout, the ERR FileServerRequestHandler: File not found: Invalid URI request (hash) occurred .. FileServer.cpp:751
.. and even before the timeout, a filestat error of an actual (local) file URI (not WOPI)
wsd-3223553-3223622 2024-09-13 09:51:52.467350 -0400 [ websrv_poll ] ERR #37: Attempted to remove: 396 which is > size: 0 clamped to 0| net/Socket.hpp:1264
Looks superficially problematic too I guess (?) should be able to see how we got our arithmetic tangled up there ?
I guess that eraseFirstInputBytes is the call of:
//remove the request body from our input buffer socket->eraseFirstInputBytes(map._messageSize - map._headerSize);
Where map._messageSize is map._headerSize + contentLength where contentLength comes from request.getContentLength which does stoll on the "Content-Length" line, so one possibility is a incorrect/lying Content-Length?
Or if we've already consumed some of the content that we're trying to skip
Right, emphasized by @caolanm as well.
Probably coming from ClientRequestDispatcher::handleIncomingMessage -> Socket::eraseFirstInputBytes, our known 'friendly issue' (removed shutdown on POST file req ..)
now you beat me to it @caolanm :)
right .. have to check whether content-length in parseHeader is properly validated .. AFAIK it bails-out if having less bytes available.
Edit2: StreamSocket::parseHeader code checks available < contentLength
and sets map._messageSize += contentLength
.
still like to reproduce the issue first, will check with the mentioned drupal setup.
so the sequence from parseHeader .. seems to match (somehow buggy: header size 396 and content-length -1 (invalid)):
wsd-3223553-3223622 2024-09-13 09:51:52.467268 -0400 [ websrv_poll ] INF #37: parseHeader: Client HTTP Request: GET, uri `/browser/44cbb47116/branding.css` HTTP/1.1, sz[header 396, content -1], offset 396, chunked false, Host: 172.18.2.14:9980 / User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:130.0) Gecko/20100101 Firefox/130.0 / Accept: text/css,*/*;q=0.1 / Accept-Language: en-CA,en-US;q=0.7,en;q=0.3 / Accept-Encoding: gzip, deflate, br, zstd / DNT: 1 / Connection: keep-alive / Sec-Fetch-Dest: style / Sec-Fetch-Mode: no-cors / Sec-Fetch-Site: same-origin / Priority: u=2| net/Socket.cpp:1340
wsd-3223553-3223622 2024-09-13 09:51:52.467274 -0400 [ websrv_poll ] DBG #37: Handling request: /browser/44cbb47116/branding.css| wsd/ClientRequestDispatcher.cpp:670
wsd-3223553-3223622 2024-09-13 09:51:52.467282 -0400 [ websrv_poll ] TRC Fileserver request: /browser/44cbb47116/branding.css| wsd/FileServer.cpp:522
wsd-3223553-3223622 2024-09-13 09:51:52.467312 -0400 [ websrv_poll ] ERR FileServerRequestHandler: File not found: Invalid URI request (hash): [/browser/44cbb47116/branding.css].| wsd/FileServer.cpp:751
wsd-3223553-3223622 2024-09-13 09:51:52.467335 -0400 [ websrv_poll ] TRC #37: Wrote 323 bytes of 323 buffered data| net/Socket.hpp:1490
wsd-3223553-3223622 2024-09-13 09:51:52.467350 -0400 [ websrv_poll ] ERR #37: Attempted to remove: 396 which is > size: 0 clamped to 0| net/Socket.hpp:1264
Edit: So this eraseFirstInputBytes() probably comes from
// If we succeeded - remove the request header from our input buffer
socket->eraseFirstInputBytes(map._headerSize);
since the latter one map._messageSize - map._headerSize
should be zero and not show such a message?
Edit3: Yes, b/c parseHeader sets map._messageSize = map._headerSize
initially and since contentLength == -1
it won't get added to map._messageSize
. So 0 == map._messageSize - map._headerSize
. Hence the 1st socket->eraseFirstInputBytes(map._headerSize)
call triggered.
Edit2:
Edit3: brings me back to need to reproduce this issue first .. will continue.
The StreamScoket::eraseFirstInputBytes error log occurs as described above due do the already cleared _inBuffer
.
The latter happened due to FileServerRequestHandler::handlerRequest() line 751 (File not found /browser/6e7c0cb589/branding.css
) which can be reproduced in the standalone make run
.
Here HttpHelper::sendErrorAndShutdown
gets called which signals socket for async shutdown and ignores input, also clearing the _inBuffer
.
Either we could silence this misleading log or have the caller ClientRequestDispatcher::handleIncomingMessage
determine whether the socket is already about to shutdown and/or input-ignored. The latter is not so clear in current Socket implementation.
However, the above does not explain the 2min lag, which I also could not reproduce with given tests.
Tried to test with https://github.com/CollaboraOnline/collabora-drupal/
The website encountered an unexpected error. Try again later.
wsd-3578939-3578966 2024-09-16 12:30:09.020552 +0200 [ websrv_poll ] DBG #19: Handling request: //hosting/discovery| wsd/ClientRequestDispatcher.cpp:670
wsd-3578939-3578966 2024-09-16 12:30:09.020582 +0200 [ websrv_poll ] DBG #19: Wopi discovery request: //hosting/discovery| wsd/ClientRequestDispatcher.cpp:1023
wsd-3578939-3578966 2024-09-16 12:30:09.020650 +0200 [ websrv_poll ] TRC #19: Sending back discovery.xml: <wopi-discovery>
<net-zone name="external-http">
....
</wopi-discovery>| wsd/ClientRequestDispatcher.cpp:1043
wsd-3578939-3578966 2024-09-16 12:30:09.020730 +0200 [ websrv_poll ] INF #19: Sent discovery.xml successfully.| wsd/ClientRequestDispatcher.cpp:1045
wsd-3578939-3578966 2024-09-16 12:32:09.138724 +0200 [ websrv_poll ] DBG #19: Closed after reading| net/Socket.hpp:1375
wsd-3578939-3578966 2024-09-16 12:32:09.138754 +0200 [ websrv_poll ] TRC #19: Closed. Firing onDisconnect.| net/Socket.hpp:1447
@hfiguiere .. looks like this is your 2 minute lag ..
yes, drupal seems to load the discovery from COOL but for some reason 2 minutes are used (timeout) between successfully sent the data to closing the connection.
Could it be that the drupal code reads until close instead of using content-length
(as described earlier)?
Our code change keeps the connection open, hence content-length
should be utilized.
Looks like the root cause might be caused by https://github.com/CollaboraOnline/collabora-drupal/pull/29/files#diff-bb749b39143d95341efb10d136792f64797f016b12158902a9ec1dddcf562cddR28
Applying the change from the above patch on the Drupal module side no longer trigger the problem.
Looks like the root cause might be caused by https://github.com/CollaboraOnline/collabora-drupal/pull/29/files#diff-bb749b39143d95341efb10d136792f64797f016b12158902a9ec1dddcf562cddR28
They wrote ..
it can happen that file_get_contents() will hang at the end of a stream, expecting more data. With curl, this does not happen.
indicating that our keep alive connections change indeed were the cause,
i.e. not expecting more data but we didn't close and the drupal client also didn't close
after receiving the data. With curl
this doesn't happen...
Great .. we might find more client issues with this behavioral change.
IMHO this issue has been fully understood now and we can close it? OK?
IMHO this issue has been fully understood now and we can close it? OK?
We need to make sure it's not triggered.
BTW file_get_contents()
is not Drupal, it's PHP.
BTW
file_get_contents()
is not Drupal, it's PHP.
i.e. file_get_contents() .. thx.
We identified the request as an HTTP/1.1 GET
parseHeader: Client HTTP Request: GET, uri
//hosting/discovery
HTTP/1.1, sz[header 77, content -1], offset 77, chunked false, Host: localhost:9980 / Connection: close| net/Socket.cpp:1340
and indeed it looks like we do not respect Connection: close
, i.e. we do not close it.
The current workaround is functional, but as you hinted, expectation is different with these APIs (not using content-length but a closed connection). Will provide a patch, thank you!
here my pull request .. resolves the issue for me here. https://github.com/CollaboraOnline/online/pull/10054
Describe the Bug
Using the drupal integration (I don't have any other working).
The COOL
iframe
takes exactly 2 minutes to open (stuck in "load" not displaying anything), vs instantly.I tracked it down to change I83c09e0a8101931efece42e42e5a8e3abd7e696f that is PR #9777
Steps to Reproduce
Expected Behavior
Loads instantly.
Actual Behavior
It takes minutes to open. The browser (Firefox) stay in a "loading" state.
Screenshots
n/a
Desktop
(Please complete the following information)
Smartphone
(Please complete the following information)
Additional Context
The drupal server is on localhost:8443 and run inside a Docker container. COOL runs with
make run
on localhost:9980