cloud-py-api / app_api

Nextcloud AppAPI
https://apps.nextcloud.com/apps/app_api
GNU Affero General Public License v3.0

After deployment, ExApp "Test Deploy" (nc_app_test-deploy) shows "Heartbeat check failed" and "Healthchecking" #300

Open architectonio opened 1 month ago

architectonio commented 1 month ago

Describe the bug

After deploying the ExApp "Test Deploy", the NextCloud External App Admin Interface shows an infinite "Healthchecking" loop as well as "Heartbeat check failed".

Steps/Code to Reproduce

Deploy the "Test Deploy" on NextCloud

Expected Results

Deployed without any issue

Actual Results

The NextCloud External App Admin Interface shows an infinite "Healthchecking" loop as well as "Heartbeat check failed".

Setup configuration

Software

Hardware

Result of docker logs nc_app_test-deploy:

    Started
    INFO: Started server process [1]
    INFO: Waiting for application startup.
    TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}, 'state': {}}
    TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
    TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
    INFO: Application startup complete.
    INFO: Uvicorn running on http://0.0.0.0:23000 (Press CTRL+C to quit)
    INFO: Shutting down
    INFO: Waiting for application shutdown.
    TRACE: ASGI [1] Receive {'type': 'lifespan.shutdown'}
    TRACE: ASGI [1] Send {'type': 'lifespan.shutdown.complete'}
    TRACE: ASGI [1] Completed
    INFO: Application shutdown complete.
    INFO: Finished server process [1]
    Started
    INFO: Started server process [1]
    INFO: Waiting for application startup.
    TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}, 'state': {}}
    TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
    TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
    INFO: Application startup complete.
    INFO: Uvicorn running on http://0.0.0.0:23000 (Press CTRL+C to quit)

Result of docker volume inspect nc_app_test-deploy_data:

    [
        {
            "CreatedAt": "2024-05-28T10:36:06+02:00",
            "Driver": "local",
            "Labels": null,
            "Mountpoint": "/var/lib/docker/volumes/nc_app_test-deploy_data/_data",
            "Name": "nc_app_test-deploy_data",
            "Options": null,
            "Scope": "local"
        }
    ]

kyteinsky commented 1 month ago

Please provide the following in addition to the above:

  1. The relevant section of server logs, last 10 minutes or the error entries (found at data/nextcloud.log inside nextcloud's directory)
  2. The docker socket proxy's container logs
  3. A screenshot of the Test Deploy page with the Developers console open. The Dev console can be opened by pressing F12 or Ctrl+Shift+I in the browser.
architectonio commented 1 month ago

1) The NextCloud log file is a little complicated. I have several clients (mobile, Windows, Linux...) connected to my NextCloud instance, and since it logs everything, even the last 5 to 10 minutes would be a huge file, which I would need to "clean from sensitive information like user ids..."

2) I attached the docker socket proxy log.

3) Where can I find that page?

kyteinsky commented 1 month ago
  1. grep app_api data/nextcloud.log should be good enough
  2. I can't see it for some reason although I received an email. Did you delete it by chance?
  3. i. Go to /index.php/settings/admin/app_api
     ii. Click on "Test deploy" inside a dropdown menu for the docker socket proxy's daemon
     iii. Press F12
     iv. Click on "Start Deploy test"
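
For item 1, if the grep output is still too large or contains user IDs, the filtering and redaction can be scripted. A minimal sketch, assuming nextcloud.log is in its default one-JSON-object-per-line format with "app", "user" and "remoteAddr" fields; the function name and the "REDACTED" placeholder are illustrative and not part of AppAPI:

```python
import json
from typing import Iterable, Iterator

# Fields blanked before sharing; extend the tuple if the log carries more identifying data.
SENSITIVE_FIELDS = ("user", "remoteAddr")


def filter_app_api(lines: Iterable[str], app: str = "app_api") -> Iterator[str]:
    """Yield redacted JSON log entries whose "app" field equals `app`."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict) or entry.get("app") != app:
            continue
        for field in SENSITIVE_FIELDS:
            if field in entry:
                entry[field] = "REDACTED"
        yield json.dumps(entry)
```

Passing an open file handle for data/nextcloud.log as `lines` and printing each result gives a much smaller file to upload. Unlike grep, this matches only the "app" field, and identifiers embedded in the message text itself are not touched, so the output still deserves a quick manual check.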
architectonio commented 1 month ago

The resulting Log File is about 60 MB (59944187 Jun 7 14:29 nc_appapi.log) and as I said it contains sensitive information. I am going to replace such sensitive information with dummy/fake information and then upload the log file

architectonio commented 1 month ago

Please find attached the log file.

architectonio commented 1 month ago

What exactly do you need from the Developer Console? There are a lot of tabs/screens...

architectonio commented 1 month ago

The screenshot, hoping it contains the information you need

kyteinsky commented 1 month ago

It looks like the attachments are missing for the log file and the screenshot.

What exactly do you need from the Developer Console?

Sorry for the confusion. I'm looking for errors in the "Console" tab or the "Network" tab.

architectonio commented 1 month ago

Maybe github doesn't accept .png and .gz files?

kyteinsky commented 1 month ago

Seems to work for me. You can give it one more try by dragging the file in the text box or a link to the uploaded file elsewhere.

architectonio commented 1 month ago

I uploaded the files again... both in a zip file

architectonio commented 1 month ago

part 1/3

architectonio commented 1 month ago

part 2/3

architectonio commented 1 month ago

part 1/3

kyteinsky commented 1 month ago

@architectonio Nice that you managed to get context chat running. Did you check if this was solved as well?

For the attachments, I only see text messages this side (part 1/3, ...). You can use pastebin and imgur to upload the logs and screenshots and then paste a link here if the issue still persists.

architectonio commented 1 month ago

@kyteinsky I just created a zip file to download from my server. How can I send you the link (privately)?

architectonio commented 1 month ago

@architectonio Nice that you managed to get context chat running. Did you check if this was solved as well?

It seems to work; however, when I ask a question in the NextCloud Context Chat, after a while I get "Context Chat task for NextCloud Assistant has failed".

architectonio commented 1 month ago

The same happens when trying to generate an image such as "Draw a red Rose in a brown pot" with "NC Assistant Generate Image": "Assistant has failed" after about a minute.

kyteinsky commented 1 month ago

Sure, send it over at kyteinsky@gmail.com, I'll attach the files here. The issues might indicate that the app_api setup is not correct. Did it work before with image generation?

architectonio commented 1 month ago

Check your mailbox!

architectonio commented 1 month ago

@kyteinsky Any finding in the log file?

kyteinsky commented 1 month ago

Sorry again for the late reply.

The only relevant line in the log was:

... Error during request to ExApp context_chat_backend: cURL error 28: Failed to connect to context_chat_backend port 23001 after 134424 ms: Couldn't connect to server (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for http://context_chat_backend:23001/loadSources ...  

The "134424 ms" part is interesting. Can you check the PHP timeout in your php.ini config and increase it if it is too low? A good value could be 1800 to 3000 seconds.

Also, I'd like to see the Test deploy modal. Please follow these steps to run a test deployment using AppAPI:

  1. Go to "/index.php/settings/admin/app_api"
  2. Create a deploy daemon if not done already (verify the connection here itself)
  3. Click on "Test deploy" in the actions menu
  4. Click on "Start deploy test"
  5. Send a screenshot of the browser or modal when done.
kyteinsky commented 1 month ago

Click on "Test deploy" in the actions menu

architectonio commented 1 month ago

No worries. I have already tried Test Deploy. Below are the screenshots.

screen-2024-06-14-09-12-50

screen-2024-06-14-09-13-21

architectonio commented 1 month ago

And here the ExApp logs:

    Started
    INFO: Started server process [1]
    INFO: Waiting for application startup.
    TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}, 'state': {}}
    TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
    TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
    INFO: Application startup complete.
    INFO: Uvicorn running on http://0.0.0.0:23000 (Press CTRL+C to quit)
    INFO: Shutting down
    INFO: Waiting for application shutdown.
    TRACE: ASGI [1] Receive {'type': 'lifespan.shutdown'}
    TRACE: ASGI [1] Send {'type': 'lifespan.shutdown.complete'}
    TRACE: ASGI [1] Completed
    INFO: Application shutdown complete.
    INFO: Finished server process [1]
    Started
    INFO: Started server process [1]
    INFO: Waiting for application startup.
    TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}, 'state': {}}
    TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
    TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
    INFO: Application startup complete.
    INFO: Uvicorn running on http://0.0.0.0:23000 (Press CTRL+C to quit)

kyteinsky commented 1 month ago

Could be a network issue. Can you click on the deploy daemon (not on the 3 dots) and then on "Verify connection"?

architectonio commented 1 month ago

"Daemon connection successful"

architectonio commented 1 month ago

This is what "docker ps" returns:

    CONTAINER ID   IMAGE                                               COMMAND                CREATED       STATUS                PORTS                                         NAMES
    8ef91dc5a85c   ghcr.io/cloud-py-api/test-deploy-cuda:release       "python3 main.py"      6 days ago    Up 3 days (healthy)                                                 nc_app_test-deploy
    6c441efca423   ghcr.io/nextcloud/context_chat_backend:2.1.1        "python3 main.py"      6 days ago    Up 3 days                                                           nc_app_context_chat_backend
    4d936805fe6e   localai/localai:master-aio-gpu-nvidia-cuda-12       "/aio/entrypoint.sh"   12 days ago   Up 3 days (healthy)   0.0.0.0:28890->8080/tcp, :::28890->8080/tcp   local-ai
    3d02fb1d6b04   ghcr.io/cloud-py-api/nextcloud-appapi-dsp:release   "/bin/bash start.sh"   2 weeks ago   Up 3 days (healthy)   0.0.0.0:2375->2375/tcp, :::2375->2375/tcp     nextcloud-appapi-dsp

architectonio commented 1 month ago

By the way, the reason why I couldn't attach any file in the comment seems to be related to Firefox. I tried with Chromium and it works.......strange

kyteinsky commented 1 month ago

Can you try to ping the ex-app's container (nc_app_test-deploy) from nextcloud's container?

The network configuration for the daemon might be wrong. Since you're using the AIO, network should be nextcloud-aio.

bigcat88 commented 1 month ago

We need three results from the docker inspect command: for Nextcloud (as it is in a container), for the DockerSocketProxy container, and for the TestDeploy container.

architectonio commented 1 month ago

The "NoScript" extension was the cause. Even though all of GitHub is allowed to run JS, " …github-production-user-asset-6210df.s3.amazonaws.com" was still "Not Allowed".

Test: screen-2024-06-14-09-42-31

architectonio commented 1 month ago

We need three results from the docker inspect command: for Nextcloud (as it is in a container), for the DockerSocketProxy container, and for the TestDeploy container.

Here you go. NextCloud isn't running in a Docker Container. dockerinspect-testdeploy.txt dockerinspect-socket.txt

architectonio commented 1 month ago

And this is the docker inspect of "Context Chat Backend", which is also not working: dockerinspect-contextchatbackend.txt

bigcat88 commented 1 month ago

I guess this is the reason: "NetworkMode": "bridge"

OK, we need to move the note about bridge from here: https://cloud-py-api.github.io/app_api/DeployConfigurations.html

image

to somewhere else where it is more visible...

bigcat88 commented 1 month ago
  1. Remove test-deploy (there is a button on the TestDeploy page) and remove docker-socket-proxy
  2. Create a user-defined bridge network in Docker (https://docs.docker.com/network/drivers/bridge/#differences-between-user-defined-bridges-and-the-default-bridge)
  3. Create docker-socket-proxy and specify the newly created bridge as its network
  4. Try TestDeploy after that
architectonio commented 1 month ago

Can you try to ping the ex-app's container (nc_app_test-deploy) from nextcloud's container?

The network configuration for the daemon might be wrong. Since you're using the AIO, network should be nextcloud-aio.

OK, I am going to figure out how to create such a network and then, I guess, re-deploy the containers.

architectonio commented 1 month ago
  1. Remove test-deploy (there is a button on the TestDeploy page) and remove docker-socket-proxy
  2. Create a user-defined bridge network in Docker (https://docs.docker.com/network/drivers/bridge/#differences-between-user-defined-bridges-and-the-default-bridge)
  3. Create docker-socket-proxy and specify the newly created bridge as its network
  4. Try TestDeploy after that

Done. However, Test Deploy is stuck on Heartbeat.

screen-2024-06-14-10-12-10

architectonio commented 1 month ago

And the docker-socket-proxy is no longer listed by "docker ps"; it just disappeared.

architectonio commented 1 month ago

After restarting the docker daemon, "dsp" is back.

architectonio commented 1 month ago

And here the result of the new "docker inspect ......" dockerinspect-testdeploy.txt dockerinspect-socket.txt

bigcat88 commented 1 month ago

Ping from the host (or from wherever Nextcloud is running):

    ping test-deploy
    ping nextcloud-appapi-dsp

Both containers (nc_app_test-deploy and the dsp one) should be running.

This is how to diagnose a DNS resolution problem.

architectonio commented 1 month ago

Seems that both aren't reachable....

    ping test-deploy
    PING test-deploy.architectonio.net (84.170.215.125) 56(84) bytes of data.
    64 bytes from p54aad77d.dip0.t-ipconnect.de (84.170.215.125): icmp_seq=1 ttl=63 time=0.629 ms
    64 bytes from p54aad77d.dip0.t-ipconnect.de (84.170.215.125): icmp_seq=2 ttl=63 time=0.998 ms
    64 bytes from p54aad77d.dip0.t-ipconnect.de (84.170.215.125): icmp_seq=3 ttl=63 time=1.16 ms
    64 bytes from p54aad77d.dip0.t-ipconnect.de (84.170.215.125): icmp_seq=4 ttl=63 time=1.17 ms
    ^C

    ping nextcloud-appapi-dsp
    PING nextcloud-appapi-dsp.architectonio.net (84.170.215.125) 56(84) bytes of data.
    64 bytes from p54aad77d.dip0.t-ipconnect.de (84.170.215.125): icmp_seq=1 ttl=63 time=1.22 ms
    64 bytes from p54aad77d.dip0.t-ipconnect.de (84.170.215.125): icmp_seq=2 ttl=63 time=0.997 ms
    64 bytes from 84.170.215.125 (84.170.215.125): icmp_seq=3 ttl=63 time=0.920 ms
    64 bytes from p54aad77d.dip0.t-ipconnect.de (84.170.215.125): icmp_seq=4 ttl=63 time=0.879 ms
    64 bytes from p54aad77d.dip0.t-ipconnect.de (84.170.215.125): icmp_seq=5 ttl=63 time=0.929 ms
    ^C

bigcat88 commented 1 month ago

This is what AppAPI does at the heartbeat step during TestDeploy (only in HTTP mode; in HTTPS mode with a remote installation there is also another one):

    curl 'http://test-deploy:23000/heartbeat'

You can get the port from the oc_ex_apps table or just try 23000, 23001, 23002 or 23003; in most cases it will be one of these ports.
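
The manual curl check can be repeated across the candidate ports with a small script. A sketch using only the Python standard library; the host name, path and port list come from the comment above, while the function names and the 5-second timeout are illustrative assumptions. It should be run on the machine where Nextcloud itself runs, since that is where AppAPI makes the request from.

```python
import http.client
import urllib.error
import urllib.request

# Ports suggested above; the real one is stored in the oc_ex_apps table.
CANDIDATE_PORTS = (23000, 23001, 23002, 23003)


def heartbeat_url(host: str, port: int) -> str:
    """Build the heartbeat URL for an ExApp reachable over plain HTTP."""
    return f"http://{host}:{port}/heartbeat"


def probe_heartbeat(host: str, ports=CANDIDATE_PORTS, timeout: float = 5.0) -> dict:
    """Request /heartbeat on each port and map port -> short result string."""
    # Connect directly, ignoring any HTTP proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    results = {}
    for port in ports:
        try:
            with opener.open(heartbeat_url(host, port), timeout=timeout) as response:
                results[port] = f"HTTP {response.status}"
        except urllib.error.HTTPError as exc:
            # The server answered, but with an error status.
            results[port] = f"HTTP {exc.code}"
        except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
            # DNS failure, refused connection, timeout, or a malformed reply.
            results[port] = f"failed: {exc}"
    return results
```

Calling probe_heartbeat("test-deploy") returns one entry per port; a "failed" entry on every port points at name resolution or routing rather than at the ExApp, whose own log shows Uvicorn listening on 23000.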

bigcat88 commented 1 month ago

from last testdeploy inspect:

            "Networks": {
                "nextcloud-aio": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "test-deploy",
                        "9b7f1cbb4b6c"
                    ],
                    "NetworkID": "5d26f704ae6c26bd2eb55e8b2389b040d36c19caec5e392e47732d9f795c9e64",
                    "EndpointID": "05e265f32d302232be1e01bffc654c5cdcad18ebdb7259cf49f173f6719d0a1d",
                    "Gateway": "172.19.0.1",
                    "IPAddress": "172.19.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:13:00:02",
                    "DriverOpts": null
                }
            }
        }

docker-socket-proxy inspect:

            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "3e828587134e852ff76006e8bbbc8281a930a6fc5ccb68312785928323bc4362",
                    "EndpointID": "3da0893301b8df218df541f86ce7f6832e629d7ecf9b1375d25d52d2d86b3a8b",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.8",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:08",
                    "DriverOpts": null
                }
            }
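
The two excerpts show the containers attached to different networks: test-deploy is on nextcloud-aio, while the docker-socket-proxy is on the default bridge. A sketch for making that comparison from saved docker inspect output; it assumes the usual layout in which each element of the JSON array has a "Name" and a "NetworkSettings" / "Networks" mapping like the excerpts above, and the function names are illustrative:

```python
import json


def networks_by_container(inspect_output: str) -> dict:
    """Map container name -> sorted list of attached Docker network names.

    `inspect_output` is the JSON array printed by "docker inspect <name> [<name> ...]".
    """
    result = {}
    for container in json.loads(inspect_output):
        name = container.get("Name", "").lstrip("/")  # docker prefixes names with "/"
        networks = (container.get("NetworkSettings") or {}).get("Networks") or {}
        result[name] = sorted(networks)
    return result


def common_networks(inspect_output: str) -> set:
    """Return the network names shared by every container in the output."""
    per_container = [set(n) for n in networks_by_container(inspect_output).values()]
    return set.intersection(*per_container) if per_container else set()
```

An empty result from common_networks for the two containers would match what the excerpts show: they have no network in common.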
architectonio commented 1 month ago

This is what AppAPI does at the heartbeat step during TestDeploy (only in HTTP mode; in HTTPS mode with a remote installation there is also another one):

    curl 'http://test-deploy:23000/heartbeat'

You can get the port from the oc_ex_apps table or just try 23000, 23001, 23002 or 23003; in most cases it will be one of these ports.

No answer on any of the ports. Regarding oc_ex_apps, I do not know what it is at all... :-(

architectonio commented 1 month ago

Screenshot from the NextCloud interface: screen-2024-06-14-10-38-31

bigcat88 commented 1 month ago

The VerifyConnection button works in your case only because you specified "/var/run/docker.sock" in Host.

I suggest removing this daemon, deploying this container https://github.com/cloud-py-api/docker-socket-proxy and creating a daemon with Host: nextcloud-appapi-dsp:2375 after that.

After that, the "VerifyConnection" button will try to connect to nextcloud-appapi-dsp:2375, which will fail, I guess...

Something is resolving all those DNS names in your system to 84.170.215.125; you need to find out what that is.
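
One way to see what the system resolver returns for these names is sketched below. The containers' addresses in the inspect output (172.19.0.2 and 172.17.0.8) are private, so a public answer such as 84.170.215.125 means the name is not being resolved to the container. The function names are illustrative, and the script is meant to be run where Nextcloud runs:

```python
import ipaddress
import socket


def classify_address(address: str) -> str:
    """Return "private" for private or loopback addresses, otherwise "public"."""
    return "private" if ipaddress.ip_address(address).is_private else "public"


def check_names(names) -> dict:
    """Resolve each name with the system resolver and classify the result."""
    report = {}
    for name in names:
        try:
            address = socket.gethostbyname(name)
        except socket.gaierror as exc:
            report[name] = f"unresolved ({exc})"
            continue
        report[name] = f"{address} ({classify_address(address)})"
    return report
```

Calling check_names(["test-deploy", "nextcloud-appapi-dsp"]) reports each name with its address and class. The ping output earlier printed the names as test-deploy.architectonio.net and nextcloud-appapi-dsp.architectonio.net, which suggests (as an assumption, not a confirmed diagnosis) that a search domain is being appended and answered by the public DNS zone.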

architectonio commented 1 month ago

The VerifyConnection button works in your case only because you specified "/var/run/docker.sock" in Host.

I suggest removing this daemon, deploying this container https://github.com/cloud-py-api/docker-socket-proxy and creating a daemon with Host: nextcloud-appapi-dsp:2375 after that.

After that, the "VerifyConnection" button will try to connect to nextcloud-appapi-dsp:2375, which will fail, I guess...

Something is resolving all those DNS names in your system to 84.170.215.125; you need to find out what that is.

OK, I'll do as you suggested, but on Sunday in the late afternoon. Now I have to travel a little...

architectonio commented 1 month ago

Something is resolving all those DNS names in your system to 84.170.215.125; you need to find out what that is.

This is the public IP address I got (for today) from my ISP, to which my domain points.