felddy / foundryvtt-docker

An easy-to-deploy Dockerized Foundry Virtual Tabletop server.
https://hub.docker.com/r/felddy/foundryvtt
MIT License

Unable to install modules #565

Closed: dodgypast closed this issue 1 year ago

dodgypast commented 1 year ago

Bug description

I get the following error in the console when I try to install a module: FoundryVTT | 2023-01-11 12:30:14 | [warn] The requested manifest at "https://bitbucket.org/rpgframework-cloud/shadowrun6-eden/downloads/system-staging.json" did not provide system manifest data.

Steps to reproduce

This happens if I select install from the following dialogue (screenshot attached to the original issue).

Expected behavior

I expect that the module will install successfully.

Container metadata

com.foundryvtt.version = "10.291"
org.opencontainers.image.authors = "markf+github@geekpad.com"
org.opencontainers.image.created = "2022-12-02T23:36:29.879Z"
org.opencontainers.image.description = "An easy-to-deploy Dockerized Foundry Virtual Tabletop server."
org.opencontainers.image.licenses = "MIT"
org.opencontainers.image.revision = "bf4e42e9570aab7e8de39d7f760cb5f16e19862e"
org.opencontainers.image.source = "https://github.com/felddy/foundryvtt-docker"
org.opencontainers.image.title = "foundryvtt-docker"
org.opencontainers.image.url = "https://github.com/felddy/foundryvtt-docker"
org.opencontainers.image.vendor = "Geekpad"
org.opencontainers.image.version = "10.291.0"

Relevant log output

Entrypoint | 2023-01-11 18:57:06 | [info] Foundry Virtual Tabletop 10.291 is installed.
Entrypoint | 2023-01-11 18:57:06 | [info] Not modifying existing installation license key.
Entrypoint | 2023-01-11 18:57:06 | [info] Setting data directory permissions.
Entrypoint | 2023-01-11 18:57:06 | [debug] Completed setting directory permissions.
Entrypoint | 2023-01-11 18:57:06 | [info] Starting launcher with uid:gid as foundry:foundry.
Launcher | 2023-01-11 18:57:06 | [debug] Ensuring /data/Config directory exists.
Launcher | 2023-01-11 18:57:06 | [info] Generating options.json file.
Launcher | 2023-01-11 18:57:07 | [info] Setting 'Admin Access Key'.
Launcher | 2023-01-11 18:57:07 | [info] Starting Foundry Virtual Tabletop.
FoundryVTT | 2023-01-11 11:57:07 | [info] Running on Node.js - Version 16.18.1
FoundryVTT | 2023-01-11 11:57:07 | [info] Foundry Virtual Tabletop - Version 10 Build 291
FoundryVTT | 2023-01-11 11:57:07 | [info] User Data Directory - "/data"
FoundryVTT | 2023-01-11 11:57:07 | [info] Application Options:
{
  "awsConfig": null,
  "compressStatic": true,
  "fullscreen": false,
  "hostname": null,
  "language": "en.core",
  "localHostname": null,
  "passwordSalt": null,
  "port": 30000,
  "protocol": null,
  "proxyPort": null,
  "proxySSL": false,
  "routePrefix": null,
  "sslCert": null,
  "sslKey": null,
  "updateChannel": "stable",
  "upnp": false,
  "upnpLeaseDuration": null,
  "world": null,
  "adminPassword": "••••••••••••••••",
  "serviceConfig": null
}
FoundryVTT | 2023-01-11 11:57:07 | [info] Software license verification succeeded
FoundryVTT | 2023-01-11 11:57:07 | [info] Server started and listening on port 30000
FoundryVTT | 2023-01-11 11:57:12 | [warn] Could not reach IP discovery service
FoundryVTT | 2023-01-11 11:57:36 | [info] Created client session 2381f78ce0f7e9cfec306dba
FoundryVTT | 2023-01-11 11:57:39 | [info] Created client session 90bf245526d5a6a41e48bd3e
FoundryVTT | 2023-01-11 11:57:44 | [info] Administrator authentication successful for session 90bf245526d5a6a41e48bd3e
FoundryVTT | 2023-01-11 11:58:14 | [warn] The requested manifest at "https://bitbucket.org/rpgframework-cloud/shadowrun6-eden/downloads/system-staging.json" did not provide system manifest data.
Entrypoint | 2023-01-11 19:18:06 | [debug] Timezone set to: UTC
Entrypoint | 2023-01-11 19:18:06 | [info] Starting felddy/foundryvtt container v10.291.0
Entrypoint | 2023-01-11 19:18:06 | [debug] CONTAINER_VERBOSE set.  Debug logging enabled.
Entrypoint | 2023-01-11 19:18:06 | [info] Foundry Virtual Tabletop 10.291 is installed.
Entrypoint | 2023-01-11 19:18:06 | [info] Not modifying existing installation license key.
Entrypoint | 2023-01-11 19:18:06 | [info] Setting data directory permissions.
Entrypoint | 2023-01-11 19:18:06 | [debug] Completed setting directory permissions.
Entrypoint | 2023-01-11 19:18:06 | [info] Starting launcher with uid:gid as foundry:foundry.
Launcher | 2023-01-11 19:18:06 | [debug] Ensuring /data/Config directory exists.
Launcher | 2023-01-11 19:18:06 | [info] Generating options.json file.
Launcher | 2023-01-11 19:18:06 | [info] Setting 'Admin Access Key'.
Launcher | 2023-01-11 19:18:06 | [info] Starting Foundry Virtual Tabletop.
FoundryVTT | 2023-01-11 12:18:07 | [info] Running on Node.js - Version 16.18.1
FoundryVTT | 2023-01-11 12:18:07 | [info] Foundry Virtual Tabletop - Version 10 Build 291
FoundryVTT | 2023-01-11 12:18:07 | [info] User Data Directory - "/data"
FoundryVTT | 2023-01-11 12:18:07 | [info] Application Options:
{
  "awsConfig": null,
  "compressStatic": true,
  "fullscreen": false,
  "hostname": null,
  "language": "en.core",
  "localHostname": null,
  "passwordSalt": null,
  "port": 30000,
  "protocol": null,
  "proxyPort": null,
  "proxySSL": false,
  "routePrefix": null,
  "sslCert": null,
  "sslKey": null,
  "updateChannel": "stable",
  "upnp": false,
  "upnpLeaseDuration": null,
  "world": null,
  "adminPassword": "••••••••••••••••",
  "serviceConfig": null
}
FoundryVTT | 2023-01-11 12:18:07 | [info] Software license verification succeeded
FoundryVTT | 2023-01-11 12:18:07 | [info] Server started and listening on port 30000
FoundryVTT | 2023-01-11 12:18:12 | [warn] Could not reach IP discovery service
FoundryVTT | 2023-01-11 12:18:36 | [info] Created client session a83a0978d8bf52cad551fe0c
FoundryVTT | 2023-01-11 12:26:34 | [info] Created client session 9fcb16e221a7666cb7168f8c
FoundryVTT | 2023-01-11 12:26:34 | [info] Created client session ace04160ba3a369dd643471f
FoundryVTT | 2023-01-11 12:26:36 | [info] Administrator authentication successful for session ace04160ba3a369dd643471f
FoundryVTT | 2023-01-11 12:27:03 | [warn] The requested manifest at "https://bitbucket.org/rpgframework-cloud/shadowrun6-eden/downloads/system-staging.json" did not provide system manifest data.
FoundryVTT | 2023-01-11 12:29:54 | [warn] The requested manifest at "https://bitbucket.org/rpgframework-cloud/shadowrun6-eden/src/master/system-template.json" did not provide system manifest data.
FoundryVTT | 2023-01-11 12:30:14 | [warn] The requested manifest at "https://bitbucket.org/rpgframework-cloud/shadowrun6-eden/downloads/system-staging.json" did not provide system manifest data.


dodgypast commented 1 year ago

I have gotten a module to install, so I do not think this is an issue for the developer responsible for this container.

This thread on reddit made me think it might be something to do with the container. https://www.reddit.com/r/FoundryVTT/comments/y4oof6/unable_to_installupdate_most_modules_since_v10/

My apologies.

felddy commented 1 year ago

No problem. I'm glad you figured it out.

salsabeard commented 1 year ago

I'm not sure if it's possible, but I think we should re-open this. I am seeing this same issue when trying to install MANY systems, modules, and add-ons. One specifically that can be tested with is the Blade Runner RPG: https://github.com/fvtt-fria-ligan/blade-runner-foundry-vtt

The reason that I suggest reopening this issue is that when I was initially configuring my Docker instance, I did so through the Synology Docker GUI tool. Once the container was up and running, I added that specific system and it worked just fine. I ran through my tests, then killed the container and reconfigured it using docker-compose. I got it up and online and everything seemed to be working, but that's when I noticed that I've barely been able to add any systems at all. That's not to say none will work, though.

So far I've had no issues installing Cyberpunk Red, FATE, and Starfinder, but Blade Runner, DnD5e, and Call of Cthulhu have all failed with console output similar to the following:

setup.js:292 Error: Unable to load valid system manifest data from "https://github.com/fvtt-fria-ligan/blade-runner-foundry-vtt/releases/download/10.0.2/system.json"
The requested manifest at "https://github.com/fvtt-fria-ligan/blade-runner-foundry-vtt/releases/download/10.0.2/system.json" did not provide system manifest data.
    at Module.installPackage (file:///home/foundry/resources/app/dist/packages/views.mjs:1:1634)
    at runMicrotasks ()
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async SetupView.handlePost (file:///home/foundry/resources/app/dist/server/views/setup.mjs:1:1980)

dodgypast commented 1 year ago

I am reopening this now, as I installed the Windows version and I'm not getting those errors at all.

salsabeard commented 1 year ago

Alright, so I've been able to narrow this issue down further and I find it quite strange. I'm running 4 separate instances of this docker container on my Synology DSM with differing results.

Scenarios:

Results:

My initial testing had been done by creating the container in the GUI as a simple test, and I'd followed the community guide found here [jump to the "Setting up with Docker (no command line)" section]: https://foundryvtt.wiki/en/setup/hosting/Synology

I had noticed that the GUI configuration automatically creates additional environment variables not seen in the docker-compose write-up, so I modified the YAML files to be identical to the conditions of the GUI containers (port configurations, mount points, environment variables, etc.), and I'm still unable to add the majority of content to containers created via docker-compose.

Additional note: the method of using the temporary URL for creating containers does not seem practical. Every time I recreate the container (say, for update purposes), the temporary URL is no longer valid. It truly seems the best practice is to use the actual credentials when creating containers.

dodgypast commented 1 year ago

I'm using this container via the Unraid app store.

felddy commented 1 year ago

I'll take a look at this and see if I can reproduce it.

Looking through the Discord, I think I see many reports of the same issue. I suspect that it is a rate-limiting issue with GitHub.

See:

salsabeard commented 1 year ago

I saw a lot of those comments as well, but I don’t think this is GitHub rate limiting for two reasons:

  1. Some GitHub downloads work consistently while others fail. Working installations can be added from GitHub repeatedly and work every time, while others fail consistently every time.
  2. I appear to only experience these issues when deploying via docker-compose. If I create the container from the Docker GUI on my Synology, the same installs will work. I can literally run both containers simultaneously and it will work on the GUI deployed container and fail on the DC deployed container.

salsabeard commented 1 year ago

Hey @felddy, just wanted to update once more and let you know I've confirmed the above. Last night I performed the following test:

At this point it seems abundantly clear that there is some sort of variance under the hood in how these deployment methods are working.

To try to help, I've exported the configuration from the GUI and am attaching it and the docker-compose.yaml files here. docker_files.zip

felddy commented 1 year ago

I haven't been able to reproduce these errors. I've installed several systems, modules, and worlds successfully.
I attempted to trigger rate limiting by hammering the Update All button, without any errors.

I've included the configuration information I used below.

Could you try adding NODE_DEBUG=http,net to your environment variables as described here, and as shown in the docker-compose.yml file below?

This will configure Node to emit a ton of information about the network requests.

With NODE_DEBUG defined, I expect that you'll see better information about the failure. Please post back anything interesting that you find.

Installations

ls -1 data-565/Data/*

data-565/Data/modules:
README.txt
dark-mode-5e/
dice-so-nice/
nice-more-dice/
polyglot/
tidy5e-sheet/
trs-foundryvtt-dice-set/

data-565/Data/systems:
README.txt
blade-runner/
dnd5e/
pf2e/
shadowrun6-eden/

data-565/Data/worlds:
README.txt
kobold-cauldron/

docker-compose.yml

---
# version: "3.8"

secrets:
  credentials:
    file: credentials-mine.json

services:
  foundry:
    image: felddy/foundryvtt:10.291.0
    hostname: felddy_foundryvtt
    init: true
    restart: "no"
    volumes:
      - type: bind
        source: ./data-565
        target: /data
    environment:
      - CONTAINER_VERBOSE=true
      - FOUNDRY_GID=20
      - FOUNDRY_UID=501
      - NODE_DEBUG=http,net
    secrets:
      - source: credentials
        target: config.json
    ports:
      - target: 30000
        published: 30000
        protocol: tcp

salsabeard commented 1 year ago

So it looks like there is a variance between the two instances at the NET level. In both containers, we reach the same HTTP socket close task, but the GUI container initiates an afterConnect task, whereas the Docker Compose container destroys the connection. Unfortunately, I know next to nothing about this level of architecture, so I can't really help much more than getting outputs.

Docker Synology GUI

HTTP 26: removeSocket github.com:443::::::::::::::::::::: writable: false
HTTP 26: HTTP socket close
NET 26: afterConnect
NET 26: _read
NET 26: Socket._handle.readStart
HTTP 26: requestTimeout timer moved to req
HTTP 26: AGENT incoming response!
NET 26: _read
NET 26: _read
HTTP 26: AGENT socket.destroySoon()
NET 26: _final: not ended, call shutdown()
NET 26: _read
HTTP 26: call onSocket 0 0
HTTP 26: createConnection github.com:443::::::::::::::::::::: [Object: null prototype] {
  protocol: 'https:',
  hostname: 'github.com',
  hash: '',
  search: '',
  pathname: '/fvtt-fria-ligan/blade-runner-foundry-vtt/releases/download/10.0.2/blade-runner-fvtt_v10.0.2.zip',

Docker Compose

HTTP 79: removeSocket github.com:443::::::::::::::::::::: writable: false
HTTP 79: HTTP socket close
NET 79: destroy
NET 79: close
NET 79: close handle
FoundryVTT | 2023-01-23 17:32:28 | [warn] The requested manifest at "https://github.com/fvtt-fria-ligan/blade-runner-foundry-vtt/releases/download/10.0.2/system.json" did not provide system manifest data.
HTTP 79: write ret = true
HTTP 79: outgoing message end.
NET 79: emit close
HTTP 79: CLIENT socket onClose
HTTP 79: removeSocket objects.githubusercontent.com:443::::::::::::::::::::: writable: false

felddy commented 1 year ago

So the Docker compose version isn't even making the createConnection call? Or was that cut off in the log?

I've got another idea that we could try. I've got a version of the container published with Node v18, in an attempt to resolve an IPv6 problem in issue #531. Could you try testing with felddy/foundryvtt:node-18?

See:

salsabeard commented 1 year ago

I probably just cut it off. The logs from both were the same until that point specifically. I'll try the newer Node version and see if it makes a difference.

As a side note, in delving into this I learned that Synology maintains the Docker application as well as the Docker Compose module. Both are version-restricted based on the DSM version. Are you testing with Docker on a Synology or something else? And what versions of Docker and Docker Compose (and DSM, if you're using Synology)?

-Nick


Jnosh commented 1 year ago

Hi - I am encountering the same issue as well.

Running the container on a Synology via docker-compose (served through Traefik) and seeing The requested manifest at ... did not provide module manifest data.

I tried the felddy/foundryvtt:node-18 container and I'm still seeing the same issue, although subjectively it seems to happen a bit less often; it's hard to tell since there is a large amount of variance from attempt to attempt.

Log output from a run of felddy/foundryvtt:node-18 with CONTAINER_VERBOSE=true and NODE_DEBUG=http,net. I started the container and clicked Update All for the modules.

_FoundryVTT_logs.txt

Versions:
Synology DSM 7.1.1-42962 Update 1
Docker version 20.10.3
docker-compose version 1.28.5

These are all the latest available stable versions. As mentioned by others, the Synology Docker releases usually trail the official ones a bit and include some Synology-specific adjustments. They seem to release about 1-2 updates a year.

FWIW, I don't have any similar issues with any of the other Docker containers running on the same NAS (about 25-30 in total), although I don't think any of the others are Node-based.

salsabeard commented 1 year ago

Hey @felddy, sorry for the delay in getting back. I was finally able to give the node-18 release a shot and it seems to behave the same way. First I purged all of the remnant files, then I retooled the docker-compose file to drop the explicit version configuration and set the image to the node-18 release. Foundry came up like it always does and was showing the correct version.

I tested downloading the Blade Runner RPG (as it is always a failure case) and sure enough it failed again due to the manifest problem.

@Jnosh Just as a heads up, I've been able to run the same exact container image via the Docker GUI in DSM without any issues. It's a super easy workaround because you can copy all of the same environment variables, point it to the existing folder structure, and run it without losing anything you've created. You can literally just shut down the GUI container and then start it back up via docker-compose.

Jnosh commented 1 year ago

@salsabeard good to know, thanks!

valentinraul commented 1 year ago

Hello, I had the same error; in my case it was a Docker container on a Raspberry Pi 4. After a lot of digging through the Docker debug logs and the Raspberry Pi's syslog, I saw that it was returning a DNS error. The cause was that the first DNS server was set to localhost. I have no idea why it was there, but it was, while the rest of the servers were correct. When I modified it and set my ISP's DNS servers, I got no more errors when clicking Update All for the modules.

I had been having this error for a long time, and every now and then I checked back to see whether you had found the problem. Maybe this is not your case, but perhaps the Synology does something similar.

I hope it helps you, greetings!

BeardedGnome commented 1 year ago

@felddy - I'm having the same issue, not only with your container but with others as well.

I've turned on all the logging and captured a success and failure against felddy/foundryvtt:release.

Failure to install Pathfinder 2e: https://pastebin.com/dZ25pd11

Successfully installed DnD 4e: https://pastebin.com/qDgj4vy9

I wonder if it has something to do with the number of redirections from the original manifest URL.

-Bearded Gnome

BeardedGnome commented 1 year ago

Retested with felddy/foundryvtt:node-18.

Same error for PF2e: https://pastebin.com/SQHj6izV

Also tried cURL from the container, it worked: https://pastebin.com/2v5tnTxq

The base URL: https://github.com/foundryvtt/pf2e/releases/latest/download/system.json gets redirected to https://github.com/foundryvtt/pf2e/releases/download/4.7.4/system.json

Back in Foundry, attempting a manual manifest install pointing directly to 4.7.4 also fails: https://pastebin.com/2kn41sUF

The 4.7.4 URL gets redirected to a long, temporary githubusercontent URL, which, if you manually enter it fast enough into the Manifest URL form, will install successfully: https://pastebin.com/24tf5zPU

Speculation: The redirect from github.com to githubusercontent.com is failing for some reason.

-Bearded Gnome

DerLeole commented 1 year ago

Finally, I found something regarding the series of bugs that have plagued my entire Docker setup recently.

I have observed the same problems as everyone else commenting here, but in addition, my Foundry container also consistently fails to contact the Foundry licensing signature servers, which has recently prevented me from starting the app at all.

I also noticed that anything accessing GitHub-hosted content can sometimes time out in compose-created containers. For example, the following URL seems to always time out in all of my Docker containers, no matter the origin: https://raw.githubusercontent.com/CubeCoders/AMPTemplates/main/xonoticserver.cfg (it is a configuration file for a game server panel, used here just as an example to show that this problem also exists outside of Foundry).

DerLeole commented 1 year ago

So I have been digging around a bit, and it seems like a lot of containers have recently started to show timeout errors when trying to reach outside resources they should 100% be able to reach, and I have yet to find any solution or even an official issue raised for this.

It could of course be a coincidence, but it would explain a lot of the problems I have had recently, and it fits this issue.

Some examples:

https://www.reddit.com/r/docker/comments/11541kv/need_a_docker_expert_for_a_weird_problem/

https://www.reddit.com/r/docker/comments/1137147/curl_request_error_52_from_docker_container_but/

https://www.reddit.com/r/docker/comments/10tvpjv/last_time_for_minecraft_forge_server/

BeardedGnome commented 1 year ago

Edited Foundry to print out the error from fetch:

foundryvtttest    | AbortError: The operation was aborted.
foundryvtttest    |     at abort (file:///opt/foundryvtt/resources/app/node_modules/node-fetch/src/index.js:70:18)
foundryvtttest    |     at AbortSignal.abortAndFinalize (file:///opt/foundryvtt/resources/app/node_modules/node-fetch/src/index.js:89:4)
foundryvtttest    |     at AbortSignal.dispatchEvent (/opt/foundryvtt/resources/app/node_modules/event-target-shim/dist/event-target-shim.js:818:35)
foundryvtttest    |     at abortSignal (/opt/foundryvtt/resources/app/node_modules/abort-controller/dist/abort-controller.js:52:12)
foundryvtttest    |     at AbortController.abort (/opt/foundryvtt/resources/app/node_modules/abort-controller/dist/abort-controller.js:91:9)
foundryvtttest    |     at Timeout._onTimeout (file:///opt/foundryvtt/resources/app/common/utils/http.mjs:23:18)
foundryvtttest    |     at listOnTimeout (node:internal/timers:559:17)
foundryvtttest    |     at processTimers (node:internal/timers:502:7) {
foundryvtttest    |   type: 'aborted'
foundryvtttest    | }

Still need to investigate the root cause.

BeardedGnome commented 1 year ago

Confirmed it's not Foundry/Node. curl is taking 9 seconds in the container and 0.4 seconds directly on the host.

Time to reach out to Docker support.

Jnosh commented 1 year ago

@BeardedGnome What did you test specifically? I can't reproduce any timeouts with curl in the foundry container with your example URLs:

time curl -L -s -o /dev/null https://github.com/foundryvtt/pf2e/releases/latest/download/system.json
real    0m 0.65s
user    0m 0.05s
sys     0m 0.01s

DerLeole commented 1 year ago

I fixed it in my setup. The default Docker MTU was at fault.

By default, the Docker daemon uses an MTU of 1500. However, many network interfaces have a lower MTU, and even if your main outward-facing interface on the host has an MTU of 1500, some things, such as a VPN connection, lower the effective MTU to something like 1420 (which was the case for me).

That means packets coming from any Docker container using the bridge network, with its higher MTU, have to be fragmented or result in fragmented return packets. These are often dropped by firewalls, resulting in the timeouts we see here. Services running directly on the host, or containers using host networking, are aware of the lowered MTU and thus work without problems.

So why would this only happen in compose for some people? My best guess is that your default Docker daemon MTU is actually adjusted accordingly, but docker-compose creates a new default bridge network for every project it starts, and it doesn't use the daemon's MTU parameter but defaults to a hardcoded 1500. This has to be adjusted for each compose network individually (or you create one overarching, adjusted compose bridge and connect new containers to it instead of using a default network).

Below is a tutorial that explains how to adjust the MTU for Docker. You can check whether it worked with "netstat -i" (note that the docker0 interface always shows MTU 1500, and only changes in netstat to your configured MTU once a container is connected to it).

https://www.civo.com/learn/fixing-networking-for-docker
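
For anyone who wants to try this, here is a minimal sketch (my assumption of how it would look, not a tested config) of the per-network override in a compose file; the 1420 value mirrors the VPN case described above and should be replaced with your own effective MTU:

services:
  foundry:
    image: felddy/foundryvtt:release
    networks:
      - foundry
    # ...volumes, environment, ports, etc. as usual...

networks:
  foundry:
    driver: bridge
    driver_opts:
      # compose-created networks ignore the daemon's MTU setting and default
      # to 1500, so pin the MTU explicitly on each compose-managed network
      com.docker.network.driver.mtu: "1420"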

salsabeard commented 1 year ago

Try pulling the Blade Runner system. That one has failed on me every time: Release 10.0.2 - Saint Nicholas 2037 · fvtt-fria-ligan/blade-runner-foundry-vtt (github.com)

-Nick


BeardedGnome commented 1 year ago

@Jnosh - I used the instructions found here: https://stackoverflow.com/questions/18215389/how-do-i-measure-request-and-response-times-at-once-using-curl

Host

mdahrea@fatsia:~/foundryTest$ curl -L -w @curl-format.txt -o /dev/null -s https://github.com/foundryvtt/pf2e/releases/latest/download/system.json
     time_namelookup:  0.002033s
        time_connect:  0.026777s
     time_appconnect:  0.238662s
    time_pretransfer:  0.238941s
       time_redirect:  0.425002s
  time_starttransfer:  0.642463s
                     ----------
          time_total:  0.656259s

Container to PF2e

mdahrea@fatsia:~/foundryTest$ sudo docker exec foundryvtttest sh -c "curl -L -w @/host/curl-format.txt -o /dev/null -s https://github.com/foundryvtt/pf2e/releases/latest/download/system.json"
[sudo] password for mdahrea:
     time_namelookup:  8.054977s
        time_connect:  8.080959s
     time_appconnect:  8.150718s
    time_pretransfer:  8.151287s
       time_redirect:  4.394603s
  time_starttransfer:  8.728799s
                     ----------
          time_total:  8.732226s

@salsabeard - I see the same behavior with Blade Runner:

Container to Blade Runner

mdahrea@fatsia:~/foundryTest$ sudo docker exec foundryvtttest sh -c "curl -L -w @/host/curl-format.txt -o /dev/null -s https://github.com/fvtt-fria-ligan/blade-runner-foundry-vtt/releases/download/10.0.2/system.json"
     time_namelookup:  8.006251s
        time_connect:  8.028398s
     time_appconnect:  8.080048s
    time_pretransfer:  8.080311s
       time_redirect:  4.191276s
  time_starttransfer:  8.393353s
                     ----------
          time_total:  8.393802s

@Leolele99 - I don't think it's an MTU issue for me.

mdahrea@fatsia:~/foundryTest$ netstat -i
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
br-47612  1500     6006      0      0 0         25288      0      0      0 BMRU
br-52168  1500   145047      0      0 0        176684      0      0      0 BMRU
br-58522  1500     2002      0      0 0          2310      0      0      0 BMU
br-68033  1500   223337      0      0 0        284573      0      0      0 BMRU
br-a19c1  1500    50610      0      0 0         54403      0      0      0 BMRU
br-ea940  1500      640      0      0 0           830      0      0      0 BMU
docker0   1500     5589      0      0 0         20954      0      0      0 BMU
enp2s0    1500  1223022      0      4 0       1633509      0      0      0 BMRU
lo       65536      632      0      0 0           632      0      0      0 LRU
veth8347  1500   223337      0      0 0        284616      0      0      0 BMRU
veth3828  1500   145047      0      0 0        176726      0      0      0 BMRU
veth3e28  1500      126      0      0 0           130      0      0      0 BMRU
vethcf42  1500    50610      0      0 0         54446      0      0      0 BMRU

From the timing reports out of curl, it looks like a DNS lookup issue. When I left off, I hadn't gotten dig to run in the container.

BeardedGnome commented 1 year ago

I'm 99% sure my specific issue is that I'm running my local DNS server (pi-hole) in a Docker container. By default, docker-compose creates a unique network for each project, and they're not configured to connect to each other. The lookup waits until the default Docker DNS server (127.0.0.11) times out after 4 seconds before falling back to another DNS server and getting the answer.

The two lookups (github.com and githubusercontent.com) and two 4-second timeouts line up well with a 9-second total time.

I've tested this theory by bringing up the Foundry container on the pi-hole network and specifying the pi-hole IP address as the DNS. Seems to solve the problem:

mdahrea@fatsia:~/foundryTest$ sudo docker exec foundryvtttest sh -c "curl -L -w @/host/curl-format.txt -o /dev/null -s https://github.com/foundryvtt/pf2e/releases/latest/download/system.json"
     time_namelookup:  0.061142s
        time_connect:  0.083173s
     time_appconnect:  0.134501s
    time_pretransfer:  0.135124s
       time_redirect:  0.417493s
  time_starttransfer:  0.838467s
                     ----------
          time_total:  0.908495s

Now I need to determine the correct way to do this. Probably not by hard coding the IP address.

salsabeard commented 1 year ago

Quite an interesting find. I too am running a pi-hole container.

-Nick


salsabeard commented 1 year ago

So I did some additional investigation on this and found that Docker Compose handles DNS differently from other Docker deployment methods. docker run and the GUI pull DNS entries from the host's /etc/resolv.conf. Docker Compose, on the other hand, does not inherently copy the host's resolv.conf into the container without explicit configuration, so it appears to rely on the default configuration of 127.0.0.11 and to resolve DNS via the configured gateway. The solution that I keep finding is to specify DNS servers in the docker-compose file.

-Nick

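For reference, a minimal sketch of what that would look like in the compose file; the addresses below are placeholders (a LAN pi-hole plus a public fallback), not values from anyone's actual setup:

services:
  foundry:
    image: felddy/foundryvtt:release
    dns:
      - 192.168.1.53   # placeholder: LAN IP of the pi-hole (not a 127.x address, which the container cannot reach)
      - 1.1.1.1        # public fallback so installs keep working if the pi-hole is down
    # ...volumes, environment, ports, secrets as in the compose file shared earlier...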

salsabeard commented 1 year ago

Alright, so I can confirm that this issue is specific to using pi-hole DNS container. Even when manually configured, it times out and fails. But if I changed to use external servers only, it works without any issue.

-Nick


Jnosh commented 1 year ago

It would seem that this error message can occur any time the update checks fail or time out, so there isn't a single common cause, especially not one related to this Docker image.

I checked the various DNS suggestions and none seemed to apply in my case. I finally tracked my issue down to Foundry issuing update checks in batches of 10, with this error message being returned if they do not finish within a set timeout.

It would seem that on my NAS hardware these parallel requests often simply do not finish quickly enough and are aborted once the timeout is reached. I verified this by modifying the Foundry frontend to serialize the update checks, which leads to them completing successfully for me.

I have contacted Foundry support and hopefully they can increase the timeout, provide an option to serialize the update checks for people on weaker hardware or provide some other solution.

BeardedGnome commented 1 year ago

@salsabeard

Alright, so I can confirm that this issue is specific to using pi-hole DNS container. Even when manually configured, it times out and fails. But if I changed to use external servers only, it works without any issue. -Nick

Glad we agree on the symptom.

I haven't figured out how to add the correct dns entry to the YAML file yet. The host IP doesn't work, the docker0 IP doesn't work, and host.docker.internal doesn't work.

Using the pi-hole IP only works when the containers are on the same network, and even then only until the IP changes. Using the gateway IP works until a reboot, when it gets a new subnet. I guess I could specify the subnet for every container.

I could try putting all of my services into one yml file or predefining a network for them all to join.

I'm a little shocked I haven't been able to find a simple 'how to' guide for container-to-container DNS. We can't be the first people to run into this issue.

salsabeard commented 1 year ago

@BeardedGnome You beat me to saying what I was going to say; I just hadn't had a chance to validate my new findings.

According to Docker’s official documentation regarding compose, a compose instance will build a default network (if an external network is not configured). This network is used for all services in the compose instance to communicate with one another, and they do so using their host names.

In my current configuration, my pihole and foundry containers are part of separate compose instances, and so foundry cannot communicate with the DNS server that is running in the other instance.

For my next test, I will be creating an explicitly configured "network" and binding all of my compose instances to it. If my understanding is correct, this should allow all of the compose services to communicate with one another, regardless of their compose instance. There's a good write-up of this process here: https://stackoverflow.com/questions/38088279/communication-between-multiple-docker-compose-projects

Now, all that being said, I'm coming to realize that this behavior is likely impacting ALL of my containerized, externally reaching services that aren't in the same compose instance as my pi-hole, but the failover to the external DNS servers is never really an issue for them.

This is where I think the Foundry source code may need to be tweaked: as it stands, there doesn't seem to be enough fault tolerance to let a DNS resolution timeout occur against a primary server before the download attempt is essentially abandoned and an error is thrown.

Once I have a chance, I’ll get the new network configuration applied and report back.
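
A rough sketch of that shared-network approach, with placeholder names (the network would be created once with docker network create shared_dns, and both the pi-hole and Foundry compose files would reference it):

services:
  foundry:
    image: felddy/foundryvtt:release
    networks:
      - shared_dns
    # ...other options unchanged...

networks:
  shared_dns:
    # pre-existing network shared by multiple compose projects;
    # compose will not create or remove it
    external: true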

salsabeard commented 1 year ago

I GOT IT!!! FINALLY!!!!!!!

TL;DR - The compose YAML needs to include network_mode: "bridge" so that the container attaches to the default Docker bridge network instead of automatically creating its own isolated network. Without this, you cannot use local DNS if it resides on the same server. @felddy can you check into this and update the example you have listed?

My last theory regarding the problems with inter-container communication was not exactly far off base, but it wasn't as on the money as I'd hoped.

I then started inspecting the containers and found a variance in behavior and internal configuration between the Foundry compose container and every single other container on my server.

As for configuration, every container except the one that wasn't working has either host or bridge listed in the Networks field, but the problem container was showing "dctest1-creds_default", indicating that compose had created its own separate network for this container.

It then hit me like a ton of bricks: that is the difference between this container and every other one I have on my server. I added the line network_mode: "bridge" to the docker-compose YAML, and now the compose container is using the local DNS and has absolutely no issues downloading any content.
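
Assuming a compose file like the one felddy posted above, the change amounts to a single added line (sketch only; everything else stays as before):

services:
  foundry:
    image: felddy/foundryvtt:10.291.0
    # attach to the default docker0 bridge instead of a compose-created
    # per-project network, so the container resolves DNS the way GUI- and
    # docker-run-created containers do
    network_mode: "bridge"
    # ...volumes, environment, secrets, ports unchanged...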

BeardedGnome commented 1 year ago

This solution seems to be working for me: https://forums.docker.com/t/dns-issues-with-local-resolver-and-containers-on-the-same-host/102319

Advantage:

Disadvantage:

In my case the host IP address is static, so I can get away with this configuration.

Although, to be honest, I'm not really sure why that works.

Spartanaco commented 1 year ago

Had the same issue with modules not installing due to timeouts ultimately related to a complicated docker bridge (and local pihole) network configuration.

An alternative, simple workaround that I ended up using is to add a --dns 1.1.1.1 parameter (or any other public DNS server) so the container uses a public DNS server explicitly. The curl timing test went from around ~8s to ~0.4s. Thanks for the lead and discussion! 🎉

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been inactive for 28 days. To reactivate the issue, simply post a comment with the requested information to help us diagnose this issue. If this issue remains inactive for another 7 days, it will be automatically closed.

github-actions[bot] commented 1 year ago

This issue has been automatically closed due to inactivity. If you are still experiencing problems, please open a new issue.