Closed dodgypast closed 1 year ago
I have gotten a module to install, so I do not think this is an issue for the developer responsible for this container.
This thread on reddit made me think it might be something to do with the container. https://www.reddit.com/r/FoundryVTT/comments/y4oof6/unable_to_installupdate_most_modules_since_v10/
My apologies.
No problem. I'm glad you figured it out.
I'm not sure if it's possible, but I think we should re-open this. I am seeing this same issue when trying to install MANY systems, modules, and add-ons. One specifically that can be tested with is the Blade Runner RPG: https://github.com/fvtt-fria-ligan/blade-runner-foundry-vtt
The reason that I suggest reopening this issue is that when I was initially configuring my docker instance, I did so through the Synology Docker GUI tool. Once the container was up and running, I added that specific system and it worked just fine. I ran through my tests and then killed the container and reconfigured using docker-compose. I got it up and online and all seemed to be working, but that's when I noticed that I've barely been able to add any systems at all. But that's not to say none will work.
So far I've had no issues installing Cyberpunk Red, FATE, and Starfinder, but Blade Runner, DnD5e, and Call of Cthulhu have all failed with console output similar to the following:
setup.js:292 Error: Unable to load valid system manifest data from "https://github.com/fvtt-fria-ligan/blade-runner-foundry-vtt/releases/download/10.0.2/system.json"
The requested manifest at "https://github.com/fvtt-fria-ligan/blade-runner-foundry-vtt/releases/download/10.0.2/system.json" did not provide system manifest data.
at Module.installPackage (file:///home/foundry/resources/app/dist/packages/views.mjs:1:1634)
at runMicrotasks (
I am reopening this now, as I installed the Windows version and I'm not getting those errors at all.
Alright, so I've been able to narrow this issue down further and I find it quite strange. I'm running 4 separate instances of this docker container on my Synology DSM with differing results.
Scenarios:
Results:
My initial testing had been done by creating the container in the GUI as a simple test and I'd followed the community guide found here [jump to "Setting up with Docker (no command line)" section]: https://foundryvtt.wiki/en/setup/hosting/Synology
I had noticed that the GUI configuration automatically creates additional environmental variables not seen in the Docker-Compose writeup, so I modified the yaml files to be identical to the conditions of the GUI containers (port configurations, mount points, environment variables, etc) and I'm still unable to add the majority of content to containers created via Docker-Compose.
Additional Note: The method of using the temporary URL for creating containers does not seem to be practical. Every time I recreate the container (say for update purposes) the temporary URL is no longer valid. It truly seems the best practice is to use the actual credentials for creating containers.
I'm using this container via the Unraid app store.
I'll take a look at this and see if I can reproduce it.
Looking through the discord I think I see many reports of the same issue. I suspect that it is a rate-limiting issue with GitHub.
See:
I saw a lot of those comments as well, but I don’t think this is GitHub rate limiting for two reasons:
Hey @felddy, just wanted to update once more and let you know I've confirmed the above. Last night I performed the following test:
At this point it seems abundantly clear that there is some sort of variance under the hood in how these deployment methods are working.
To try to help, I've exported the configuration from the GUI and am attaching it and the docker-compose.yaml files here. docker_files.zip
I haven't been able to reproduce these errors. I've installed several systems, modules, and worlds successfully.
I attempted to get rate-limited by hammering on the Update All buttons, without error.
I've included the configuration information I used below.
Could you try adding NODE_DEBUG=http,net to your environment variables as described here, and in the docker-compose.yml file below:
This will configure node to emit a ton of information about the network requests.
With NODE_DEBUG defined, I expect that you'll see better information about the failure. Please post back anything interesting that you find.
ls -1 data-565/Data/*
data-565/Data/modules:
README.txt
dark-mode-5e/
dice-so-nice/
nice-more-dice/
polyglot/
tidy5e-sheet/
trs-foundryvtt-dice-set/
data-565/Data/systems:
README.txt
blade-runner/
dnd5e/
pf2e/
shadowrun6-eden/
data-565/Data/worlds:
README.txt
kobold-cauldron/
docker-compose.yml
---
# version: "3.8"
secrets:
credentials:
file: credentials-mine.json
services:
foundry:
image: felddy/foundryvtt:10.291.0
hostname: felddy_foundryvtt
init: true
restart: "no"
volumes:
- type: bind
source: ./data-565
target: /data
environment:
- CONTAINER_VERBOSE=true
- FOUNDRY_GID=20
- FOUNDRY_UID=501
- NODE_DEBUG=http,net
secrets:
- source: credentials
target: config.json
ports:
- target: 30000
published: 30000
protocol: tcp
So it looks like there is a variance between the two instances at the NET level. In both containers, we reach the same HTTP socket close task, but the GUI initiates an afterConnect task whereas the Docker Compose instance destroys the connection. Unfortunately I know next to nothing about this level of architecture, so I can't really help much more than getting outputs.
Docker Synology GUI
HTTP 26: removeSocket github.com:443::::::::::::::::::::: writable: false
HTTP 26: HTTP socket close
NET 26: afterConnect
NET 26: _read
NET 26: Socket._handle.readStart
HTTP 26: requestTimeout timer moved to req
HTTP 26: AGENT incoming response!
NET 26: _read
NET 26: _read
HTTP 26: AGENT socket.destroySoon()
NET 26: _final: not ended, call shutdown()
NET 26: _read
HTTP 26: call onSocket 0 0
HTTP 26: createConnection github.com:443::::::::::::::::::::: [Object: null prototype] {
  protocol: 'https:',
  hostname: 'github.com',
  hash: '',
  search: '',
  pathname: '/fvtt-fria-ligan/blade-runner-foundry-vtt/releases/download/10.0.2/blade-runner-fvtt_v10.0.2.zip',
Docker Compose
HTTP 79: removeSocket github.com:443::::::::::::::::::::: writable: false
HTTP 79: HTTP socket close
NET 79: destroy
NET 79: close
NET 79: close handle
FoundryVTT | 2023-01-23 17:32:28 | [warn] The requested manifest at "https://github.com/fvtt-fria-ligan/blade-runner-foundry-vtt/releases/download/10.0.2/system.json" did not provide system manifest data.
HTTP 79: write ret = true
HTTP 79: outgoing message end.
NET 79: emit close
HTTP 79: CLIENT socket onClose
HTTP 79: removeSocket objects.githubusercontent.com:443::::::::::::::::::::: writable: false
So the Docker Compose version isn't even making the createConnection call? Or was that cut off in the log?
I've got another idea that we could try. I've published a version of the container with Node v18 in an attempt to resolve an IPv6 problem in issue #531. Could you try testing with felddy/foundryvtt:node-18?
See:
I probably just cut it off. The logs from both were the same until that point specifically. I'll try the newer node version and see if it makes a difference.

As a side note, in delving into this I learned that Synology maintains the Docker application as well as the Docker Compose module. Both are version restricted based on the DSM version. Are you testing with Docker on a Synology or something else? And what versions of Docker and Docker Compose (DSM as well, if you're using Synology)?

-Nick
Hi - I am encountering the same issue as well.
Running the container on a Synology via docker-compose (served through Traefik) and seeing "The requested manifest at ... did not provide module manifest data."
I tried the felddy/foundryvtt:node-18 container and I'm still seeing the same issue - although subjectively it seems to maybe happen a bit less; hard to tell since there is a large amount of variance from attempt to attempt.
Log output from a run of felddy/foundryvtt:node-18 with CONTAINER_VERBOSE=true and NODE_DEBUG=http,net. I started the container and clicked Update All for the modules.
Versions:
Synology DSM 7.1.1-42962 Update 1
Docker version 20.10.3
docker-compose version 1.28.5
These are all the latest available stable versions. As mentioned by others, the Synology Docker releases usually trail the official ones a bit and include some Synology specific adjustments. They seem to release about 1-2 updates a year maybe.
FWIW I don't have any similar issues with any other docker containers running on the same NAS (about 25-30 in total) although I don't think any of the others are Node based.
Hey @felddy, sorry for the delay in getting back. I was finally able to give the node-18 release a shot and it seems to behave the same way. First I purged all of the remnant files, then I retooled the compose file to remove explicit version configurations and set the image to the node-18 release. Foundry came up like it always does and was showing the correct version.
I tested downloading the Blade Runner RPG (as it is always a failure case) and sure enough it failed again due to the manifest problem.
@Jnosh Just as a heads up, I've been able to run the exact same container image via the Docker GUI in DSM without any issues. It's a super easy workaround because you can copy all of the same environment variables, point it at the existing folder structure, and run it without losing anything you've created. You can just shut down the GUI container and later start it back up via docker compose.
@salsabeard good to know, thanks!
Hello, I had the same error; in my case it was Docker on a Raspberry Pi 4. After much digging through all the Docker debug logs, I looked at the Raspberry Pi syslog and saw that it was returning a DNS error. The error was because localhost was configured as the first DNS server. I have no idea why it was there, but it was; the rest of the servers were correct. When I removed it and set my ISP's DNS servers, I got no more errors when clicking Update All for the modules.

I had been having this error for a long time, and every now and then I checked back to see if you had found the problem. Maybe this is not your case, but maybe the Synology does something similar.
I hope it helps you, greetings!
@felddy - I'm having the same issue. Not only with your docker, but others as well.
I've turned on all the logging and captured a success and a failure against felddy/foundryvtt:release.
Failure to install Pathfinder 2e: https://pastebin.com/dZ25pd11
Successfully installed DnD 4e: https://pastebin.com/qDgj4vy9
I wonder if it has something to do with the number of redirections from the original manifest URL.
-Bearded Gnome
Retested with felddy/foundryvtt:node-18.
Same error for PF2e: https://pastebin.com/SQHj6izV
Also tried cURL from the container, it worked: https://pastebin.com/2v5tnTxq
The base URL:
https://github.com/foundryvtt/pf2e/releases/latest/download/system.json
gets redirected to
https://github.com/foundryvtt/pf2e/releases/download/4.7.4/system.json
Back in Foundry, attempting a manual manifest install pointing directly to 4.7.4 also fails: https://pastebin.com/2kn41sUF
The 4.7.4 URL gets a redirect to a long, temporary githubusercontent URL, which, if you manually enter it fast enough into the Manifest URL form, will be successful: https://pastebin.com/24tf5zPU
Speculation: The redirect from github.com to githubusercontent.com is failing for some reason.
-Bearded Gnome
Finally I found something regarding the series of bugs that plague my entire docker setup recently.
I have observed the same problems as everyone else commented here, but in addition my foundry docker also consistently fails to contact the foundry licensing signature servers, preventing me from starting the app at all recently.
I also noticed, anything accessing github hosted content can sometimes time out in compose created containers. For example, the following url seems to always time out in all of my docker containers, no matter the origin: https://raw.githubusercontent.com/CubeCoders/AMPTemplates/main/xonoticserver.cfg (it is a configuration file for a game server panel and just used as an example to show that this problem exists also outside of foundry).
So I have been digging around a bit and it seems like recently a lot of containers have started to show timeout errors when trying to reach resources on the outside they should 100% be able to reach and I have yet to find any solution or even official issue raised for this.
It could of course be coincidence, but it would explain a lot of problems I have had recently and fits this issue.
Some examples:
https://www.reddit.com/r/docker/comments/11541kv/need_a_docker_expert_for_a_weird_problem/
https://www.reddit.com/r/docker/comments/1137147/curl_request_error_52_from_docker_container_but/
https://www.reddit.com/r/docker/comments/10tvpjv/last_time_for_minecraft_forge_server/
Edited Foundry to print out the error from fetch:
foundryvtttest | AbortError: The operation was aborted.
foundryvtttest | at abort (file:///opt/foundryvtt/resources/app/node_modules/node-fetch/src/index.js:70:18)
foundryvtttest | at AbortSignal.abortAndFinalize (file:///opt/foundryvtt/resources/app/node_modules/node-fetch/src/index.js:89:4)
foundryvtttest | at AbortSignal.dispatchEvent (/opt/foundryvtt/resources/app/node_modules/event-target-shim/dist/event-target-shim.js:818:35)
foundryvtttest | at abortSignal (/opt/foundryvtt/resources/app/node_modules/abort-controller/dist/abort-controller.js:52:12)
foundryvtttest | at AbortController.abort (/opt/foundryvtt/resources/app/node_modules/abort-controller/dist/abort-controller.js:91:9)
foundryvtttest | at Timeout._onTimeout (file:///opt/foundryvtt/resources/app/common/utils/http.mjs:23:18)
foundryvtttest | at listOnTimeout (node:internal/timers:559:17)
foundryvtttest | at processTimers (node:internal/timers:502:7) {
foundryvtttest | type: 'aborted'
foundryvtttest | }
Still need to investigate the root cause.
Confirmed it's not Foundry/Node. curl takes 9 seconds in the container versus 0.4 seconds directly on the host.
Time to reach out to Docker support.
@BeardedGnome What did you test specifically? I can't reproduce any timeouts with curl in the foundry container with your example URLs:
time curl -L -s -o /dev/null https://github.com/foundryvtt/pf2e/releases/latest/download/system.json
real 0m 0.65s
user 0m 0.05s
sys 0m 0.01s
I fixed it in my setup. It was the default Docker MTU that was at fault.

By default the Docker daemon uses an MTU of 1500. However, many network interfaces have a lower MTU, and even if your main outward-facing interface on the host machine has an MTU of 1500, some things, like a VPN connection, lower the effective MTU to something like 1420 (which was the case for me).

That means packets coming from any Docker container using the bridge network with its higher MTU have to be fragmented, or result in fragmented return packets. These are often dropped by firewalls, resulting in the timeouts we see here. Services running directly on the host, or containers using host networking, are aware of the lowered MTU and thus work without problems.

So why would this only happen in compose for some? My best guess is that your default Docker daemon MTU is actually adjusted accordingly, but docker compose creates new default bridge networks for every project it starts, and these don't inherit the daemon's MTU parameter but default to a hardcoded 1500. This has to be adjusted for each compose network individually (or you create one overarching adjusted compose bridge and connect new containers to it instead of using a default network).

Below is a tutorial that explains how to adjust the MTUs for Docker. You can check whether it worked with "netstat -i" (note that the docker0 interface always shows MTU 1500, and only changes in netstat to your configured MTU once a container is connected to it).
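If the MTU theory applies to your setup, the per-project network that compose creates can be pinned to a lower MTU directly in the compose file. A minimal sketch; the service/network names are placeholders and 1420 is the effective MTU from my VPN case, so adjust both to your environment:

```yaml
# docker-compose.yml fragment (names and MTU value are placeholders)
services:
  foundry:
    image: felddy/foundryvtt:release
    networks:
      - foundry_net

networks:
  foundry_net:
    driver: bridge
    driver_opts:
      # compose defaults to 1500 regardless of the daemon's MTU setting;
      # set this to your effective path MTU
      com.docker.network.driver.mtu: 1420
```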
Try pulling the Blade Runner system. That one has failed on me every time.

-Nick
@Jnosh - I used the instructions found here: https://stackoverflow.com/questions/18215389/how-do-i-measure-request-and-response-times-at-once-using-curl
mdahrea@fatsia:~/foundryTest$ curl -L -w @curl-format.txt -o /dev/null -s https://github.com/foundryvtt/pf2e/releases/latest/download/system.json
time_namelookup: 0.002033s
time_connect: 0.026777s
time_appconnect: 0.238662s
time_pretransfer: 0.238941s
time_redirect: 0.425002s
time_starttransfer: 0.642463s
----------
time_total: 0.656259s
mdahrea@fatsia:~/foundryTest$ sudo docker exec foundryvtttest sh -c "curl -L -w @/host/curl-format.txt -o /dev/null -s https://github.com/foundryvtt/pf2e/releases/latest/download/system.json"
[sudo] password for mdahrea:
time_namelookup: 8.054977s
time_connect: 8.080959s
time_appconnect: 8.150718s
time_pretransfer: 8.151287s
time_redirect: 4.394603s
time_starttransfer: 8.728799s
----------
time_total: 8.732226s
@salsabeard - I see the same behavior with Blade Runner:
mdahrea@fatsia:~/foundryTest$ sudo docker exec foundryvtttest sh -c "curl -L -w @/host/curl-format.txt -o /dev/null -s https://github.com/fvtt-fria-ligan/blade-runner-foundry-vtt/releases/download/10.0.2/system.json"
time_namelookup: 8.006251s
time_connect: 8.028398s
time_appconnect: 8.080048s
time_pretransfer: 8.080311s
time_redirect: 4.191276s
time_starttransfer: 8.393353s
----------
time_total: 8.393802s
@Leolele99 - I don't think it's an MTU issue for me.
mdahrea@fatsia:~/foundryTest$ netstat -i
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
br-47612 1500 6006 0 0 0 25288 0 0 0 BMRU
br-52168 1500 145047 0 0 0 176684 0 0 0 BMRU
br-58522 1500 2002 0 0 0 2310 0 0 0 BMU
br-68033 1500 223337 0 0 0 284573 0 0 0 BMRU
br-a19c1 1500 50610 0 0 0 54403 0 0 0 BMRU
br-ea940 1500 640 0 0 0 830 0 0 0 BMU
docker0 1500 5589 0 0 0 20954 0 0 0 BMU
enp2s0 1500 1223022 0 4 0 1633509 0 0 0 BMRU
lo 65536 632 0 0 0 632 0 0 0 LRU
veth8347 1500 223337 0 0 0 284616 0 0 0 BMRU
veth3828 1500 145047 0 0 0 176726 0 0 0 BMRU
veth3e28 1500 126 0 0 0 130 0 0 0 BMRU
vethcf42 1500 50610 0 0 0 54446 0 0 0 BMRU
From the timing reports out of curl it looks like a DNS lookup issue. When I left off, I hadn't gotten dig to run in the container.
I'm 99% sure my specific issue is that I'm running my local DNS server (Pi-hole) in a Docker container. By default, docker-compose creates a unique network for each project, and containers on different networks aren't configured to connect to each other. Requests wait until the default Docker DNS server (127.0.0.11) times out after 4 seconds before falling back to another DNS server and getting the answer.
The two look ups (github.com and githubusercontent.com) and two 4 second timeouts coincide well with a 9 second total time.
I've tested this theory by bringing up the Foundry container on the pi-hole network and specifying the pi-hole IP address as the DNS. Seems to solve the problem:
mdahrea@fatsia:~/foundryTest$ sudo docker exec foundryvtttest sh -c "curl -L -w @/host/curl-format.txt -o /dev/null -s https://github.com/foundryvtt/pf2e/releases/latest/download/system.json"
time_namelookup: 0.061142s
time_connect: 0.083173s
time_appconnect: 0.134501s
time_pretransfer: 0.135124s
time_redirect: 0.417493s
time_starttransfer: 0.838467s
----------
time_total: 0.908495s
Now I need to determine the correct way to do this. Probably not by hard coding the IP address.
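For reference, a sketch of what that hardcoded stopgap looks like in compose. The network name and Pi-hole address below are placeholders for illustration, not my actual values:

```yaml
# fragment: join the existing pi-hole compose network and use it for DNS
services:
  foundry:
    image: felddy/foundryvtt:release
    dns:
      - 192.168.1.53   # pi-hole container IP (placeholder, the part I want to avoid hardcoding)
    networks:
      - pihole_net

networks:
  pihole_net:
    external: true     # network created by the pi-hole compose project
```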
It would seem that this error message can occur anytime the update checks fail or time out so there isn't a common cause, especially not one related to this docker image.
I checked the various DNS suggestions and none seemed to apply in my case... I finally tracked my issue to Foundry issuing update checks in batches of 10, with this error message being returned if they do not finish within a set timeout.
It would seem that for my NAS hardware these parallel requests often simply do not finish quickly enough and are aborted once the timeout is reached. I verified by modifying the foundry frontend to serialize the update checks which leads to them completing successfully for me.
I have contacted Foundry support and hopefully they can increase the timeout, provide an option to serialize the update checks for people on weaker hardware or provide some other solution.
@salsabeard
Alright, so I can confirm that this issue is specific to using the Pi-hole DNS container. Even when manually configured, it times out and fails. But if I change to use external servers only, it works without any issue.

-Nick
Glad we agree on the symptom.
I haven't figured out how to add the correct dns entry to the yml file yet. Host IP doesn't work, docker0 IP doesn't work, host.docker.internal doesn't work.
Using the pihole IP only works when on the same network. Even then, only until the IP changes. Using the gateway IP works, until a reboot and it gets a new subnet. I guess I could specify the subnet for every container.
I could try putting all of my services into one yml file or predefining a network for them all to join.
I'm a little shocked I haven't been able to find a simple guide on how to use container-to-container DNS. We can't be the first people to run into this issue.
@BeardedGnome You beat me to saying what I was going to say, I just hadn’t had a chance to validate new findings.
According to Docker’s official documentation regarding compose, a compose instance will build a default network (if an external network is not configured). This network is used for all services in the compose instance to communicate with one another, and they do so using their host names.
In my current configuration, my pihole and foundry containers are part of separate compose instances, and so foundry cannot communicate with the DNS server that is running in the other instance.
For my next test, I will be creating an explicitly configured "network" and binding all of my compose instances to it. If my understanding is correct, this should allow all of the compose services to communicate with one another, regardless of their compose project. There's a good write-up of this process here: https://stackoverflow.com/questions/38088279/communication-between-multiple-docker-compose-projects
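A sketch of that shared-network setup, using placeholder names: create the network once with "docker network create shared_dns", then declare it as external in each project's compose file so every project joins the same bridge instead of its own:

```yaml
# fragment for BOTH the pi-hole and foundry compose files
# (service and network names are placeholders)
# first run once on the host: docker network create shared_dns
services:
  foundry:            # in the other project this would be the pihole service
    networks:
      - shared_dns

networks:
  shared_dns:
    external: true    # pre-created, shared across compose projects
```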
Now all that being said, I’m coming to realize that this behavior is likely impacting ALL of my containerized externally reaching services that aren’t in the same container as my pihole instance, but the failover to the external DNS servers is never really an issue with them.
This is where I think the Foundry source code may need to be tweaked. As it seems to me, there isn’t enough fault tolerance to allow a DNS resolution timeout to occur with a primary server before essentially abandoning the download attempt and throwing an error.
Once I have a chance, I’ll get the new network configuration applied and report back.
I GOT IT!!! FINALLY!!!!!!!
TL;DR - The compose yaml needs to include " network_mode: 'bridge' " so the container joins Docker's default bridge network instead of automatically creating a separate per-project network. Without this, you cannot use local DNS if it resides on the same server. @felddy can you check into this and update the example you have listed?
My last theory regarding the problems with inter-container communication was not exactly far off base, but it wasn't as on the money as I'd hoped.
I then started doing inspections on the containers and found a variance in behavior and inner working configuration from the Foundry compose container and every single other container on my server.
As for configuration, every container except for the one that wasn't working has either host or bridge listed in the Networks field, but the problem container was showing "dctest1-creds_default", indicating that compose had created its own separate network for this container.
It then hit me like a ton of bricks: there is a difference between this bridged container and every other one I have on my server. I added the line " network_mode: "bridge" " to the docker compose yaml, and now the compose container is using the local DNS and is having absolutely no issues downloading any content.
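For anyone else hitting this, the change is a one-liner in the service definition. A minimal sketch (image tag is just an example; note that network_mode cannot be combined with a networks: key on the same service):

```yaml
services:
  foundry:
    image: felddy/foundryvtt:release
    # join Docker's default bridge (docker0) instead of letting compose
    # create a per-project network; mutually exclusive with `networks:`
    network_mode: bridge
```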
This solution seems to be working for me: https://forums.docker.com/t/dns-issues-with-local-resolver-and-containers-on-the-same-host/102319
Advantage:
Disadvantage:
In my case the host IP address is static, so I can get away with this configuration.
Although, to be honest, I'm not really sure why that works.
Had the same issue with modules not installing due to timeouts ultimately related to a complicated docker bridge (and local pihole) network configuration.
An alternative simple workaround that I ended up using is to add a --dns 1.1.1.1 parameter (or any other public DNS server) to have the container use a public DNS server explicitly. The curl timing test went from ~8s to ~0.4s.
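The compose equivalent of that flag, for reference (a sketch; any public resolver works, and the service name is a placeholder):

```yaml
services:
  foundry:
    image: felddy/foundryvtt:release
    dns:
      - 1.1.1.1   # public resolver, skips the 127.0.0.11 fallback delay
```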
Thanks for the lead and discussion! 🎉
This issue has been automatically marked as stale because it has been inactive for 28 days. To reactivate the issue, simply post a comment with the requested information to help us diagnose this issue. If this issue remains inactive for another 7 days, it will be automatically closed.
This issue has been automatically closed due to inactivity. If you are still experiencing problems, please open a new issue.
Bug description
I get the following error in the console when I try to install a module:
FoundryVTT | 2023-01-11 12:30:14 | [warn] The requested manifest at "https://bitbucket.org/rpgframework-cloud/shadowrun6-eden/downloads/system-staging.json" did not provide system manifest data.
Steps to reproduce
This happens if I select install from the following dialogue:
Expected behavior
I expect that the module will install successfully.
Container metadata
Relevant log output
Code of Conduct