coreos / docs

Documentation for CoreOS projects
http://coreos.com/docs
Apache License 2.0

os: add user-docs on how to create and host custom torcx remotes #1262

Open lucab opened 5 years ago

lucab commented 5 years ago

This documents how to create and host a custom Torcx remote, and shows the contents and layout of a sample one.

lucab commented 5 years ago

Do you have logs showing where the context expires? There is a 1-minute timeout to avoid getting stuck forever in downloads on a broken network, but it should only cover the time until the HTTP response headers are received, so I am surprised it expires in your environment. On the server side, is any HTTP content actually transferred before the timeout expires?

dm0- commented 5 years ago

This is all the service log contains:

Apr 18 13:21:38 localhost systemd[1]: Starting Populate torcx store to satisfy profile...
Apr 18 13:21:39 localhost chroot[369]: time="2019-04-18T13:21:39Z" level=info msg="using next profile \"test\""
Apr 18 13:21:39 localhost chroot[369]: time="2019-04-18T13:21:39Z" level=info msg="store skipped" err="open /usr/share/oem/torcx/store/2107.0.0: no such file or directory" path=/usr/share/oem/torcx/store/2107.0.0
Apr 18 13:21:39 localhost chroot[369]: time="2019-04-18T13:21:39Z" level=info msg="store skipped" err="open /usr/share/oem/torcx/store: no such file or directory" path=/usr/share/oem/torcx/store
Apr 18 13:21:39 localhost chroot[369]: time="2019-04-18T13:21:39Z" level=info msg="store skipped" err="open /var/lib/torcx/store/2107.0.0: no such file or directory" path=/var/lib/torcx/store/2107.0.0
Apr 18 13:21:39 localhost chroot[369]: time="2019-04-18T13:21:39Z" level=info msg="store skipped" err="open /var/lib/torcx/store: no such file or directory" path=/var/lib/torcx/store
Apr 18 13:22:39 localhost chroot[369]: time="2019-04-18T13:22:39Z" level=error msg="context deadline exceeded"
Apr 18 13:22:39 localhost systemd[1]: torcx-profile-populate.service: Main process exited, code=exited, status=1/FAILURE
Apr 18 13:22:39 localhost systemd[1]: torcx-profile-populate.service: Failed with result 'exit-code'.
Apr 18 13:22:39 localhost systemd[1]: Failed to start Populate torcx store to satisfy profile.

I can see some larger data packets being sent over HTTPS. It works fine with curl https://users.developer.core-os.net/dm0/torcxremote/torcx_remote_contents.json.asc on the system.

dm0- commented 5 years ago

Oh, it's not actually downloading anything. The traffic I saw in tcpdump came from Ignition: I had forgotten that my Ignition config fetches the same remote file. It looks like torcx just opens the TCP connection and closes it without transferring any data.

lucab commented 5 years ago

Meh, that smells like something not going as expected in the initramfs. I'll dig into the code a bit more and see if I can reproduce (and somehow mitigate) it in a normal rootfs. For reference, all the content I used for manually testing the torcx-remotes flow should still be up at gs://users.developer.core-os.net/lucab/torcx-remotes/.

dm0- commented 5 years ago

I get the same "context deadline exceeded" with your Ignition file, running in QEMU.