Open lucab opened 5 years ago
Do you have logs showing where the context expires? There is a 1 minute timeout to avoid getting stuck forever in downloads on a broken network, but it should only cover up to HTTP headers reception, so I am surprised it expires in your environment. Server side, is any HTTP content actually transferred before the timeout expires?
This is all the service log contains:
Apr 18 13:21:38 localhost systemd[1]: Starting Populate torcx store to satisfy profile...
Apr 18 13:21:39 localhost chroot[369]: time="2019-04-18T13:21:39Z" level=info msg="using next profile \"test\""
Apr 18 13:21:39 localhost chroot[369]: time="2019-04-18T13:21:39Z" level=info msg="store skipped" err="open /usr/share/oem/torcx/store/2107.0.0: no such file or directory" path=/usr/share/oem/torcx/store/2107.0.0
Apr 18 13:21:39 localhost chroot[369]: time="2019-04-18T13:21:39Z" level=info msg="store skipped" err="open /usr/share/oem/torcx/store: no such file or directory" path=/usr/share/oem/torcx/store
Apr 18 13:21:39 localhost chroot[369]: time="2019-04-18T13:21:39Z" level=info msg="store skipped" err="open /var/lib/torcx/store/2107.0.0: no such file or directory" path=/var/lib/torcx/store/2107.0.0
Apr 18 13:21:39 localhost chroot[369]: time="2019-04-18T13:21:39Z" level=info msg="store skipped" err="open /var/lib/torcx/store: no such file or directory" path=/var/lib/torcx/store
Apr 18 13:22:39 localhost chroot[369]: time="2019-04-18T13:22:39Z" level=error msg="context deadline exceeded"
Apr 18 13:22:39 localhost systemd[1]: torcx-profile-populate.service: Main process exited, code=exited, status=1/FAILURE
Apr 18 13:22:39 localhost systemd[1]: torcx-profile-populate.service: Failed with result 'exit-code'.
Apr 18 13:22:39 localhost systemd[1]: Failed to start Populate torcx store to satisfy profile.
I can see some larger data packets being sent over HTTPS. It works fine with curl https://users.developer.core-os.net/dm0/torcxremote/torcx_remote_contents.json.asc on the system.
Oh, it's not actually downloading anything. I saw it in a tcpdump, forgetting that I used the same remote file in the Ignition config, so that's what was downloading it. It looks like torcx just opens the TCP connection and closes it with no data transferred.
Meh, that smells like something not going as expected in the initramfs. I'll dig into the code a bit more and see if I can reproduce (and somehow mitigate) it in a normal rootfs. Just for reference, all the content I used for manually testing the torcx-remotes flow should still be up at gs://users.developer.core-os.net/lucab/torcx-remotes/.
I get the same "context deadline exceeded" with your Ignition file, running in QEMU.
This documents how to create and host a custom Torcx remote, and shows the contents and layout of a sample one.