balena-os / balena-supervisor

Balena Supervisor: balena's agent on devices.
https://balena.io

Supervisor fails to unpack migrator backup.tgz #1214

Closed samothx closed 4 years ago

samothx commented 4 years ago

When migrating a device with backup on RPI3 with Supervisor 10.2.2 I get

Mar 09 21:18:08 419158e resin-supervisor[1397]: [info]    Migration backup detected
Mar 09 21:18:09 419158e 485578c91b22[1107]: [error]   Error restoring migration backup, retrying: Error: Command failed: tar -xzf backup.tgz -C /mnt/root/mnt/data/backup .
Mar 09 21:18:09 419158e resin-supervisor[1397]: [error]   Error restoring migration backup, retrying: Error: Command failed: tar -xzf backup.tgz -C /mnt/root/mnt/data/backup .

I assume this comes from src/lib/migrator.ts line 297 which looks OK to me.

tar -xzf backup.tgz -C /mnt/data/backup

works when tested on the device. I am a bit confused about where the trailing . in the above error message comes from.

tar -xzf backup.tgz -C /mnt/data/backup .

does not work with the backups created by the internal Rust tar.
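A possible explanation for the difference, sketched under assumptions: tar treats trailing arguments after the archive as member names to extract, so a trailing `.` only selects entries stored as `.` or under `./`. An archive whose entries are stored without the `./` prefix would then have nothing to select. The Python snippet below models that selection rule in a simplified way. The helper names `make_tgz` and `select_members` are hypothetical, and whether the Rust-created backups really omit the `./` prefix is an assumption, not something confirmed in this thread.

```python
import io
import tarfile


def make_tgz(names):
    """Build an in-memory .tgz holding one small file per given member name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = b"x"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def select_members(tgz_bytes, member):
    """Return the stored names that a member argument would select.

    Simplified model: an entry is selected when its stored name equals the
    argument or lies beneath it as a directory prefix.
    """
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
        names = tar.getnames()
    prefix = member.rstrip("/") + "/"
    return [n for n in names if n == member or n.startswith(prefix)]


# Entries stored without a leading "./" (assumed layout of the failing backup)
bare = make_tgz(["backup-vol/data.txt"])
# Entries stored with a leading "./", as produced by `tar -czf backup.tgz .`
dotted = make_tgz(["./backup-vol/data.txt"])

print(select_members(bare, "."))    # nothing is selected
print(select_members(dotted, "."))  # the "./"-prefixed entry is selected
```

If this is what happens, dropping the trailing `.` from the extraction command (or writing the archive with `./`-prefixed names) would make both kinds of backup extract.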

samothx commented 4 years ago

Did some more testing on this: When I provide a backup.tgz that works with

tar -xzf backup.tgz -C /mnt/data/backup .

(created with an external tar instead of the internal Rust tar), I get a different error message:

journalctl -a | grep backup
Mar 09 23:16:46 3c5c723 fdd21b777d04[781]: [info]    Migration backup detected
Mar 09 23:16:46 3c5c723 resin-supervisor[949]: [info]    Migration backup detected
Mar 09 23:16:47 3c5c723 fdd21b777d04[781]: [debug]   Creating volume backup-vol from backup
Mar 09 23:16:47 3c5c723 resin-supervisor[949]: [debug]   Creating volume backup-vol from backup
Mar 09 23:16:47 3c5c723 fdd21b777d04[781]: [error]   Error restoring migration backup, retrying: Error: (HTTP code 404) no such volume - get 1391738_backup-vol: no such volume 
Mar 09 23:16:47 3c5c723 resin-supervisor[949]: [error]   Error restoring migration backup, retrying: Error: (HTTP code 404) no such volume - get 1391738_backup-vol: no such volume

So it seems that for the first error the trailing . is indeed the culprit, because backup extraction seems to work here. But then I see another error message:

Mar 09 23:16:47 3c5c723 resin-supervisor[949]: [error]   Error restoring migration backup, retrying: Error: (HTTP code 404) no such volume - get 1391738_backup-vol: no such volume

even though the app does have a volume of that name:

root@3c5c723:/mnt/data# balena volume ls
DRIVER              VOLUME NAME
local               1391738_backup-vol
local               1391738_resin-data

samothx commented 4 years ago

@CameronDiver what can I do to get this moving?

CameronDiver commented 4 years ago

@samothx you have a branch which contains the change to the extractor code? I have a hunch about the other error you're seeing so I'd like to check it out.

How would I go about reproducing this myself? Do you have a premade image I could use?

samothx commented 4 years ago

@CameronDiver I have a lot of changes on my side too, I will set up something to test with for you and give you instructions.

samothx commented 4 years ago

@CameronDiver OK, I have retested and surprisingly it works with a balena-cloud-intel-nuc-2.50.1+rev image. Will test some more on other platforms.

samothx commented 4 years ago

@CameronDiver On a Beaglebone-green running balenaOS 2.43.0+rev2 (development), Supervisor version 10.2.2, I got:

Jul 08 23:46:19 beaglebone resin-supervisor[1125]: [info]    Migration backup detected
Jul 08 23:46:20 beaglebone resin-supervisor[1125]: [debug]   Finished applying target state
Jul 08 23:46:20 beaglebone resin-supervisor[1125]: [success] Device state apply success
Jul 08 23:46:23 beaglebone resin-supervisor[1125]: [info]    Internet Connectivity: OK
Jul 08 23:46:26 beaglebone resin-supervisor[1125]: [debug]   Creating volume backup-data from backup
Jul 08 23:46:26 beaglebone resin-supervisor[1125]: [event]   Event: Volume creation {}
Jul 08 23:46:26 beaglebone resin-supervisor[1125]: [error]   Error restoring migration backup, retrying: TypeError [ERR_INVALID_ARG_TYPE]: The "oldPath" argument must be one of type string, Buffer, or URL. Recei
Jul 08 23:46:56 beaglebone resin-supervisor[1125]: [info]    Migration backup detected
Jul 08 23:47:04 beaglebone resin-supervisor[1125]: [debug]   Creating volume backup-data from backup
Jul 08 23:47:04 beaglebone resin-supervisor[1125]: [event]   Event: Volume removal {}
Jul 08 23:47:04 beaglebone resin-supervisor[1125]: [event]   Event: Volume creation {}
Jul 08 23:47:04 beaglebone resin-supervisor[1125]: [error]   Error restoring migration backup, retrying: TypeError [ERR_INVALID_ARG_TYPE]: The "oldPath" argument must be one of type string, Buffer, or URL. Recei
Jul 08 23:47:34 beaglebone resin-supervisor[1125]: [info]    Migration backup detected
Jul 08 23:47:42 beaglebone resin-supervisor[1125]: [debug]   Creating volume backup-data from backup
Jul 08 23:47:42 beaglebone resin-supervisor[1125]: [event]   Event: Volume removal {}
Jul 08 23:47:42 beaglebone resin-supervisor[1125]: [event]   Event: Volume creation {}
Jul 08 23:47:42 beaglebone resin-supervisor[1125]: [error]   Error restoring migration backup, retrying: TypeError [ERR_INVALID_ARG_TYPE]: The "oldPath" argument must be one of type string, Buffer, or URL. Recei
Jul 08 23:48:12 beaglebone resin-supervisor[1125]: [info]    Migration backup detected
Jul 08 23:48:19 beaglebone resin-supervisor[1125]: [debug]   Creating volume backup-data from backup
Jul 08 23:48:20 beaglebone resin-supervisor[1125]: [event]   Event: Volume removal {}
Jul 08 23:48:20 beaglebone resin-supervisor[1125]: [event]   Event: Volume creation {}
Jul 08 23:48:20 beaglebone resin-supervisor[1125]: [error]   Error restoring migration backup, retrying: TypeError [ERR_INVALID_ARG_TYPE]: The "oldPath" argument must be one of type string, Buffer, or URL. Recei

Will try RPI3 next.

samothx commented 4 years ago

@CameronDiver regarding your testing: the easiest way is probably to fake the migration and just start a device with a backup (backup.tgz) in /mnt/data. I am not quite sure whether you only handle the backup on first boot or whether you could just drop the file on an already initialized device. The idea is that the top-level directories of the archive correspond to volumes, so you would need a container with a volume definition that matches the top-level directory name. Whatever is contained in the top-level directory should then end up in the volume.

The other strategy would be to go the whole way with a migration. This would migrate a device starting from an OS like Raspbian or Ubuntu and end up with a new balena device. If you want to do that, I can provide you with a setup for a device of your choosing. I like to do it on Generic-x86-64 devices using VirtualBox: I have a VirtualBox setup with Ubuntu and I usually make a clone of that, which I then migrate to balena. Unfortunately, on the latest release this seems to work fine though.
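The backup layout described above (each top-level directory of backup.tgz maps to a volume of the same name) can be checked before dropping the file on a device. The following is a minimal Python sketch, not the Supervisor's own logic: the helper names `top_level_dirs` and `build_example` are hypothetical, and the example directory names are taken from the volume listing earlier in this thread.

```python
import io
import tarfile
from pathlib import PurePosixPath


def top_level_dirs(tgz_bytes):
    """Return the sorted top-level directory names found in a .tgz archive."""
    found = set()
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            # PurePosixPath drops "." components, so "./vol/f" and "vol/f"
            # are treated the same way.
            parts = PurePosixPath(member.name).parts
            # A name with several components implies a top-level directory;
            # a single component counts only if it is itself a directory.
            if len(parts) > 1 or (len(parts) == 1 and member.isdir()):
                found.add(parts[0])
    return sorted(found)


def build_example():
    """Build a small in-memory backup with two volume directories and one stray file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in ("backup-vol/a.txt", "./resin-data/b.txt", "README"):
            data = b"example"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


volumes = top_level_dirs(build_example())
print(volumes)  # the directory names the app's volume definitions would need to match
```

Comparing this list with the volume names defined for the app's containers shows up front whether every directory in the backup has a volume to land in.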

samothx commented 4 years ago

@CameronDiver Tested on RPI3 - Host OS version balenaOS 2.51.1+rev1 development Supervisor version 11.4.10 - it seems to work here too.

CameronDiver commented 4 years ago

@samothx ok, so the volume initialization didn't work with beaglebone-green, but it did with rpi3 and intel nuc? That's extremely strange. Also, your logs seem to be cut off; it would be interesting to see the full error message.

I think that @20k-ultra should take a look at this, I've been looking for a more in-depth debugging scenario for him to look into.

samothx commented 4 years ago

The beaglebone-green is running on 2.43.0+rev2 development Supervisor version 10.2.2, so that might be the problem.