FOGProject / fos

FOG Operating System
Multi-disk identification issues #27

Closed Quazz closed 4 years ago

Quazz commented 5 years ago

As per:

https://forums.fogproject.org/topic/13588/error-restoring-gpt-partition-tables/24

and

https://forums.fogproject.org/topic/13163/laptop-with-2-nvme-drives-randomly-selected-so-selecting-one-drive-to-capture-not-working

Issues can occur when a system has multiple drives (or rather, multiple storage controllers).

FOS attempts to grab "the first drive" (e.g. /dev/sda or /dev/nvme0), but this assignment is technically not deterministic, so it can differ between boots (which is why /etc/fstab will typically store a UUID to make it 'predictable').

Source: https://wiki.archlinux.org/index.php/Persistent_block_device_naming
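To illustrate the persistent naming described on that wiki page, here is a minimal sketch that maps each persistent symlink to the kernel name it currently resolves to. It assumes a Linux system where udev populates /dev/disk/by-id; the function name `persistent_names` is hypothetical and is not part of FOS.

```python
import os
from pathlib import Path


def persistent_names(by_id_dir="/dev/disk/by-id"):
    """Map each persistent symlink name to the device node it points at right now.

    The symlink names stay the same across boots, while the targets
    (e.g. /dev/sda vs /dev/sdb) may swap. Returns an empty dict if the
    directory does not exist.
    """
    mapping = {}
    base = Path(by_id_dir)
    if not base.is_dir():
        return mapping
    for link in sorted(base.iterdir()):
        # Only symlinks are of interest; skip anything else in the directory.
        if link.is_symlink():
            mapping[link.name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for name, device in persistent_names().items():
        print(f"{name} -> {device}")
```

Running this on two boots of the same machine would show whether the kernel names behind a given ID have moved.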

We have to formulate a strategy for handling multi-disk systems, particularly for deploy.

Considerations to take into account:

This is off the top of my head, but essentially it means that we can rely on neither the size of the captured partitions nor the size of the destination disks (e.g. captured disk 1 is smaller than captured disk 2, but in deployment they want disk 1 on the bigger disk).

Quazz commented 5 years ago

Idea:

We could try to identify disks by their port numbers (useful info here: https://askubuntu.com/questions/339232/identification-of-hdd-by-sata-port-number ).

Although I'm not sure how that works in relation to something like an NVMe disk.

This should at least get us a lot closer to the disks we expect.

Edit: After further reading, disks attached to the same controller should get the same assignments (based on ports, I assume), but disks attached to distinct controllers will fight for their spot. This also explains why we typically see this in systems with one or more NVMe disks.
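A rough sketch of what port-based ordering could look like, using names of the kind udev typically creates under /dev/disk/by-path (for example `pci-0000:00:1f.2-ata-1`). The regular expression and the helper names `port_key` and `order_by_port` are assumptions for illustration. Sorting by PCI address across controllers is only a stable tie-break chosen here, not something established in this thread.

```python
import re

# Assumed shape of a by-path name: pci-<PCI address>-<bus>-<port>[...]
BY_PATH = re.compile(r"^pci-(?P<pci>[0-9a-f:.]+)-(?P<bus>[a-z]+)-(?P<port>\d+)")


def port_key(name):
    """Sort key: (matched?, controller PCI address, bus type, port number)."""
    m = BY_PATH.match(name)
    if m is None:
        # Names that do not fit the assumed pattern are sorted last, by name.
        return (1, name, "", 0)
    return (0, m.group("pci"), m.group("bus"), int(m.group("port")))


def order_by_port(names):
    """Return whole-disk by-path names ordered by controller, then by port."""
    whole_disks = [n for n in names if "-part" not in n]
    return sorted(whole_disks, key=port_key)


if __name__ == "__main__":
    example = [
        "pci-0000:02:00.0-nvme-1",
        "pci-0000:00:1f.2-ata-2",
        "pci-0000:00:1f.2-ata-1",
        "pci-0000:00:1f.2-ata-1-part1",
    ]
    for name in order_by_port(example):
        print(name)
```

The ordering within one controller follows the port number, which matches the observation above; whether the cross-controller order is meaningful for a given machine would still need checking.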

Sebastian-Roth commented 4 years ago

@Quazz I pushed a commit that might address most of this issue. When I first added the disk size check (08ab73f) this was surely too static. Now I added some logic (4439247) that will check disk size and select the one that is the size of the source disk or bigger.
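For readers following along, the size check can be pictured roughly like this. It is only a sketch of the idea "pick a disk that is the size of the source disk or bigger", not the code from the commits mentioned above; the function name `pick_target`, the tuple layout, and the choice to take the first fitting disk in enumeration order are all assumptions.

```python
def pick_target(source_size, candidates):
    """Return the name of the first candidate disk that can hold the source.

    source_size: size of the captured disk (any unit, as long as it is
                 the same unit used for the candidates).
    candidates:  list of (device_name, size) tuples in enumeration order.
    Returns None when no candidate is large enough.
    """
    for name, size in candidates:
        if size >= source_size:
            return name
    return None


if __name__ == "__main__":
    disks = [("/dev/sda", 250), ("/dev/nvme0n1", 512)]
    # A 500 GB source skips the 250 GB disk and lands on the NVMe drive.
    print(pick_target(500, disks))
    # A small source fits every disk, so enumeration order decides again.
    print(pick_target(20, disks))
```

The second call shows the limitation: when the source disk is smaller than every target disk, the size check no longer distinguishes between them.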

Do you think that will do it or should we still look into adding controller port identification?

Quazz commented 4 years ago

@Sebastian-Roth It's definitely a huge improvement. I don't think there's a blind automatic solution that will satisfy everyone, anyway. It's easy to see that this will run into problems if, for example, you created small virtual disks on a virtual machine for capture, since in every deployment every disk will be larger than every source disk.

That said, I don't really see a way around that. The cleanest logic I can think of is to simply check which disk was disk 0 in the capture and deploy it to disk 0 on the target, but even that will fail if some of the computers are configured slightly differently. So I think it's best not to focus on that and instead focus on what we can consider a "good enough" approach that will work in all cases, even if the outcome isn't always "ideal".

Sebastian-Roth commented 4 years ago

@Quazz Are you still keen to look into this or should I close this issue for now? I'll definitely remove the bug label as this is solved for now. Further improvements might need to wait till we have released 1.5.8...

Quazz commented 4 years ago

@Sebastian-Roth I'd say close it for now and if people have more problems with this down the line we can revisit it.