SUSE / suse-migration-services

GNU General Public License v3.0

Failed to migrate using DMS #197

Closed jaawasth closed 3 years ago

jaawasth commented 3 years ago

Hi Marcus,

As a continuation of our chat on issue https://github.com/SUSE-Enceladus/azure-li-services/issues/266, I'm filing a bug with this project. Please see the above issue on the MS images for more details.

Thanks !!

schaefi commented 3 years ago

@jaawasth Hi Jai, it would be great to start a debugging session about the DMS issue with the test system you have set up. I can arrange a meeting on Fri 16th, Mon 19th, or Thu 21st, between 3 and 7 pm CEST. If you send a Teams event via ms@suse.com it will land in my calendar.

Thanks

schaefi commented 3 years ago

@jaawasth Thanks Jai for the debugging sessions. We have been able to fix all issues and will rebuild the DMS including the fixes. For customers to access the DMS packages (SLES15-Migration and suse-migration-sle15-activation) we need to run a new release to the SUSE namespace.

As this will take time, I'm trying to speed up the process with maintenance.

I will do the submissions after we have clarity everything works as expected. Therefore I will provide the packages for the submission on the jump host from which you fetched the debugging versions too.

I'll let you know when they are there and you can do a final test without us attending, agreed? On positive feedback I'll submit to SLES.

Thanks

jaawasth commented 3 years ago

@schaefi, yes, this sounds good.

Thanks !!

schaefi commented 3 years ago

@jaawasth OK, the hopefully final versions of the updated DMS packages are available on the jump host we used for debugging. Please fetch them and give them a try:

$ rpm -e suse-migration-sle15-activation SLES15-Migration
$ rpm -Uhv suse-migration-sle15-activation-2.0.23-6.21.1.noarch.rpm SLES15-Migration-2.0.23-6.x86_64.rpm
$ reboot

Let us know if everything works. I'll keep this open until your feedback arrives.

Again thanks much for walking with us through the system :+1:

jaawasth commented 3 years ago

@schaefi, sure, thanks. I'll be able to give it a try later today and let you know how it goes.

Thanks !!

jaawasth commented 3 years ago

@schaefi, the migration RPMs worked fine and the system got upgraded. Is there any particular file you want from the system to verify completeness?

schaefi commented 3 years ago

Great news, thanks for the feedback. I don't think I need any further files. With that information I can now start the release of the DMS in SLES. Thanks, this is a big step forward.

jaawasth commented 3 years ago

@schaefi, we have one more environment where we are testing, and there are some issues there. Can you please have a look at the logs? I'm not testing it directly, so I'll take a look at the systems later in the day to find what's missing here. Attaching the log below meanwhile: distro_migration.log

jaawasth commented 3 years ago

This environment is different from the one where we tested before. It's on multipath, but the OS is installed on physical hard drives (rather than over FC) attached to the system in RAID-1.

schaefi commented 3 years ago

Oh, I think there is another error, introduced by the cert patch. Thanks for the log; it shows:

● suse-migration-prepare.service - Prepare For Migration
   Loaded: loaded (/usr/lib/systemd/system/suse-migration-prepare.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2021-04-22 19:45:23 UTC; 9s ago
  Process: 6137 ExecStart=/usr/bin/suse-migration-prepare (code=exited, status=1/FAILURE)
 Main PID: 6137 (code=exited, status=1/FAILURE)

Apr 22 19:45:23 localhost suse-migration-prepare[6137]:     trust_anchor
Apr 22 19:45:23 localhost suse-migration-prepare[6137]:   File "/usr/lib64/python3.6/shutil.py", line 245, in copy
Apr 22 19:45:23 localhost suse-migration-prepare[6137]:     copyfile(src, dst, follow_symlinks=follow_symlinks)
Apr 22 19:45:23 localhost suse-migration-prepare[6137]:   File "/usr/lib64/python3.6/shutil.py", line 120, in copyfile
Apr 22 19:45:23 localhost suse-migration-prepare[6137]:     with open(src, 'rb') as fsrc:
Apr 22 19:45:23 localhost suse-migration-prepare[6137]: FileNotFoundError: [Errno 2] No such file or directory: '/system-root/etc/pki/trust/anchors/rda-ca.pem'
Apr 22 19:45:23 localhost systemd[1]: suse-migration-prepare.service: Main process exited, code=exited, status=1/FAILURE
Apr 22 19:45:23 localhost systemd[1]: Failed to start Prepare For Migration.
Apr 22 19:45:23 localhost systemd[1]: suse-migration-prepare.service: Unit entered failed state.
Apr 22 19:45:23 localhost systemd[1]: suse-migration-prepare.service: Failed with result 'exit-code'.

schaefi commented 3 years ago

I'll take a look

schaefi commented 3 years ago

@jaawasth Hmm, I don't understand this. Our process does

os.listdir('/system-root/etc/pki/trust/anchors')

which returns rda-ca.pem and next we call

shutil.copy('/system-root/etc/pki/trust/anchors/rda-ca.pem', '/etc/pki/trust/anchors/')

which fails saying /system-root/etc/pki/trust/anchors/rda-ca.pem does not exist. Can you please check the contents of /etc/pki/trust/anchors on the host to upgrade and see if there is something special? Maybe this file is a symlink?
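The failure mode can be reproduced in isolation (a sketch with throwaway paths, not the DMS code itself): os.listdir() lists the symlink's name, but shutil.copy() follows the link, so a link whose target does not exist from the caller's point of view looks like a missing file.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
anchors = os.path.join(workdir, "anchors")
os.mkdir(anchors)

# A symlink whose target does not exist from the caller's point of view,
# analogous to /system-root/.../rda-ca.pem pointing outside the mounted tree
os.symlink("/nonexistent/rda-ca.pem", os.path.join(anchors, "rda-ca.pem"))

print(os.listdir(anchors))  # the entry is listed: ['rda-ca.pem']

try:
    shutil.copy(os.path.join(anchors, "rda-ca.pem"), workdir)
except FileNotFoundError as issue:
    print("copy failed:", issue)

# os.path.exists() follows symlinks and returns False for dangling links,
# so it can serve as a guard before the copy.
assert not os.path.exists(os.path.join(anchors, "rda-ca.pem"))
```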

Thanks

jaawasth commented 3 years ago

@schaefi, yes, I wanted to have a look at this file, but unfortunately I don't have access to the servers right now; it will only be possible later in the day.

These files are from certs installed by a HW tool we need to run on the system. I'll try to get access to the system and check them.

schaefi commented 3 years ago

Thanks Jai. I think it's important to understand why it failed. In the meantime I created a pull request that prevents the migration process from failing just because of the copy exception.

Nevertheless we need to understand why this happens and maybe come up with a better solution than just ignoring the error

Thanks

jaawasth commented 3 years ago

Can you please provide a test RPM as well?

jaawasth commented 3 years ago

@schaefi, correct, this is a symlink.

Can you please provide the new test RPMs with the latest change?

total 4
drwxr-xr-x 4 root root   38 Mar 10  2020 ..
-rw-r--r-- 1 root root 1298 May  8  2020 rmt-server.pem
drwxr-xr-x 2 root root   46 Nov 23 17:23 .
lrwxrwxrwx 1 root root   27 Apr 19 16:04 rda-ca.pem -> /etc/rda/private/rda-ca.pem

schaefi commented 3 years ago

That explains everything. I'll update the PR and hope @jesusbv can have a short look.

can you please provide me the new test rpms , with latest change.

I will once the open PR has been reviewed.

schaefi commented 3 years ago

@jaawasth Please find new packages on the jump host

Have a great weekend and stay safe

jaawasth commented 3 years ago

@schaefi, it failed again. Can you please have a look?

distro_migration.log

Thanks !!

jaawasth commented 3 years ago

This time I'm migrating with the HW-recommended software intact.

We add an ISO medium from which we install the required HW-recommended packages on SLES.

Problem retrieving files from 'SFS-2.24'. Failed to mount iso:/?iso=/var/foundation-2.24-cd1-media-sles12sp4-x86_64.iso on : Unable to find iso filename on source media History:

jaawasth commented 3 years ago

I had added this repo for installing hardware packages, and the migration was not able to find the ISO medium for the repo. I removed it and the migration worked fine. But can we add a check here as well to skip such failures?

schaefi commented 3 years ago

@jaawasth I think this is one of the areas that is outside the scope of the DMS (or any migration concept). The repository setup (whether remote or local) is something we value as one of the most important parts of the system. If the DMS ignored, or added its own mechanism to deal with, configured but unreachable software repositories, I would consider that a design mistake.

So the repo setup is considered to be there for a reason, and the DMS should not mess with it.

I had added this repo for installing hardware packages, and the migration was not able to find the ISO medium for the repo. I removed it and the migration worked fine.

Exactly, and that action should be taken by somebody who knows it is OK and will do no harm.

Thoughts?

jaawasth commented 3 years ago

Sure, but from a migration perspective, does a repo added by the end user matter? We should not mess with the repos, but is there a way to handle this as a pre-check during migration? If there is an external repo set up by the customer, we could warn them beforehand to remove those repos, or add an option/flag to ignore them during migration. Since the end user doesn't know what errors are present, it becomes a long process to identify what the error actually is. So can we add alerts or an ignore option?

schaefi commented 3 years ago

This is a good point. We can certainly add a sort of repo pre-check in the activation/run-migration code such that a warning can be issued prior to starting the process. I will add that as a new issue.
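Such a pre-check could be sketched like this (a hypothetical function and file layout, not the actual DMS code; zypper keeps its repository definitions as INI-style .repo files under /etc/zypp/repos.d):

```python
import configparser
import glob
import os

def check_repos(repos_dir="/etc/zypp/repos.d"):
    # Hypothetical pre-check sketch: flag enabled repositories that point
    # at local media (iso:/, cd:/, dvd:/), since those are typically
    # unreachable from the migration live system.
    warnings = []
    for repo_file in sorted(glob.glob(os.path.join(repos_dir, "*.repo"))):
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(repo_file)
        for section in parser.sections():
            if parser.get(section, "enabled", fallback="1") != "1":
                continue
            baseurl = parser.get(section, "baseurl", fallback="")
            if baseurl.startswith(("iso:", "cd:", "dvd:")):
                warnings.append(
                    "repo '{0}' uses local media ({1}) and may not be "
                    "reachable during migration".format(section, baseurl)
                )
    return warnings
```

Run before starting the migration, a non-empty result could be surfaced as a warning so the user can disable the repo up front or opt in to ignoring it.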

jaawasth commented 3 years ago

thanks for tracking this.

jaawasth commented 3 years ago

@schaefi, when will the official RPMs be available?

schaefi commented 3 years ago

Hi Jai. The packages were released yesterday. The 2.0.24 versions of the packages should be available in the channels soon. @aosthof, do you have more information on when the packages will actually be available to customers?

aosthof commented 3 years ago

@schaefi @jaawasth The packages have been flagged internally as 'done' with a release date of 2021-05-04, but they are not available yet in the official public repos (at the time of writing). It could be that the mirroring process to the CDN is still in progress. I assume the packages will be available within the next couple of hours.

That said, I'm also checking with our Maintenance team to make sure nothing went wrong.

jaawasth commented 3 years ago

Thanks @aosthof @schaefi , please do keep me posted.

schaefi commented 3 years ago

@jaawasth The packages are now publicly available. Kudos to @aosthof, who managed to get them published.

jaawasth commented 3 years ago

Thanks both @schaefi & @aosthof for the help.