apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0
2.11k stars 1.11k forks source link

Upgrade Cloudstack 4.13.1 to 4.17.2 #7820

Open Hossein4185 opened 1 year ago

Hossein4185 commented 1 year ago
ISSUE TYPE

Error

CLOUDSTACK VERSION

4.13.1

OS / ENVIRONMENT

CentOs 6

SUMMARY

I used to Cenos 6 and mysql 5.7, but I changed the database to Ubuntu 22 and MariaDB 10.6 and dump from 5.7 to Maria10.6 the service worked correctly, but I encountered the following problem when updating the service from 4.13.1 to 4.17.2.

Does cloud stack have problems with mariadb 10.6?

Update the description to indicate this only works with KVM + Ceph 2023-08-01 01:59:33,607 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- (not implemented properly atm for KVM+NFS/local, and it accidentaly works with XS + NFS. Not applicable for VMware) 2023-08-01 01:59:33,607 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) UPDATE cloud.configuration SET description='Indicates whether to always backup primary storage snapshot to secondary storage. Keeping snapshots only on Primary storage is applicable for KVM + Ceph only.' WHERE name='snapshot.backup.to.secondary' 2023-08-01 01:59:33,609 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- KVM: enable storage data motion on KVM hypervisor_capabilities 2023-08-01 01:59:33,609 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) UPDATE cloud.hypervisor_capabilities SET storage_motion_supported = 1 WHERE hypervisor_capabilities.hypervisor_type = 'KVM' 2023-08-01 01:59:33,610 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- Use 'Other Linux 64-bit' as guest os for the default systemvmtemplate for XenServer 2023-08-01 01:59:33,610 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) UPDATE cloud.vm_template SET guest_os_id=99 WHERE id=1 2023-08-01 01:59:33,611 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- #3659 Fix typo: the past tense of shutdown is shutdown, not shutdowned 2023-08-01 01:59:33,611 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) UPDATE cloud.vm_instance SET state='Shutdown' WHERE state='Shutdowned' 2023-08-01 01:59:33,612 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- Backup and Recovery 2023-08-01 01:59:33,612 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) CREATE TABLE IF NOT EXISTS cloud.backup_offering ( id bigint(20) unsigned NOT NULL AUTO_INCREMENT, uuid varchar(40) NOT NULL UNIQUE, name varchar(255) NOT NULL COMMENT 'backup offering name', description varchar(255) NOT NULL COMMENT 'backup offering description', external_id varchar(255) DEFAULT NULL COMMENT 'external ID on provider side', user_driven_backup tinyint(1) unsigned NOT NULL DEFAULT 0 COMMENT 'whether user can do adhoc backups and backup schedules allowed, default false', zone_id bigint(20) unsigned NOT NULL COMMENT 'zone id', provider varchar(255) NOT NULL COMMENT 'backup provider', created datetime DEFAULT NULL, removed datetime DEFAULT NULL, PRIMARY KEY (id), CONSTRAINT fk_backup_offering__zone_id FOREIGN KEY (zone_id) REFERENCES data_center (id) ON DELETE CASCADE ) ENGINE=InnoDB DEFAULT CHARSET=utf8 2023-08-01 01:59:33,646 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) ALTER TABLE cloud.vm_instance ADD COLUMN backup_offering_id bigint unsigned DEFAULT NULL COMMENT 'ID of backup offering' 2023-08-01 01:59:33,686 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:) Error executing: ALTER TABLE cloud.vm_instance ADD COLUMN backup_offering_id bigint unsigned DEFAULT NULL COMMENT 'ID of backup offering' 2023-08-01 01:59:33,687 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:) java.sql.SQLSyntaxErrorException: Duplicate column name 'backup_offering_id' 2023-08-01 01:59:33,689 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null) (logid:) Unable to execute upgrade script java.sql.SQLSyntaxErrorException: Duplicate column name 'backup_offering_id'

rajujith commented 1 year ago

Hi, @Hossein4185 was there any failed attempt to upgrade previously?

Hossein4185 commented 1 year ago

No, We had no failed updates.

weizhouapache commented 1 year ago

@Hossein4185 the error log seems to happen when you retry to upgrade cloudstack. is there error in your first try ?

tip: when upgrade fails, you need to restore the database from backup, otherwise, you might face similar issue like

2023-08-01 01:59:33,687 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:) java.sql.SQLSyntaxErrorException: Duplicate column name 'backup_offering_id'
Hossein4185 commented 1 year ago

No, I get the same error in every attempt, even in the first attempt.

Even when I faced the problem, I restored the backup and tried again but I faced the same problem again

weizhouapache commented 1 year ago

@Hossein4185

can you restore the datbase, and check if vm_instance table has the column backup_offering_id ? the 4.13.1 database should not have the column.

No, I get the same error in every attempt, even in the first attempt.

Even when I faced the problem, I restored the backup and tried again but I faced the same problem again

Hossein4185 commented 1 year ago

We do not have the backup offering ID in the backup database

describe vm_instance; +--------------------------+---------------------+------+-----+--------------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------------------+---------------------+------+-----+--------------+----------------+ | id | bigint(20) unsigned | NO | PRI | NULL | auto_increment | | name | varchar(255) | NO | | NULL | | | uuid | varchar(40) | YES | UNI | NULL | | | instance_name | varchar(255) | NO | MUL | NULL | | | state | varchar(32) | NO | MUL | NULL | | | vm_template_id | bigint(20) unsigned | YES | MUL | NULL | | | guest_os_id | bigint(20) unsigned | NO | | NULL | | | private_mac_address | varchar(17) | YES | | NULL | | | private_ip_address | char(40) | YES | | NULL | | | pod_id | bigint(20) unsigned | YES | MUL | NULL | | | data_center_id | bigint(20) unsigned | NO | MUL | NULL | | | host_id | bigint(20) unsigned | YES | MUL | NULL | | | last_host_id | bigint(20) unsigned | YES | MUL | NULL | | | proxy_id | bigint(20) unsigned | YES | | NULL | | | proxy_assign_time | datetime | YES | | NULL | | | vnc_password | varchar(255) | NO | | NULL | | | ha_enabled | tinyint(1) | NO | | 0 | | | limit_cpu_use | tinyint(1) unsigned | NO | | 0 | | | update_count | bigint(20) unsigned | NO | MUL | 0 | | | update_time | datetime | YES | MUL | NULL | | | created | datetime | NO | | NULL | | | removed | datetime | YES | MUL | NULL | | | type | varchar(32) | NO | MUL | NULL | | | vm_type | varchar(32) | NO | | NULL | | | account_id | bigint(20) unsigned | NO | MUL | NULL | | | domain_id | bigint(20) unsigned | NO | | NULL | | | service_offering_id | bigint(20) unsigned | NO | MUL | NULL | | | reservation_id | char(40) | YES | | NULL | | | hypervisor_type | char(32) | YES | | NULL | | | disk_offering_id | bigint(20) unsigned | YES | | NULL | | | owner | varchar(255) | YES | | NULL | | | host_name | varchar(255) | YES | | NULL | | | display_name | varchar(255) | YES | | NULL | | | desired_state | varchar(32) | YES | | NULL | | | dynamically_scalable | tinyint(1) unsigned | NO | | 0 | | | display_vm | tinyint(1) | NO | | 1 | | | power_state | varchar(74) | YES | | PowerUnknown | | | power_state_update_time | datetime | YES | | NULL | | | power_state_update_count | int(11) | YES | | 0 | | | power_host | bigint(20) unsigned | YES | MUL | NULL | | | user_id | bigint(20) unsigned | NO | | 1 | | +--------------------------+---------------------+------+-----+--------------+----------------+

weizhouapache commented 1 year ago

We do not have the backup offering ID in the backup database

describe vm_instance; +--------------------------+---------------------+------+-----+--------------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------------------+---------------------+------+-----+--------------+----------------+ | id | bigint(20) unsigned | NO | PRI | NULL | auto_increment | | name | varchar(255) | NO | | NULL | | | uuid | varchar(40) | YES | UNI | NULL | | | instance_name | varchar(255) | NO | MUL | NULL | | | state | varchar(32) | NO | MUL | NULL | | | vm_template_id | bigint(20) unsigned | YES | MUL | NULL | | | guest_os_id | bigint(20) unsigned | NO | | NULL | | | private_mac_address | varchar(17) | YES | | NULL | | | private_ip_address | char(40) | YES | | NULL | | | pod_id | bigint(20) unsigned | YES | MUL | NULL | | | data_center_id | bigint(20) unsigned | NO | MUL | NULL | | | host_id | bigint(20) unsigned | YES | MUL | NULL | | | last_host_id | bigint(20) unsigned | YES | MUL | NULL | | | proxy_id | bigint(20) unsigned | YES | | NULL | | | proxy_assign_time | datetime | YES | | NULL | | | vnc_password | varchar(255) | NO | | NULL | | | ha_enabled | tinyint(1) | NO | | 0 | | | limit_cpu_use | tinyint(1) unsigned | NO | | 0 | | | update_count | bigint(20) unsigned | NO | MUL | 0 | | | update_time | datetime | YES | MUL | NULL | | | created | datetime | NO | | NULL | | | removed | datetime | YES | MUL | NULL | | | type | varchar(32) | NO | MUL | NULL | | | vm_type | varchar(32) | NO | | NULL | | | account_id | bigint(20) unsigned | NO | MUL | NULL | | | domain_id | bigint(20) unsigned | NO | | NULL | | | service_offering_id | bigint(20) unsigned | NO | MUL | NULL | | | reservation_id | char(40) | YES | | NULL | | | hypervisor_type | char(32) | YES | | NULL | | | disk_offering_id | bigint(20) unsigned | YES | | NULL | | | owner | varchar(255) | YES | | NULL | | | host_name | varchar(255) | YES | | NULL | | | display_name | varchar(255) | YES | | NULL | | | desired_state | varchar(32) | YES | | NULL | | | dynamically_scalable | tinyint(1) unsigned | NO | | 0 | | | display_vm | tinyint(1) | NO | | 1 | | | power_state | varchar(74) | YES | | PowerUnknown | | | power_state_update_time | datetime | YES | | NULL | | | power_state_update_count | int(11) | YES | | 0 | | | power_host | bigint(20) unsigned | YES | MUL | NULL | | | user_id | bigint(20) unsigned | NO | | 1 | | +--------------------------+---------------------+------+-----+--------------+----------------+

@Hossein4185 ok

that's a bit strange, the disk_offering_id is only added to the table once during upgrade

engine/schema/src/main/resources/META-INF/db/schema-41310to41400.sql:
ALTER TABLE `cloud`.`vm_instance` ADD COLUMN `backup_offering_id` bigint unsigned DEFAULT NULL COMMENT 'ID of backup offering';
Hossein4185 commented 1 year ago

These may help to solve the problem. I plan to update to version 4.17.2 And I followed the steps below. I have 2ACS Ver 4.13.0 Server on CentOs 6 and mysql 5.7 on CentOs 6

1-update ACS Server to 14.13.1 because I want to add Two ACS Server on CentOs 7 2-change database to Mariadb 10.6 from mysql 5.7 we transfer with Mysqldump and The service was tested and worked properly CentOs 7 3-We Added two ACS Server CentOs 7 to existing Management Server. Because 4.17.2 not Supported CentOs 6 Server .The service was tested and worked properly CentOs 7 4-I wanted to update the service to 4.17.2, but I got an error.

All steps were tested in the lab and worked properly but I got an error in production.

DaanHoogland commented 1 year ago

@Hossein4185 , just a thought; did you start both management servers at the same time? You should wait for one to have done the upgrade, before starting the second one.

Hossein4185 commented 1 year ago

I followed the document below and stopped all management first and opdated one ACS and got an error https://docs.cloudstack.apache.org/en/4.17.2.0/upgrading/upgrade/upgrade-4.13.html

DaanHoogland commented 1 year ago

All steps were tested in the lab and worked properly but I got an error in production.

ok, so something else was different in your test env as opposed to production?

weizhouapache commented 1 year ago

All steps were tested in the lab and worked properly but I got an error in production.

ok, so something else was different in your test env as opposed to production?

could be

@Hossein4185 Could you dump the database without data of test environment and production, and check if there are differences ?

Hossein4185 commented 1 year ago

All steps were tested in the lab and worked properly but I got an error in production.

ok, so something else was different in your test env as opposed to production?

could be

@Hossein4185 Could you dump the database without data of test environment and production, and check if there are differences ?

@DaanHoogland @weizhouapache

I compared the two databases and there were differences, this problem started when I had two Centos6 management servers and Cloudstack 4.13.1 and added two Centos 7 servers with cloudstack 4.13.1 .But currently tables have been added to the database for 4.17.2. This is if I haven't had any failed attempts to update the service to 4.17.2 and now Centos 7 is working fine on 4.13.1

Hossein4185 commented 1 year ago

I deleted the exceptions using the following query and the 4.17.2 service was started. But I still don't know the reason for creating tables on version 4.13.1. I am testing the service...

drop table router_health_check ; drop table template_deploy_as_is_details ; drop table user_vm_deploy_as_is_details ; SET FOREIGN_KEY_CHECKS=0; drop table kubernetes_cluster ; SET FOREIGN_KEY_CHECKS=1; drop table kubernetes_cluster_vm_map; drop table kubernetes_supported_version; drop table resource_icon; drop table network_permissions; SET FOREIGN_KEY_CHECKS=0; drop table dc_ip6_guest_prefix; SET FOREIGN_KEY_CHECKS=1; drop table ip6_guest_prefix_subnet_network_map;

weizhouapache commented 1 year ago

@Hossein4185 your database might be broken after your failed attempt. When upgrade fails, you'd better restore database from backup. since your environment has been running for a period after that, it seems the only way to fix the problem is, changing the database tables manually to same as 4.13.1 database.

weizhouapache commented 1 year ago

@DaanHoogland @Hossein4185 I this this as not-a-bug and removed it from 4.18.1.0 milestone

rohityadavcloud commented 7 months ago

It may be possible that MariaDB isn't widely used, and some things are compatible with MySQL aren't compatible with MySQL. Is this still an issue for you @Hossein4185 ?