apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0
1.83k stars 1.07k forks source link

Upgrade Cloudstack 4.13.1 to 4.17.2 #7820

Open Hossein4185 opened 11 months ago

Hossein4185 commented 11 months ago
ISSUE TYPE

Error

CLOUDSTACK VERSION

4.13.1

OS / ENVIRONMENT

CentOs 6

SUMMARY

I used to Cenos 6 and mysql 5.7, but I changed the database to Ubuntu 22 and MariaDB 10.6 and dump from 5.7 to Maria10.6 the service worked correctly, but I encountered the following problem when updating the service from 4.13.1 to 4.17.2.

Does cloud stack have problems with mariadb 10.6?

Update the description to indicate this only works with KVM + Ceph 2023-08-01 01:59:33,607 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- (not implemented properly atm for KVM+NFS/local, and it accidentaly works with XS + NFS. Not applicable for VMware) 2023-08-01 01:59:33,607 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) UPDATE cloud.configuration SET description='Indicates whether to always backup primary storage snapshot to secondary storage. Keeping snapshots only on Primary storage is applicable for KVM + Ceph only.' WHERE name='snapshot.backup.to.secondary' 2023-08-01 01:59:33,609 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- KVM: enable storage data motion on KVM hypervisor_capabilities 2023-08-01 01:59:33,609 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) UPDATE cloud.hypervisor_capabilities SET storage_motion_supported = 1 WHERE hypervisor_capabilities.hypervisor_type = 'KVM' 2023-08-01 01:59:33,610 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- Use 'Other Linux 64-bit' as guest os for the default systemvmtemplate for XenServer 2023-08-01 01:59:33,610 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) UPDATE cloud.vm_template SET guest_os_id=99 WHERE id=1 2023-08-01 01:59:33,611 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- #3659 Fix typo: the past tense of shutdown is shutdown, not shutdowned 2023-08-01 01:59:33,611 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) UPDATE cloud.vm_instance SET state='Shutdown' WHERE state='Shutdowned' 2023-08-01 01:59:33,612 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- Backup and Recovery 2023-08-01 01:59:33,612 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) CREATE TABLE IF NOT EXISTS cloud.backup_offering ( id bigint(20) unsigned NOT NULL AUTO_INCREMENT, uuid varchar(40) NOT NULL UNIQUE, name varchar(255) NOT NULL COMMENT 'backup offering name', description varchar(255) NOT NULL COMMENT 'backup offering description', external_id varchar(255) DEFAULT NULL COMMENT 'external ID on provider side', user_driven_backup tinyint(1) unsigned NOT NULL DEFAULT 0 COMMENT 'whether user can do adhoc backups and backup schedules allowed, default false', zone_id bigint(20) unsigned NOT NULL COMMENT 'zone id', provider varchar(255) NOT NULL COMMENT 'backup provider', created datetime DEFAULT NULL, removed datetime DEFAULT NULL, PRIMARY KEY (id), CONSTRAINT fk_backup_offering__zone_id FOREIGN KEY (zone_id) REFERENCES data_center (id) ON DELETE CASCADE ) ENGINE=InnoDB DEFAULT CHARSET=utf8 2023-08-01 01:59:33,646 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) ALTER TABLE cloud.vm_instance ADD COLUMN backup_offering_id bigint unsigned DEFAULT NULL COMMENT 'ID of backup offering' 2023-08-01 01:59:33,686 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:) Error executing: ALTER TABLE cloud.vm_instance ADD COLUMN backup_offering_id bigint unsigned DEFAULT NULL COMMENT 'ID of backup offering' 2023-08-01 01:59:33,687 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:) java.sql.SQLSyntaxErrorException: Duplicate column name 'backup_offering_id' 2023-08-01 01:59:33,689 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null) (logid:) Unable to execute upgrade script java.sql.SQLSyntaxErrorException: Duplicate column name 'backup_offering_id'

rajujith commented 11 months ago

Hi, @Hossein4185 was there any failed attempt to upgrade previously?

Hossein4185 commented 11 months ago

No, We had no failed updates.

weizhouapache commented 10 months ago

@Hossein4185 the error log seems to happen when you retry to upgrade cloudstack. is there error in your first try ?

tip: when upgrade fails, you need to restore the database from backup, otherwise, you might face similar issue like

2023-08-01 01:59:33,687 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:) java.sql.SQLSyntaxErrorException: Duplicate column name 'backup_offering_id'
Hossein4185 commented 10 months ago

No, I get the same error in every attempt, even in the first attempt.

Even when I faced the problem, I restored the backup and tried again but I faced the same problem again

weizhouapache commented 10 months ago

@Hossein4185

can you restore the datbase, and check if vm_instance table has the column backup_offering_id ? the 4.13.1 database should not have the column.

No, I get the same error in every attempt, even in the first attempt.

Even when I faced the problem, I restored the backup and tried again but I faced the same problem again

Hossein4185 commented 10 months ago

We do not have the backup offering ID in the backup database

describe vm_instance; +--------------------------+---------------------+------+-----+--------------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------------------+---------------------+------+-----+--------------+----------------+ | id | bigint(20) unsigned | NO | PRI | NULL | auto_increment | | name | varchar(255) | NO | | NULL | | | uuid | varchar(40) | YES | UNI | NULL | | | instance_name | varchar(255) | NO | MUL | NULL | | | state | varchar(32) | NO | MUL | NULL | | | vm_template_id | bigint(20) unsigned | YES | MUL | NULL | | | guest_os_id | bigint(20) unsigned | NO | | NULL | | | private_mac_address | varchar(17) | YES | | NULL | | | private_ip_address | char(40) | YES | | NULL | | | pod_id | bigint(20) unsigned | YES | MUL | NULL | | | data_center_id | bigint(20) unsigned | NO | MUL | NULL | | | host_id | bigint(20) unsigned | YES | MUL | NULL | | | last_host_id | bigint(20) unsigned | YES | MUL | NULL | | | proxy_id | bigint(20) unsigned | YES | | NULL | | | proxy_assign_time | datetime | YES | | NULL | | | vnc_password | varchar(255) | NO | | NULL | | | ha_enabled | tinyint(1) | NO | | 0 | | | limit_cpu_use | tinyint(1) unsigned | NO | | 0 | | | update_count | bigint(20) unsigned | NO | MUL | 0 | | | update_time | datetime | YES | MUL | NULL | | | created | datetime | NO | | NULL | | | removed | datetime | YES | MUL | NULL | | | type | varchar(32) | NO | MUL | NULL | | | vm_type | varchar(32) | NO | | NULL | | | account_id | bigint(20) unsigned | NO | MUL | NULL | | | domain_id | bigint(20) unsigned | NO | | NULL | | | service_offering_id | bigint(20) unsigned | NO | MUL | NULL | | | reservation_id | char(40) | YES | | NULL | | | hypervisor_type | char(32) | YES | | NULL | | | disk_offering_id | bigint(20) unsigned | YES | | NULL | | | owner | varchar(255) | YES | | NULL | | | host_name | varchar(255) | YES | | NULL | | | display_name | varchar(255) | YES | | NULL | | | desired_state | varchar(32) | YES | | NULL | | | dynamically_scalable | tinyint(1) unsigned | NO | | 0 | | | display_vm | tinyint(1) | NO | | 1 | | | power_state | varchar(74) | YES | | PowerUnknown | | | power_state_update_time | datetime | YES | | NULL | | | power_state_update_count | int(11) | YES | | 0 | | | power_host | bigint(20) unsigned | YES | MUL | NULL | | | user_id | bigint(20) unsigned | NO | | 1 | | +--------------------------+---------------------+------+-----+--------------+----------------+

weizhouapache commented 10 months ago

We do not have the backup offering ID in the backup database

describe vm_instance; +--------------------------+---------------------+------+-----+--------------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------------------+---------------------+------+-----+--------------+----------------+ | id | bigint(20) unsigned | NO | PRI | NULL | auto_increment | | name | varchar(255) | NO | | NULL | | | uuid | varchar(40) | YES | UNI | NULL | | | instance_name | varchar(255) | NO | MUL | NULL | | | state | varchar(32) | NO | MUL | NULL | | | vm_template_id | bigint(20) unsigned | YES | MUL | NULL | | | guest_os_id | bigint(20) unsigned | NO | | NULL | | | private_mac_address | varchar(17) | YES | | NULL | | | private_ip_address | char(40) | YES | | NULL | | | pod_id | bigint(20) unsigned | YES | MUL | NULL | | | data_center_id | bigint(20) unsigned | NO | MUL | NULL | | | host_id | bigint(20) unsigned | YES | MUL | NULL | | | last_host_id | bigint(20) unsigned | YES | MUL | NULL | | | proxy_id | bigint(20) unsigned | YES | | NULL | | | proxy_assign_time | datetime | YES | | NULL | | | vnc_password | varchar(255) | NO | | NULL | | | ha_enabled | tinyint(1) | NO | | 0 | | | limit_cpu_use | tinyint(1) unsigned | NO | | 0 | | | update_count | bigint(20) unsigned | NO | MUL | 0 | | | update_time | datetime | YES | MUL | NULL | | | created | datetime | NO | | NULL | | | removed | datetime | YES | MUL | NULL | | | type | varchar(32) | NO | MUL | NULL | | | vm_type | varchar(32) | NO | | NULL | | | account_id | bigint(20) unsigned | NO | MUL | NULL | | | domain_id | bigint(20) unsigned | NO | | NULL | | | service_offering_id | bigint(20) unsigned | NO | MUL | NULL | | | reservation_id | char(40) | YES | | NULL | | | hypervisor_type | char(32) | YES | | NULL | | | disk_offering_id | bigint(20) unsigned | YES | | NULL | | | owner | varchar(255) | YES | | NULL | | | host_name | varchar(255) | YES | | NULL | | | display_name | varchar(255) | YES | | NULL | | | desired_state | varchar(32) | YES | | NULL | | | dynamically_scalable | tinyint(1) unsigned | NO | | 0 | | | display_vm | tinyint(1) | NO | | 1 | | | power_state | varchar(74) | YES | | PowerUnknown | | | power_state_update_time | datetime | YES | | NULL | | | power_state_update_count | int(11) | YES | | 0 | | | power_host | bigint(20) unsigned | YES | MUL | NULL | | | user_id | bigint(20) unsigned | NO | | 1 | | +--------------------------+---------------------+------+-----+--------------+----------------+

@Hossein4185 ok

that's a bit strange, the disk_offering_id is only added to the table once during upgrade

engine/schema/src/main/resources/META-INF/db/schema-41310to41400.sql:
ALTER TABLE `cloud`.`vm_instance` ADD COLUMN `backup_offering_id` bigint unsigned DEFAULT NULL COMMENT 'ID of backup offering';
Hossein4185 commented 10 months ago

These may help to solve the problem. I plan to update to version 4.17.2 And I followed the steps below. I have 2ACS Ver 4.13.0 Server on CentOs 6 and mysql 5.7 on CentOs 6

1-update ACS Server to 14.13.1 because I want to add Two ACS Server on CentOs 7 2-change database to Mariadb 10.6 from mysql 5.7 we transfer with Mysqldump and The service was tested and worked properly CentOs 7 3-We Added two ACS Server CentOs 7 to existing Management Server. Because 4.17.2 not Supported CentOs 6 Server .The service was tested and worked properly CentOs 7 4-I wanted to update the service to 4.17.2, but I got an error.

All steps were tested in the lab and worked properly but I got an error in production.

DaanHoogland commented 10 months ago

@Hossein4185 , just a thought; did you start both management servers at the same time? You should wait for one to have done the upgrade, before starting the second one.

Hossein4185 commented 10 months ago

I followed the document below and stopped all management first and opdated one ACS and got an error https://docs.cloudstack.apache.org/en/4.17.2.0/upgrading/upgrade/upgrade-4.13.html

DaanHoogland commented 10 months ago

All steps were tested in the lab and worked properly but I got an error in production.

ok, so something else was different in your test env as opposed to production?

weizhouapache commented 10 months ago

All steps were tested in the lab and worked properly but I got an error in production.

ok, so something else was different in your test env as opposed to production?

could be

@Hossein4185 Could you dump the database without data of test environment and production, and check if there are differences ?

Hossein4185 commented 10 months ago

All steps were tested in the lab and worked properly but I got an error in production.

ok, so something else was different in your test env as opposed to production?

could be

@Hossein4185 Could you dump the database without data of test environment and production, and check if there are differences ?

@DaanHoogland @weizhouapache

I compared the two databases and there were differences, this problem started when I had two Centos6 management servers and Cloudstack 4.13.1 and added two Centos 7 servers with cloudstack 4.13.1 .But currently tables have been added to the database for 4.17.2. This is if I haven't had any failed attempts to update the service to 4.17.2 and now Centos 7 is working fine on 4.13.1

Hossein4185 commented 10 months ago

I deleted the exceptions using the following query and the 4.17.2 service was started. But I still don't know the reason for creating tables on version 4.13.1. I am testing the service...

drop table router_health_check ; drop table template_deploy_as_is_details ; drop table user_vm_deploy_as_is_details ; SET FOREIGN_KEY_CHECKS=0; drop table kubernetes_cluster ; SET FOREIGN_KEY_CHECKS=1; drop table kubernetes_cluster_vm_map; drop table kubernetes_supported_version; drop table resource_icon; drop table network_permissions; SET FOREIGN_KEY_CHECKS=0; drop table dc_ip6_guest_prefix; SET FOREIGN_KEY_CHECKS=1; drop table ip6_guest_prefix_subnet_network_map;

weizhouapache commented 10 months ago

@Hossein4185 your database might be broken after your failed attempt. When upgrade fails, you'd better restore database from backup. since your environment has been running for a period after that, it seems the only way to fix the problem is, changing the database tables manually to same as 4.13.1 database.

weizhouapache commented 10 months ago

@DaanHoogland @Hossein4185 I this this as not-a-bug and removed it from 4.18.1.0 milestone

rohityadavcloud commented 2 months ago

It may be possible that MariaDB isn't widely used, and some things are compatible with MySQL aren't compatible with MySQL. Is this still an issue for you @Hossein4185 ?