Closed — matriv closed this 4 months ago
for version_def in versions[1:]:
    timestamp = datetime.utcnow().isoformat(timespec='seconds')
    print(f"{timestamp} Upgrade to:: {version_def.version}")
    self.assert_data_persistence(version_def, nodes, digest, paths)
# restart with latest version
version_def = versions[-1]
self.assert_data_persistence(version_def, nodes, digest, paths)
I've just realised this is growing as O(n^2), isn't it? Every time we add a new version, it tests an upgrade path to it from every previous supported version. Is it possible to move the timeout onto assert_data_persistence instead? Otherwise we're going to keep hitting this...
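To illustrate the growth (a minimal counting sketch, not code from crate_qa — `paths_per_run` and `cumulative_paths` are hypothetical names): each test run upgrades from every older supported version to the newest, so a single run is O(n), but the total work across the lifetime of n releases sums to O(n^2).

```python
def paths_per_run(n_versions: int) -> int:
    # One upgrade path per previous version, matching versions[1:] in the
    # loop quoted above.
    return n_versions - 1

def cumulative_paths(n_versions: int) -> int:
    # Total paths exercised over the lifetime of n releases:
    # 0 + 1 + ... + (n - 1) == n * (n - 1) / 2, i.e. quadratic growth.
    return sum(paths_per_run(n) for n in range(1, n_versions + 1))

print(paths_per_run(10))     # 9 paths in a single run
print(cumulative_paths(10))  # 45 paths accumulated over 10 releases
```

So while any one run stays linear, every new release lengthens every future run, which is why the suite keeps bumping into the timeout.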
The timeout is on each upgrade path, e.g. from 4.0.x to latest, not on the outer loop over all upgrade paths. If we move it to assert_data_persistence, then we exclude all the time spent restarting the cluster, so we won't easily catch timeouts during these operations, which may hide issues. What do you think?
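One way to keep the limit on the whole path (cluster restart included) while still seeing which path is slow would be to time each path individually — a hypothetical sketch, not the actual test harness (`timed_path` is an invented helper). Note this is a post-hoc check: it reports and fails after the path finishes, it does not interrupt a hung path.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_path(label: str, limit_seconds: float):
    """Measure one upgrade path end to end; fail if it ran too long."""
    start = time.monotonic()
    yield
    elapsed = time.monotonic() - start
    # Log the per-path duration so slow paths show up even on success.
    print(f"{label}: {elapsed:.1f}s")
    if elapsed > limit_seconds:
        raise TimeoutError(f"{label} exceeded {limit_seconds}s")
```

Usage would wrap each iteration of the loop, e.g. `with timed_path(f"upgrade to {version_def.version}", 600): ...`, keeping restarts inside the measured window.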
retest this please
From recently added logs it seems that the longest upgrade path (starting with 4.0.x) frequently consumes a bit more than 10 mins. Example failure: https://jenkins.crate.io/blue/organizations/jenkins/CrateDB%2Fqa%2Fcrate_qa/detail/crate_qa/793/pipeline