dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

Support for a global database teardown #2148

Closed evansde77 closed 12 years ago

evansde77 commented 13 years ago

Both DBS and Tier 0 will need an Oracle-supporting version of the WMQuality/TestInit.py module to set up and tear down database instances for unit tests.

CC'ing Lassi, since he may already have something along these lines for SiteDB.

stuartw commented 12 years ago

swakef: Doesn't seem fixed... http://dmwm.cern.ch:8080/job/WMCore-py2.6-mysql/305/

hufnagel commented 12 years ago

hufnagel: What do I have to do to actually see an error message instead of just a 404 page?

stuartw commented 12 years ago

swakef: If you go to dmwm.cern.ch:8080 you will get a signup link. Create an account with the username hufnagel.

ghost commented 12 years ago

lat: Didn't help, still get 404 page.

stuartw commented 12 years ago

swakef: Replying to [comment:54 lat]:

Didn't help, still get 404 page.

Try now, an admin needs to give each account permissions.

DMWMBot commented 12 years ago

mnorman: I can't actually tell what the state is. Someone keeps aborting the attempted builds, so we haven't run anything since putting in the last fix.

stuartw commented 12 years ago

swakef: Running the test manually should reproduce it, i.e. python setup.py test --buildBotMode=true.

Currently we hit some DB retry logic, which makes the tests take 4 hours, so the current run will go on for another 3 hours.

hufnagel commented 12 years ago

hufnagel: Matt, what exactly did you fix? I looked at the logs (some of them at least) from the last build and couldn't identify what would be wrong in the patches for this ticket.

For example:

test/python/WMComponent_t/AlertGenerator_t/AlertGenerator_t.py

    def setUp(self):
        self.testInit = TestInit(__file__)
        self.testInit.setLogging(logLevel = logging.DEBUG)
        self.testInit.clearDatabase()
        self.testInit.setDatabaseConnection()

This code has two problems:

1) How can a call to clearDatabase ever work before we set up the database connection?!

2) We explicitly said in this thread that, by policy, unit tests should never delete the database in setUp. The database needs to be empty or the unit test should fail.

So I really see nothing here to change. The unit test should be changed and the clearDatabase should be removed from setUp.
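A minimal sketch of the policy described above: connect first, fail in setUp if the database is not empty, and clear only in tearDown. It uses the standard-library sqlite3 module as a stand-in, not the WMCore TestInit API; `user_tables`, `require_empty`, `clear_database` and `ExampleTest` are hypothetical names chosen for illustration.

```python
import sqlite3
import unittest


def user_tables(conn):
    """Return the names of all user-created tables on this connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    return [row[0] for row in rows]


def require_empty(conn):
    """Fail loudly instead of silently wiping leftovers from an earlier test."""
    leftover = user_tables(conn)
    if leftover:
        raise AssertionError("database not empty before test: %s" % leftover)


def clear_database(conn):
    """Drop every user table; intended for tearDown only."""
    for name in user_tables(conn):
        conn.execute('DROP TABLE "%s"' % name)
    conn.commit()


class ExampleTest(unittest.TestCase):
    # One shared connection stands in for the shared test database instance.
    conn = sqlite3.connect(":memory:")

    def setUp(self):
        # The connection already exists here; the database is checked, not cleared.
        require_empty(self.conn)
        self.conn.execute("CREATE TABLE example_jobs (id INTEGER PRIMARY KEY)")

    def tearDown(self):
        # Cleanup belongs here, so the next test starts from an empty database.
        clear_database(self.conn)

    def test_insert(self):
        self.conn.execute("INSERT INTO example_jobs (id) VALUES (1)")
        rows = self.conn.execute("SELECT id FROM example_jobs").fetchall()
        self.assertEqual(rows, [(1,)])
```

With this layout, a test that leaves tables behind makes the next test fail in setUp, which points at the offending test rather than hiding the leak.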

stuartw commented 12 years ago

swakef: Replying to [comment:58 hufnagel]:

test/python/WMComponent_t/AlertGenerator_t/AlertGenerator_t.py

    def setUp(self):
        self.testInit = TestInit(__file__)
        self.testInit.setLogging(logLevel = logging.DEBUG)
        self.testInit.clearDatabase()
        self.testInit.setDatabaseConnection()

This code was changed in 90f756f54c2e20be8fccd46fd37d02654761a750, which is what the current build is using.

DMWMBot commented 12 years ago

mnorman: One possible problem here is weirdness in the Transaction_t unittest, which sets up and destroys DB elements with archaic code.

Filed as #2617

hufnagel commented 12 years ago

hufnagel: Really strange problem. As Matt said, the problem started in the Transaction_t unittest, which left a transaction open, then called clearDatabase and then committed the transaction.

The operations in that transaction were operating on tables specific to the unittest, so the commit failed of course (because we had cleared the db). Somehow the operations stayed in the MySQL buffer though, because we observed error messages related to them in later unit tests (which did not use the same tables).

Then a few unit tests later we started getting messages about no default database being present anymore, so the assumption is that MySQL finally gave up on the operations it could not run and, in addition, unset the default database. Could even be a MySQL bug.

Fixing #2167 should fix the Jenkins build again. As a precaution, Matt will also add a check in WmInit.clearDatabase for open transactions and commit them before we wipe the db.

The only remaining question is what to do with the MySQL Destroy plugin. IMO we should revert to the previous version without the "no default database => create one" hack. The current hack would not have prevented the original problem anyway; it would just have recovered things once everything fell apart. I think dropping the default database is a serious enough problem that we do not need to put in recovery code for it; we just need to fix the problem(s) that cause it to happen.
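The precaution mentioned above (check for an open transaction and commit it before wiping the database) could look roughly like the sketch below. It is an illustration using the standard-library sqlite3 module, not the actual WMCore clearDatabase code; `clear_database_safely` and the table name are hypothetical.

```python
import sqlite3


def clear_database_safely(conn, tables):
    """Drop the given tables, committing any open transaction first.

    Returns True if a transaction was still open when called, so the
    caller can log which test left it behind.
    """
    had_open_transaction = conn.in_transaction
    if had_open_transaction:
        # Flush pending work before the tables it refers to disappear.
        conn.commit()
    for name in tables:
        conn.execute('DROP TABLE IF EXISTS "%s"' % name)
    conn.commit()
    return had_open_transaction


# Usage: an INSERT implicitly opens a transaction that is never committed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_tx (id INTEGER)")
conn.execute("INSERT INTO example_tx VALUES (1)")
left_open = clear_database_safely(conn, ["example_tx"])
```

Returning the flag rather than committing silently keeps the leak visible, in line with the view that the underlying problem should be fixed rather than recovered from.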

DMWMBot commented 12 years ago

mnorman: Transaction commit filed as #2618

DMWMBot commented 12 years ago

mnorman: I think things might be alive and well for now, and am suggesting we close this ticket unless someone complains.