amanda-tarafa opened 3 years ago
Some aspects that we might want to discuss (that I can think of right now):
My thoughts on these:
Resource deletion: Keeping resources after tests are done allows using them for diagnosing purposes when tests fail. But it also means that we need to delete stale resources before starting a new test run, and that if we are not very good at that, we hit quotas. Initializing test runs also takes longer because of all the deletions. I'm happy to reconsider this approach and delete temporary resources on test teardown, or maybe on fixture teardown.
Fixed vs. temporary resources: Thinking particularly of backups, but also of some DB operations. They take (or at least used to) a long time (some are still being skipped, see #1300 ), so if we can avoid creating temporary resources in those cases, that's preferable. The aim should be to keep running times as low as possible.
Test execution ordering: I'm still opposed to executing tests in a certain order, not just because of testing best practices, but also because there's no guarantee that users will execute samples in a given order. We need to make sure that samples work on their own.
Splitting the current fixture: Seems like the instance is a good place to do that, as in, each fixture works with a different DB instance? But this is just intuition.
In addition to the above, I would also like to add the following to the list of discussion points:
While it can be useful to keep a resource around after a test run for diagnostic purposes, many of the sample tests are so simple that a failure will normally be easy to diagnose without the resource as well, especially if the single test is quick to run from a development machine. I would say that deleting resources after a run is preferable, unless the resource in question has some attributes that make it different from the most common resources, or we know that it takes a long time to create. In the latter case, it might be more efficient to diagnose a test failure by looking at the resource that was left behind than by rerunning the test.
Also, the deletion should normally be the responsibility of the code that creates it. So my suggestion would be something like:
Most types of resources are now created quickly by Cloud Spanner when the resource is small/empty, but backups can still be slow. Tests that need one or more backups should therefore prefer fixed resources over creating temporary ones. Some example times (the tests below use uniquely generated ids for the resources, as Cloud Spanner will otherwise throttle requests to create a new resource with the same name as a recently dropped one):
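A minimal sketch of the unique-id generation mentioned above, in Java since the pattern is language-neutral (the prefix and id format here are illustrative assumptions, not the repo's actual naming scheme):

```java
import java.util.UUID;

public class UniqueResourceId {
    // Generate a fresh database id per test run so a CREATE is never issued
    // for a name that was recently dropped (which Cloud Spanner may throttle).
    static String newDatabaseId(String prefix) {
        // Spanner database ids must be lowercase and at most 30 characters,
        // so keep the prefix short and append a short random suffix.
        String suffix = UUID.randomUUID().toString().substring(0, 8);
        return (prefix + "-" + suffix).toLowerCase();
    }

    public static void main(String[] args) {
        String a = newDatabaseId("my-sample-db");
        String b = newDatabaseId("my-sample-db");
        System.out.println(a);
        System.out.println(!a.equals(b)); // distinct ids for every call
    }
}
```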
Based on the above, I would suggest that test cases should prefer temporary resources for all resource types, except for backups.
I completely agree with the above, the tests should whenever possible be independent of execution order, and independent of all other tests.
Based on all the above, I would suggest that:
The above would make the tests completely independent of each other, which makes it easier to execute them individually (and also to execute them on the emulator).
The above execution times are (except in extreme circumstances) not affected very much if multiple operations are executed in parallel on different databases. That means that an opt-in option for executing (simple) tests in parallel would further reduce the total execution time of the sample tests. Tests that are set up to always create/drop their own temporary database can safely be executed in parallel with other tests that use the same strategy. This parallel strategy also works when executing tests on the emulator, as long as the individual tests do not use features that are not supported on the emulator. Executing parallel operations and transactions in different databases on the emulator is supported.
+1 to chatting about Parallel execution and dropping Test execution order since we are in agreement there.
I'm fine with allowing parallel execution. If I remember correctly we had some resource overlap, which should be taken care of once we've agreed on a new approach here, and we were also hitting some quotas (requests-per-minute kind of quotas). We might need to fine-tune it but that's fine. For reference: https://xunit.net/docs/running-tests-in-parallel
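For the record, in xunit that opt-in is largely configuration: tests in the same test collection always run sequentially, while different collections can run in parallel. A minimal `xunit.runner.json` sketch (the key names are from the xunit docs; the values here are just examples, and the thread cap would need tuning against the request-per-minute quotas):

```json
{
  "parallelizeTestCollections": true,
  "maxParallelThreads": 4
}
```

Each fixture class would then define its own collection, so tests sharing a fixture still run sequentially with each other while unrelated fixtures proceed in parallel.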
(@olavloite I could now add you to the assignee list as well; weird that I couldn't at first.)
@amanda-tarafa Thanks for the elaborate points. I think we agree on most (all?) of the above. I'll take a detailed look Thursday or Friday this week. The first half of this week is a little bit hectic.
Great! No rush at all on my part. This is a nice to have (do) but tests are running pretty consistently so it's fine that we take time to do all this.
I totally agree with almost all of the above, so I'll just focus on a couple of worries that I have regarding a couple of points (plus one point where my rationale was indeed not really clear):
... I think that, when you see a class holding different tests, I just see a different fixture ...
Yeah, I'm a little bit too used to the Java test environment, where there is no separate fixture, and the setup is part of the test class itself. I think in most cases we mean the same, but my terminology is a little off. The biggest difference is your idea on a shared query fixture, which I really like. I have a couple of small worries/exceptions (see below).
We can have a fixture for queries, that initializes a database with all tables and data that will be expected by query tests. Query tests don't need to create or delete resources. The fixture will delete the database on teardown.
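A rough sketch of that shared query fixture, in Java since the pattern is language-neutral, with an in-memory stand-in for the admin client (all names here are illustrative, not the repo's actual helpers):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the Spanner admin API: just records which databases exist.
class FakeSpannerAdmin {
    final List<String> databases = new ArrayList<>();
    void createDatabase(String id) { databases.add(id); }
    void dropDatabase(String id) { databases.remove(id); }
}

// Shared "query fixture": one database with tables and seed data is created
// once, all query tests read from it, and teardown drops it.
class QueryFixture implements AutoCloseable {
    final FakeSpannerAdmin admin;
    final String databaseId;

    QueryFixture(FakeSpannerAdmin admin, String databaseId) {
        this.admin = admin;
        this.databaseId = databaseId;
        admin.createDatabase(databaseId); // create tables + seed data once
    }

    @Override public void close() {
        admin.dropDatabase(databaseId);   // fixture teardown drops the db
    }
}

public class QueryFixtureDemo {
    public static void main(String[] args) {
        FakeSpannerAdmin admin = new FakeSpannerAdmin();
        try (QueryFixture fixture = new QueryFixture(admin, "query-db")) {
            // query tests run here; none of them create or delete resources
            System.out.println(admin.databases.contains("query-db")); // true
        }
        System.out.println(admin.databases.isEmpty()); // true after teardown
    }
}
```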
I think that this will work very well for many of the query samples, but probably not all. There are two (maybe three) categories that I worry about:
All the above can easily be worked around by using the 'data modification' fixture instead of the query fixture for the specific sample test that would include any of the above. So my suggestion would be:
Deletion always happens on fixture teardown, and not on test teardown (except if we need to start deleting resources faster because of quotas, but let's get there first).
I agree that we should try to stick with the strategy that the deletion happens on fixture teardown (i.e. on instance deletion) instead of after each test, as long as that is possible. We might however be closer to that point than we think. The current samples directory contains 79 sample files, and there is a hard limit of max 100 databases per instance on Cloud Spanner. Assuming that many of the samples can use the shared query fixture, we should be good for now, but it is something that we should keep in mind if the number of (non-query) samples continues to grow.
On point 3, I don't understand the rationale here. If the creation takes a long time, but we need to create it anyway, why would that make us defer deletion? Did you mean if deletion takes a long time? Then I'd agree that deleting the resource in the test should be avoided, but as per my point before, we would delete it on fixture teardown.
Sorry, I should have included my rationale in that point. The reasoning is that we generally will not keep resources around for inspection in case of test failure, as most resources are quick and easy to create anyways. That means that debugging a failed test is normally easy to do by just running it again. That is not necessarily true for tests that create for example backups, and for those cases it might be useful to keep the resource around after the test has executed, and only delete it at a later time. This is however not something that I feel very strongly about either way, so I'm also perfectly happy to try to keep the strategy as clean as possible, and use the same deletion strategy for all types of resources.
One additional point regarding the deletion of backups: Cloud Spanner does not allow deleting instances that contain one or more backups. The fixture teardown must therefore first delete all backups that are stored on an instance before it can delete the instance.
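That ordering constraint is easy to get wrong, so here is a small Java sketch of the required teardown order, with a fake admin that enforces the same rule the real service does (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;

// Fake instance admin that, like Cloud Spanner, refuses to delete an
// instance while it still contains backups.
class FakeInstanceAdmin {
    final Set<String> backups = new HashSet<>();
    boolean instanceExists = true;

    void deleteBackup(String id) { backups.remove(id); }

    void deleteInstance() {
        if (!backups.isEmpty()) {
            throw new IllegalStateException("instance still contains backups");
        }
        instanceExists = false;
    }
}

public class FixtureTeardown {
    // Teardown: delete all backups first, then the instance (databases are
    // omitted for brevity; the real fixture would drop those too).
    static void tearDown(FakeInstanceAdmin admin) {
        for (String backup : new ArrayList<>(admin.backups)) {
            admin.deleteBackup(backup);
        }
        admin.deleteInstance();
    }

    public static void main(String[] args) {
        FakeInstanceAdmin admin = new FakeInstanceAdmin();
        admin.backups.add("backup-1");
        admin.backups.add("backup-2");
        tearDown(admin);
        System.out.println(!admin.instanceExists); // true: instance deleted
    }
}
```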
- Use the standard query fixture whenever possible.
- Use a fixture that can create a temporary database for the specific test.
Yes, totally agree. We'll always need to support exceptions for the general case.
I agree that we should try to stick with the strategy that the deletion happens on fixture teardown ... Assuming that many of the samples can use the shared query fixture, we should be good for now, but it is something that we should keep in mind if the number of (non-query) samples continues to grow.
Yes, I'm also worried that we might be almost there. I wouldn't like to have to rewrite things again in a few months. Since the limit is per instance, I was wondering if we could just group some other tests in a similar way as we will do for queries, even if it's not for reusing one database, but just for avoiding the limit. For instance, we could group all the admin tests in one fixture, etc. But I'm ultimately fine with deleting databases on a per-test basis if we need to, or if we think we'll need to in the short/medium term. It'll probably make tests slightly slower, but hey...
The reasoning is that we generally will not keep resources around for inspection in case of test failure, as most resources are quick and easy to create anyways.
Ah yes, I see. I think I'm now leaning towards a cleaner test strategy here: just try to delete everything at teardown. But at teardown we should also continue to attempt to delete stale resources, which lets us always delete on a best-effort basis and not fail the tests if cleanup failed. That will also make it easier to switch from one strategy to the other if we need to: basically, we start by doing DeleteOwnBackups(); DeleteStaleBackups();
and we can just remove the first call if we need to.
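A sketch of that best-effort pattern in Java (the two method names mirror the ones above; everything else, including the in-memory stand-in for listing backups, is illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BackupCleanup {
    final Set<String> ownBackups = new HashSet<>();
    final Set<String> allBackups; // stand-in for listing backups on the instance

    BackupCleanup(Set<String> allBackups) { this.allBackups = allBackups; }

    // First delete the backups this run created...
    void deleteOwnBackups() {
        for (String b : new ArrayList<>(ownBackups)) {
            tryDelete(b);
        }
    }

    // ...then sweep stale backups left behind by earlier runs. The real
    // fixture would filter on creation time; here we sweep whatever is left.
    void deleteStaleBackups() {
        for (String b : new ArrayList<>(allBackups)) {
            tryDelete(b);
        }
    }

    private void tryDelete(String backup) {
        try {
            allBackups.remove(backup);
            ownBackups.remove(backup);
        } catch (RuntimeException e) {
            // Best effort: log and continue instead of failing the test run.
            System.err.println("Failed to delete " + backup + ": " + e);
        }
    }

    public static void main(String[] args) {
        Set<String> onInstance = new HashSet<>(Arrays.asList("stale-1", "stale-2"));
        BackupCleanup cleanup = new BackupCleanup(onInstance);
        cleanup.ownBackups.add("own-1");
        onInstance.add("own-1");
        cleanup.deleteOwnBackups();   // removes this run's backup
        cleanup.deleteStaleBackups(); // then sweeps the leftovers
        System.out.println(onInstance.isEmpty()); // true
    }
}
```

Dropping the `deleteOwnBackups()` call later would simply leave this run's backups to be picked up as stale resources by a later run.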
The fixture teardown must therefore first delete all backups that are stored on an instance before it can delete the instance.
Yes, I think we do this right now.
The existing test fixture is getting unmanageable in both the amount of time it takes to initialize and the amount of unrelated code it's holding. Also, some decisions that were made at the beginning of the Spanner test rewriting might not have been the best in hindsight, and may be reconsidered now.
The general proposal is to have several specialized fixtures, maybe all deriving from a base class.
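Very roughly, the shape could be something like the following Java sketch (the base class owns resource tracking and teardown, subclasses only add the setup their tests need; all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Base fixture: tracks every resource it creates and owns the teardown.
abstract class SpannerFixtureBase implements AutoCloseable {
    protected final List<String> createdDatabases = new ArrayList<>();

    protected void createDatabase(String id) { createdDatabases.add(id); }

    @Override public void close() {
        // shared teardown: drop everything this fixture created
        createdDatabases.clear();
    }
}

// Specialized fixtures derive from the base and only differ in setup.
class SharedQueryFixture extends SpannerFixtureBase {
    SharedQueryFixture() { createDatabase("query-db"); /* + seed data */ }
}

class BackupTestFixture extends SpannerFixtureBase {
    BackupTestFixture() { createDatabase("backup-source-db"); /* + fixed backup */ }
}

public class FixtureHierarchyDemo {
    public static void main(String[] args) throws Exception {
        try (SharedQueryFixture q = new SharedQueryFixture()) {
            System.out.println(q.createdDatabases.size()); // 1
        }
    }
}
```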
Let's use this issue to discuss some options before starting any work.
@olavloite FYI (I can't add you as an assignee)