Closed JonPSmith closed 1 year ago
Is 12 seconds for EnsureDeleted, then EnsureCreated a normal time, or is something I am doing wrong?
I gave it a try, and on my machine it's more like 7 seconds in total...
I also tried the drop all tables approach. The PostgreSQL commands run quickly from pgAdmin but it actually takes longer than the Respawn approach (something like 4 seconds) because EnsureCreated has to add the tables.
This is odd - actually creating tables should be very fast, the thing that takes up most of the time in EnsureCreated should be the database creation itself, which shouldn't be occurring in this case...
Stepping back, is the idea to create a different table schema on the same database (I'm not sure why that would be necessary), or simply to empty the tables from their data? If it's the latter, then rather than dropping tables (like with the DROP SCHEMA public CASCADE
approach), you could use the PG TRUNCATE command (which is an efficient form of DELETE FROM table
); note that it supports CASCADE as well.
If you really need to drop and recreate all tables, then the DROP SCHEMA public CASCADE
approach seems reasonable, but I can't imagine the whole thing taking so long - can you share a project/schema?
I guess ProstgreSQL is just a slow database system. :trollface: 😉
:fire: :firecracker: :fist_oncoming:
Yeah, databases in general are known to be more heavyweight in PG compared to e.g. SQL Server. Among other things, a connection is always made to a specific database, and switching the database on an existing connection simply isn't possible. PG definitely isn't made for creating/dropping lots databases quickly (but I'm not sure that's a good pattern even in other databases where it's faster...).
You are so easy to troll. 😹
I know, I know... I'm extremely trollable when it comes to PG :)
Hi @roji,
Thanks for looking at this, and sorry for opening you up to @ajcvickers remarks ☹️ !!
I gave it a try, and on my machine it's more like 7 seconds in total...
OK, that sounds that my 12 second, which might be because you are on a pure Linux system. I remember before EF Core 5 (well really the Microsoft.Data.SqlClient change) EnsureDeleted
+ EnsureCreated
took 10 seconds, and now its 1.5 seconds.
This is odd - actually creating tables should be very fast...
I enabled LogFor with timings and it looks like opening a connection to the server takes 1.5 seconds. To try this run the test TestPostgreSqlHelpers.TestEnsureCreatedAndEmptyPostgreSqlOk and see the output (details of the repo at the end).
Stepping back, is the idea to create a different table schema on the same database?
Ideally I would like a PostgreSQL version of your SQL Server EnsureClean method inside the test code of EF Core. @ajcvickers pointed this out and I have copied it into the EfCore.TestSupport library - its really fast (500 ms!) and it makes sure the database is up to date to the EF Core Model too.
However, Having a fast "empty all rows" approach is acceptable because the EfCore.TestSupport library has a DeleteAllPostgreSqlUnitTestDatabases
that deletes all the PostgreSql databases that start by the test common database name. So, when the EF Core Model changes the developer has to run this and the next run of the test will create a new database.
can you share a project/schema?
The open-source library is in the repo https://github.com/JonPSmith/EfCore.TestSupport and the code to look at:
EnsureCreatedAndEmptyPostgreSqlAsync
method in this class contains the "drop all tables" approach, but commented out because it was slower than respawn.If you want to run the tests, then you need to edit PostgreSqlConnection
connection string in the the appsetting.json
in the Test
project. The actual database name starts with the database name in that connection string but has the test class name added on the end so that each parallel unit test class has its own, unique database.
I enabled LogFor with timings and it looks like opening a connection to the server takes 1.5 seconds. To try this run the test TestPostgreSqlHelpers.TestEnsureCreatedAndEmptyPostgreSqlOk and see the output (details of the repo at the end).
I ran TestEnsureCreatedAndEmptyPostgreSqlOk - the first time it took about 6 seconds, the second time about 1.6. This seems to be because the test does EnsureCreated, but not EnsureDeleted, so almost nothing has to happen the second time around; this seems problematic, by the way - if the schema changes, this test will start failing since EnsureCreated doesn't make sure the schema is in sync with the model. Adding EnsureDeleted before EnsureCreated makes the test run consistently in about 7.5 seconds.
To summarize, any approach which drops/creates an entire database per test or test class is generally going to be slow; maybe especially so on PostgreSQL, but probably on other databases as well (compared to alternative approaches, in any case). I don't think there's any way around that really.
Ideally I would like a PostgreSQL version of your SQL Server EnsureClean method inside the test code of EF Core. @ajcvickers pointed this out and I have copied it into the EfCore.TestSupport library - its really fast (500 ms!) and it makes sure the database is up to date to the EF Core Model too.
That exists - see NpgsqlDatabaseCleaner, you can use that if you want. Generally all EF Core providers which implement the EF specification test suite should have an implementation for this; I've opened #26550 to consider bringing this functionality into the product and exposing it to users writing integration tests.
One note on the implementation of database create/drop logic... Rather than copying database-specific logic (as you've done e.g. in DeleteAllPostgreSqlUnitTestDatabases), you may want to consider setting up a simple DbContext with the appropriate connection string, and then simply using context.Database.EnsureDeleted/Created; this would leverage the provider's IDatabaseCreator which already knows how to do all this, rather than copying the logic out for each database.
DeleteAllPostgreSqlUnitTestDatabases
Just to nitpick :rofl: Should this be called "unit test" databases? Typically "unit test" implies no real database is involved (as opposed to integration test).
Hi @roji,
I ran TestEnsureCreatedAndEmptyPostgreSqlOk - the first time it took about 6 seconds, the second time about 1.6.
Yes, I assume that the EF Core Model doesn't change a lot so I was testing the "already have database" version. Other tests time the EnsureDeleted + EnsureCreated situation. I explain that if the EF Core Model you need to call the "DeleteaAll" method, but in SQL Server having to delete all the test databases isn't used so much because of SQL Server EnsureClean.
That exists - see NpgsqlDatabaseCleaner, you can use that if you want.
That would be perfect!! I will certainly look at that. For the SQL Server EnsureClean I had to edit the CreateDatabaseModelFactory
ctor, which I will need to do with NpgsqlDatabaseCleaner. I have added a comment on issue #26550.
One note on the implementation of database create/drop logic
Yes, using context.Database.EnsureDeleted/Created would be a better way. I'll have a go at that.
Just to nitpick 🤣 Should this be called "unit test" databases...
Good point. I will change that.
Thanks for looking at this and providing really good tips/code. Thanks.
My pleasure, and thanks for working on this and raising these thoughts - as I said I'm thinking about testing at the moment for the docs, so this is very relevant.
Hi @roji
OK, I have copied the NpgsqlDatabaseCleaner
code from the https://github.com/npgsql/efcore.pg repo (and added a heading with the efcore.pg license) and I have got it to run.
NpgsqlDatabaseCleaner
contains a code to detect ServerCompatibilityMode.Redshift
, but it uses a NET6 only feature (conn.Settings
). My library is multi-framework (netstandard2.1 = EF Core 5, net6.0 = EF Core 6). I have used an #if NET6_0_OR_GREATER
which assumes that the database doesn't have the ServerCompatibilityMode.Redshift
setting in EF Core 5. Is that sensible?NpgsqlDatabaseCleaner
uses an internal NpgsqlSqlGenerationHelper
. Shouldn't be a problem, but the SQL Server Cleaner didn't use any internal types.Here are the timings for the three approaches. NOTES:
Approach | 7 tables | 100 tables |
---|---|---|
EnsureDeleted + EnsureCreated | 12.8 sec | 13.4 sec |
EnsureClean (PostgreSQL) | 4.3 sec | 5.1 sec |
Respawn | 2.2 sec | 2.8 sec |
I will provide both approaches:
context.Database.EnsureClean();
context.EnsureCreatedAndEmptyPostgreSqlAsync();
Certainly EnsureClean is a better approach to new developers as it automatically handles EF Core Model changes, so I will list the EnsureClean approach first with a recommendation to use that. Personally I think I would go with the Respawn approach because its nearly 50% quicker than EnsureClean (PostgreSQL), but its async, which is a pain as exception stacktrace is harder to read - only using these approaches for real will tell me what my preference is.
PS. If you want to look at the code its in the dev
branch of the https://github.com/JonPSmith/EfCore.TestSupport . I don't put it into master
branch until its ready to be turned into a NuGet package.
@JonPSmith thanks for measuring (and all the general discussion)! I do have one additional thought... Would it be possible for you to also benchmark the DROP SCHEMA public CASCADE
approach and added it to the above table? My thinking is the following... NpgsqlDatabaseCleaner (as well as the SQL Server one) are currently implemented by reverse engineering the database, and then dropping each detected table/sequence one-by-one. This probably isn't the most efficient way to clear a database out - a small piece of procedural SQL which detects which schemas are defined and then drops them all would very likely be better. If that's so, then I'd also consider replacing the implementation of NpgsqlDatabaseCleaner to simply do that, instead of relying on reverse-engineering.
I suspect that this may get us to something very similar to Respawn's running time, if not even better.
And one more note...
At least when presenting the various options to users, I'd make a very clear distinction between approaches which recreate the schema (EnsureDeleted+EnsureCreated, EnsureClean), and those which only clear out data (Respawn, TRUNCATE
). I'd say the latter aren't suitable as a setup method at the very beginning of the test run - since schemas do change - but can make a lot of sense between tests within the same run. So at least IMHO it's not really a question of which is preferable/should be listed first - these are different things which solve different problems.
Personally I think I would go with the Respawn approach because its nearly 50% quicker than EnsureClean (PostgreSQL), but its async, which is a pain as exception stacktrace is harder to read
So aside from the fact using Respawn for first step would make tests fail if the schema changes and require manual intervention, I'm curious that async makes you want to stay away - with practically everything else being async nowadays :)
The NpgsqlDatabaseCleaner contains a code to detect ServerCompatibilityMode.Redshift, but it uses a NET6 only feature (conn.Settings). My library is multi-framework (netstandard2.1 = EF Core 5, net6.0 = EF Core 6). I have used an #if NET6_0_OR_GREATER which assumes that the database doesn't have the ServerCompatibilityMode.Redshift setting in EF Core 5. Is that sensible?
I don't think NpgsqlConnection.Settings should be NET6.0-only... This is an Npgsql (ADO.NET) feature, and Npgsql itself still supports netstandard2.0 - so you should be fine (note the distinction between the Npgsql EF Core provider which is .NET 6.0 only - like EF Core - and the underlying ADO.NET driver). Regardless, this specific thing is to make sure things work when executing against Amazon Redshift, which is a PostgreSQL-like database. If you need to remove it for some reason, then it should really be OK to do so.
The NpgsqlDatabaseCleaner uses an internal NpgsqlSqlGenerationHelper. Shouldn't be a problem, but the SQL Server Cleaner didn't use any internal types.
Yeah... It doesn't actually need to reference NpgsqlSqlGenerationHelper - it just needs the provider implementation for ISqlGenerationHelper (in order to delimit some stuff), so if you can simply resolve that from the context and pass it to the cleaner, that should work fine. The current Cleaner doesn't use proper DI practices because it's just test infrastructure.
Hi @roji,
Would it be possible for you to also benchmark the DROP SCHEMA public CASCADE approach.
I did try DROP SCHEMA public CASCADE
and its slower Respawn and about the same as your cleaner - about 4 seconds for small table. I prefer your cleaner as DROP SCHEMA public CASCADE
is about the same time and your cleaner drops other things like functions.
I'd make a very clear distinction between approaches which recreate the schema (EnsureDeleted+EnsureCreated, EnsureClean), and those which only clear out data (Respawn, TRUNCATE).
That has always an issue, especially before EF Core 5, where "delete all databases when EF Core changes" was the standard approach. But I am planning a big re-write of the docs as its not so clear since the EF Core 5 version.
I'm curious that async makes you want to stay away - with practically everything else being async nowadays :)
Good point. I try to use sync if I can because it can be a pain to working out what has gone wrong in exceptions from an async method.
I don't think NpgsqlConnection.Settings should be NET6.0-only
I definitely have an error in the netstandard2.1 version, with the Settings
part the error is shown below
Error CS1061
'NpgsqlConnection' does not contain a definition for 'Settings' and no accessible extension method 'Settings' accepting a first argument of type 'NpgsqlConnection' could be found (are you missing a using directive or an assembly reference?)
TestSupport (net6.0) - Available
TestSupport (netstandard2.1) - Not Available
I have spent an hour looking at different version of Npgsql etc. trying to fix this error, but can't find the problem. What I do know is you only added the if (conn.Settings.ServerCompatibilityMode == ServerCompatibilityMode.Redshift)...
in the 6.0.0 version so I'm going to comment out with #if NET6_0_OR_GREATER
.
Yeah... It doesn't actually need to reference NpgsqlSqlGenerationHelper - it just needs the provider implementation for ISqlGenerationHelper.
I'll stick with the internal code for now. If its still a problem in the later version I can fix it then.
I did try DROP SCHEMA public CASCADE and its slower Respawn and about the same as your cleaner - about 4 seconds for small table. I prefer your cleaner as DROP SCHEMA public CASCADE is about the same time and your cleaner drops other things like functions.
Wow, that's really surprising... I'll do a bit of playing around too to see what I can come up with.
Note that dropping the schema also drops functions, since functions are always created in a schema - the same is true for most other object types (even custom types). In fact, this makes schema dropping must easier and more reliable: with the cleaner, I have to implement specific reverse-engineering and functions, types, etc., where dropping the schema just drops everything.
I definitely have an error in the netstandard2.1 version, with the Settings part the error is shown below
I think I may know what's going on... NpgsqlConnection.Settings was only publicly exposed in Npgsql 6.0 (https://github.com/npgsql/npgsql/pull/3570), are you by any chance using older version of Npgsql when testing netstandard2.1? If you use Npgsql 6.0 everywhere (it supports netstandard2.0), you should be fine.
Wow, that's really surprising... I'll do a bit of playing around too to see what I can come up with.
I'm NOT an expert on PostgreSQL so I leave that to you! Let me know if you come up with something.
I think I may know what's going on... NpgsqlConnection.Settings.
Ah, the netstandard2.1 version is for EF Core 5 and uses Npgsql.EntityFrameworkCore.PostgreSQL Version="5.0.10", which uses Npgsql Version="5.0.10.
Yeah, then that's why... All good, it's perfectly fine to drop that Redshift part in any case. I'll do some benchmarking and post back.
Here's a bit of benchmarking - DROP SCHEMA public CASCADE; CREATE SCHEMA public
clearly wins on my side. This isn't a big surprise - this technique does a single roundtrip to the database, making PostgreSQL do all the work on its side. NpgsqlDatabaseCleaner, in contrast, does multiple roundtrips to find tables, sequences, functions and other database entities, and then drops each one of these in a separate DDL statement (albeit in a single roundtrip). The second is pretty much guaranteed to be slower.
Note that there's a slightly inaccuracy here, since we need to drop all schemas, and not just the public one. I did that via some procedure SQL in another benchmark method - and it works just as quickly as just dropping public.
So in summary... I think the .NET ecosystem needs a test infrastructure component which knows how to efficiently empty databases (and possibly create a blank one if it doesn't already exist), a bit like how respawn can reset data. This wouldn't be part of EF Core (since non-EF projects could also use it), nor would it use EF Core to do its job like RelationalDatabaseCleaner (since there are more efficient ways). In fact, once something like that exists, I'd propose removing RelationalDatabaseCleaner from the EF tests altogether and use this instead.
BenchmarkDotNet=v0.13.1, OS=ubuntu 21.10
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.100-rc.2.21505.57
[Host] : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT
Job-VXYAUW : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT
InvocationCount=1 UnrollFactor=1
Method | NumTables | Mean | Error | StdDev |
---|---|---|---|---|
DropPublicSchema | 10 | 10.42 ms | 1.104 ms | 3.256 ms |
DropAllSchemas | 10 | 11.67 ms | 1.185 ms | 3.474 ms |
DatabaseCleaner | 10 | 222.57 ms | 4.235 ms | 4.532 ms |
DropPublicSchema | 100 | 63.37 ms | 2.646 ms | 7.507 ms |
DropAllSchemas | 100 | 66.10 ms | 2.516 ms | 7.179 ms |
DatabaseCleaner | 100 | 360.20 ms | 10.115 ms | 29.507 ms |
Looks great! Overall are you saying that you want a database provider to create the best/fast EnsureClean
method. I think that would be great (see my setUpSchema
suggestion for an additional feature).
What do you think: should I use your new code or stick with your existing NpgsqlDatabaseCleaner
code?
PS. One thing the the stack overflow answer said was:
If you are using PostgreSQL 9.3 or greater, you may also need to restore the default grants.
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO public;
Overall are you saying that you want a database provider to create the best/fast EnsureClean method.
Yeah, although this would likely be some community package which knows different databases and how to clean them up. I don't think this belongs in an EF provider, or in an ADO.NET provider, but there's no reason this can't be some external package, similar to how respawn already handles data resetting across databases.
What do you think: should I use your new code or stick with your existing NpgsqlDatabaseCleaner code?
I think my procedural SQL fragment should be preferred (simply since it's faster) - though you should be able to see that in your own perf measurements... If not, it's probably worth understanding what's going on...
If you are using PostgreSQL 9.3 or greater, you may also need to restore the default grants.
Good point, thanks!
I membered that the DROP SCHEMA commands were executed quickly, and it was the call to EnsureCreated
that was slow. In your code you have a CreateTables()
but I don't see the code. I see an creator.CreateTables();
in the RelationalDatabaseCleaner
- is that what you are using??
The CreateTables method is there in the benchmark source code - it's just some manually-prepared batch to create some tables and populate them.
Doesn't EF Core's EnsureCreated have to be called either way in order to create the schema? In other words, the RelationalDatabaseCleaner eventually has to do the exact same thing, i.e. call CreateTables... EF Core's EnsureCreated checks if the database exists (would be true in our case), then checks whether any tables exist (would be false), and then calls into the same CreateTables that RelationalDatabaseCleaner calls into.
My testing said that EnsureCreated took up most the time in the drop scheme approach. You might need to check your code with EnsureCreated (or the creator.CreateTables();
from the RelationalDatabaseCleaner
@JonPSmith I'll take another look.
RelationalDatabaseCleaner internally does (almost) the same thing as EnsureCreated to recreate the schema. So we should effectively be comparing the two ways to drop all schemas. But maybe I'm missing something.
Hi @roji,
Thanks for you help. As you will seen I have released a version EfCore.TestSupport which supports EF Core 5 and EF Core 6 now that NET6 is out (with PostgreSQL methods including EnsureCreated
). I expect you are busy with that update too.
I have quite a few libraries so I need to focus on NET 6 release first, but I do still want to see if the DROP SCHEMA approach will improve this. I expect you are really busy so I'm happy to try your DROP SCHEMA code and add a call the EnsureCreated
. I suspect it will be slow because that's what I found with with my simple DROP SCHEMA but its worth a check.
PS. I included the Respawn approach in the new release, but the documents don't include the Respawn approach. If the DROP SCHEMA is faster I'll remove the Respawn approach, otherwise I will describe it with the out-of-date schema problem.
Hi @roji,
I added a simple benchmark test and added context.Database.EnsureCreated();
after the DROP SCHEMA
and I the DROP SCHEMA approach wins, but with as big as your benchmark. Here is the results for 7 tables
Method | Mean | Error | StdDev |
---|---|---|---|
DropPublicSchemaWithEnsureCreated | 2.125 s | 0.0188 s | 0.0175 s |
DropAllSchemasWithEnsureCreated | 2.132 s | 0.0177 s | 0.0165 s |
EnsureClean | 4.232 s | 0.0238 s | 0.0223 s |
EnsureCreatedAndWipedByRespawn | 2.127 s | 0.0121 s | 0.0113 s |
The DROP SCHEMA
approach takes the same time as the Respawn approach, but of course the DROP SCHEMA
also makes sure the schema is correct for the current EF Core Model. I therefore plan to change my PostgreSQL EnsureClean
method to to use the DropAllSchemasWithEnsureCreated
code.
Can I ask you to decided if your current DROP All SCHEMA code should include the extra SQL commands suggested in the stack overflow answer. If you think this is needed could you send me the updated SQL code of the DROP All SCHEMA with commands below added.
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO public;
Thanks in advance.
@JonPSmith thanks for continuing to look into this - I'm still stuck on other things but was intending to look again myself.
Re the extra commands to grant permissions, I'd say these aren't really necessary in our specific context of testing. The schema creation (which happens after the drop) grants access to the creating user, which is very likely also the user on which the tests are running. So unless some very convoluted things are happening in the test suite, including some specific games around different users and permissions, these wouldn't be necessary.
But if you're looking for a fully complete fragment that brings your database back to a clean slate that is as close as possible to a newly-created database, then this can be included:
DO $$
DECLARE
r RECORD;
BEGIN
FOR r IN (SELECT nspname FROM pg_namespace WHERE nspname NOT IN ('pg_toast', 'pg_catalog', 'information_schema'))
LOOP
EXECUTE 'DROP SCHEMA ' || quote_ident(r.nspname) || ' CASCADE';
END LOOP;
EXECUTE 'CREATE SCHEMA public; GRANT ALL ON SCHEMA public TO postgres; GRANT ALL ON SCHEMA public TO public';
END $$
I'd be surprised if these two statements have any noticeable impact on perf - you can check that out if you want. Assuming that's the case, you may as well just include them...
Wow, talk about going down a rabbit hole. Performance tuning is hard work!
OK, in summary we both forget about the test to see if the database exists (but updating my tests caught that). I turns out that testing if the database exists costs a LOT of time - the tests below suggests its about 2 second on my system. My conclusions are:
Method | Mean | Error | StdDev | Median |
---|---|---|---|---|
EnsureCleanUsingDropSchema | 4,219.43 ms | 14.794 ms | 13.838 ms | 4,220.12 ms |
WipedByRespawnNoCheckDbExists | 58.34 ms | 1.594 ms | 4.623 ms | 59.59 ms |
WipedByRespawnWithCheckForDbExists | 2,128.80 ms | 17.257 ms | 16.143 ms | 2,128.82 ms |
EnsureDeletedEnsureCreated | 12,685.31 ms | 51.618 ms | 48.283 ms | 12,692.66 ms |
If you have any great ideas for improving the database exists
code then let me know. NOTE: I did try adding an IF EXISTS ...
to the DROP SCHEMA SQL, but the connection fails because there is no database.
My next steps:
RelationalDatabaseCleaner
that it doesn't delete the default migration history, which making testing migrations a bit more difficult. PS. The PostgreSQL connections performance feels like the old Microsoft.Data.SqlClient
library which used to be slow, but was updated when EF Core 5 came out.
EnsureDeleted
then EnsureCreates
took ~10 seconds. EnsureDeleted
then EnsureCreates
took ~1.5 seconds. Could the Npgsql connection management be improved??
Good point about the database existence check - but are you saying that simply adding it adds like 2 seconds (on top of the schema drop)? I did a quick benchmark on my machine, and I'm seeing NpgsqlDatabaseCreator.ExistsAsync complete in only 12ms - when the database already exists. I'm assuming we're interested in that scenario, since after the very first time tests are executed, the database will always exist and gets reused.
So it would be good to understand exactly where why it's taking so long on your side... Are you connecting to a remote (possibly far-away) PostgreSQL? Are you using some other method and not NpgsqlDatabaseCreator.ExistsAsync?
I've opened https://github.com/npgsql/efcore.pg/issues/2103 to track looking at the perf EnsureDeleted/EnsureCreated - I agree it seems long and warrants investigation. Though I still believe tests could be using a different, more efficient mechanism for getting to clean state (like the above) - it really doesn't make sense to me to be constantly dropping and creating databases when running tests.
Full benchmark results:
BenchmarkDotNet=v0.13.1, OS=ubuntu 21.10 Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores .NET SDK=6.0.100 [Host] : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT DefaultJob : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
Method | Mean | Error | StdDev |
---|---|---|---|
Foo | 12.01 ms | 0.240 ms | 0.351 ms |
I called context.Database.EnsureCreated()
on an existing database and it took 2 seconds. My PostgreSQL server 12.8 is running in a WSL2 Windows Subsystem for Linux. If you know a better way to run PostgreSQL server then let me know!
The interesting thing is accessing an 'normal' database seems to be quick (as shown by the Respawn times), but accessing the postgre database takes the time.
I didn't check that the database exists, but caught the exception. This reduces the time from 6 second to 4 seconds. The code below checks if there is a database and it takes 2 seconds!! If I run the same SQL in pgAdmin it takes 50 ms. It's got to be the time to open the connection to the postgre
database, but why is that??
private static bool DatabaseExists(this string connectionString)
{
var builder = new NpgsqlConnectionStringBuilder(connectionString);
var orgDbStartsWith = builder.Database;
builder.Database = "postgres";
var newConnectionString = builder.ToString();
using var conn = new NpgsqlConnection(newConnectionString);
conn.Open();
using var cmd = new NpgsqlCommand($"SELECT COUNT(*) FROM pg_catalog.pg_database WHERE datname='{orgDbStartsWith}'", conn);
return (long)cmd.ExecuteScalar() == 1;
}
Maybe there's some sort of auth config issue with your PostgreSQL, I don't know - but just connecting to the postgres
database definitely shouldn't take 2 seconds. Maybe it's a good idea to test this on plain Windows, just to eliminate the possibility that WSL2 is somehow responsible (would be weird, but...), or if something is somehow wrong with your PG installation inside WSL2... Installing PG on Windows is super easy.
Found it!!!!
I did some googling about slow access and one answer was host should be 127.0.0.1
, not localhost
and that fixed it. It is now super-quick!!
I ran my benchmark, but the EnsureCleanUsingDropSchema
part had the error Npgsql.PostgresException (0x80004005): 53300: sorry, too many clients already
, but looking at what it did do it was 90 ms.
Method | Mean | Error | StdDev |
---|---|---|---|
EnsureCleanUsingDropSchema | ~90 ms | NA | NA |
WipedByRespawnNoCheckDbExists | 41.97 ms | 1.045 ms | 3.081 ms |
WipedByRespawnWithCheckForDbExists | 67.34 ms | 2.912 ms | 8.586 ms |
EnsureDeletedEnsureCreated | 340.45 ms | 16.030 ms | 47.266 ms |
That is fantastic! I have to update my article now.
Thanks for putting up with my problem, but the end result is mega.
@ajcvickers, see PostgreSQL isn't slow, just I just got it wrong.
see PostgreSQL isn't slow, just I just got it wrong.
I ran my benchmark, but the EnsureCleanUsingDropSchema part had the error Npgsql.PostgresException (0x80004005): 53300: sorry, too many clients already, but looking at what it did do it was 90 ms.
That happens if you open too many connection concurrently (or forget to close somewhere), bringing you above the Max Pool Size limit, which is 100 by default.
After all this discussion, can you please summarize what EnsureCleanUsingDropSchema does (there were some back and forths :sweat_smile:)? Are you calling the provider's Exists method, and then DROP SCHEMA pubilc
if it returns true (and EnsureCreated if false)?
Hi @roji,
I have finished this and here is a summary:
EnsureClean
method to add a PostgreSQL version.EnsureClean
uses the "DROP SCHEMA" approach. It uses SQL provided by @roji to handle multiple schemas. The code for this can be found in the the class PostgreSqlDropSchemaEnsureClean.NpgsqlDatabaseCleaner
code suggested by @roji because the "DROP SCHEMA" approach is quick and has the positive feature that it deleted all the migration history tables, while the EF Core's internal RelationalDatabaseCleaner
doesn't delete the default named migration history table. There are a small set of tests where this is useful:
Migrate
feature when creating a database that contains two or more DbContexts.EnsureClean
is so quick.PS. Can you tell me how to tell Benchmark not to run more than 100 clients so that I don't get the 53300: sorry, too many clients already
error. I looked at the Benchmark docs but I couldn't work it out.
PS. Can you tell me how to tell Benchmark not to run more than 100 clients so that I don't get the 53300: sorry, too many clients already error. I looked at the Benchmark docs but I couldn't work it out.
Is your benchmark using BenchmarkDotNet? If so, BDN is single-threaded, and should not ever run the the benchmark method more than once concurrently; you may have a connection leak or something - if you point me at the benchmark source I can try to help.
Other than that, thanks for the details - everything looks great, and it's good to have this as a nuget package people can integrate in their tests! As discussed offline, I think it would be good to have something like this without any sort of EF Core dependency: a simple, cross-database package that can be used to bring a database to a clean slate state, regardless of whether the database exists or not, or whether it already has tables. EF Core users could invoke this, and then just do EF Core's EnsureCreate, which would create the schema. I'll look into this and will keep you posted.
@JonPSmith : Maybe update this note?
/// NOTE: This only works for SQL Server
@roji , found the problem - one of the NpgsqlConnection
didn't have a using var
. Hard to catch that type of bug.
Benchmark results are:
Method | Mean | Error | StdDev |
---|---|---|---|
EnsureCleanUsingDropSchema | 68.73 ms | 3.467 ms | 10.168 ms |
EnsureDeletedEnsureCreated | 316.62 ms | 10.348 ms | 30.022 ms |
As I said, the EnsureClean is so fast that its not worth using the Respawn approach.
@ErikEJ, thanks for point that out. I have updated that.
New NuGet release 5.2.1 now out.
Thanks @JonPSmith!
Hi @roji,
I'm looking at updating my libraries to .NET 8 and I am starting with my EfCore.TestSupport as I use this to check my other libraries.
I have come to an error in the EnsureClean feature. The Npgsql.EntityFrameworkCore.PostgreSQL" Version="8.0.0-rc.1" doesn't have the CheckPoint
feature that is in the older versions. Here is the code that doesn't compile
public async static Task EnsureCreatedAndEmptyPostgreSqlAsync<T>(this T context, bool thereIsAnExistingDatabase = false
where T : DbContext
{
Checkpoint EmptyCheckpoint = new Checkpoint
{
DbAdapter = DbAdapter.Postgres
};
if (thereIsAnExistingDatabase || !context.Database.EnsureCreated())
{
//the database already exists, so we just need to empty the tables
var connectionString = context.Database.GetConnectionString();
using var conn = new NpgsqlConnection(connectionString);
await conn.OpenAsync();
await EmptyCheckpoint.Reset(conn);
};
}
Looks like this has changed. Can you point me to the code to fix this.
@JonPSmith this Checkpoint class isn't familiar to me, are you sure it used to be part of Npgsql.EntityFrameworkCore.PostgreSQL? Can you point me to it in the older version or similar?
I have added PostgreSQL helpers to my EfCore.TestSupport. These create unique database names for each xUnit test class, options with logging etc.
I have found that creating an empty database using
EnsureDeleted
, thenEnsureCreated
takes ~12 seconds on my setup (EnsureDeleted
= 4 seconds,EnsureCreated
= 8 second), which I think is a bit slow (I want fast tests!).I have therefore created a method that uses Jimmy Bogart's
Respawn
library, and that takes 2.2 seconds if there is a database already there.I also tried the drop all tables approach. The PostgreSQL commands run quickly from pgAdmin but it actually takes longer than the Respawn approach (something like 4 seconds) because
EnsureCreated
has to add the tables.My questions are:
EnsureDeleted
, thenEnsureCreated
a normal time, or is something I am doing wrong?NOTE: you can see the code for the fast / respawn approach here.
Include provider and version information
NOTE: My PostgreSQL 12 server is running in a Windows Subsystem for Linux (WSL2). Its a fast PC.
EF Core version: 6.0.0-rc.2 Database provider: Npgsql.EntityFrameworkCore.PostgreSQL" Version="6.0.0-rc.2" Target framework: (e.g. .NET 6.0-rc.2) Operating system: Windows IDE: (e.g. Visual Studio 2022)