HicServices / RDMP

Research Data Management Platform (RDMP) is an open source application for the loading,linking,anonymisation and extraction of datasets stored in relational databases.
https://github.com/HicServices/RDMP#research-data-management-platform
GNU General Public License v3.0
36 stars 16 forks source link

[v8.2.0] can't create logging server on "rdmp_logging" postgresql database #1873

Closed rkm closed 2 months ago

rkm commented 2 months ago

Describe the bug

This is a follow-on to https://github.com/HicServices/RDMP/issues/1831, which I've re-tested on RDMP v8.2.0.

The logging server creation fails, since the specified database is not specified for some SQL commands.

To Reproduce Steps to reproduce the behavior:

  1. Download RDMP CLI v8.2.0
  2. Start a postgres docker container and create a user and database
    postgres_id=$(docker run --rm -d -p5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:15)
    docker exec -it $postgres_id psql -U postgres -c "CREATE USER rdmp password 'myPassword1.';"
    docker exec -it $postgres_id psql -U postgres -c "CREATE database rdmp_logging owner rdmp;"
  3. Create the logging server
    rm -rf yaml
    ./rdmp --dir yaml CreateNewExternalDatabaseServer LiveLoggingServer_ID "DatabaseType:PostgreSql:Server=localhost;Uid=rdmp;Pwd=myPassword1.;Database=rdmp_logging"

Expected behavior

The logging server is successfully created.

Screenshots

N/A

RDMP Version

v8.2.0

Error with Stack Trace

```console 2024-07-09 12:53:59.4223 INFO Dotnet Version:8.0.6 . 2024-07-09 12:53:59.4555 INFO RDMP Version:8.2.0.0 . 2024-07-09 12:54:00.9247 TRACE Running Command 'ExecuteCommandCreateNewExternalDatabaseServer' . Success:About to run: /*--Version:1.0.0*/ /*--Description:Initial Create*/ CREATE TABLE "rdmp_logging".public."DataSet"( "dataSetID" varchar(150) NOT NULL , "name" varchar(2000) NULL , "description" text NULL , "time_period" varchar(64) NULL , "SLA_required" varchar(3) NULL , "supplier_name" varchar(32) NULL , ... /*create tasks*/ INSERT INTO "DataLoadTask" ("ID", "description", "name", "userAccount", "statusID", "isTest", "dataSetID") VALUES(1, 'Internal', 'Internal', 'Thomas', 1, FALSE, 'Internal'); INSERT INTO "DataLoadTask" ("ID", "description", "name", "userAccount", "statusID", "isTest", "dataSetID") VALUES(2, 'DataExtraction', 'DataExtraction', 'Thomas', 1, FALSE, 'DataExtraction'); Fail:Create failed 2024-07-09 12:54:01.0860 ERROR Failed check with message: Create failed . 2024-07-09 12:54:01.0860 INFO Fatal error occurred so returning -1 . System.Exception: Failed check with message: Create failed ---> Npgsql.PostgresException (0x80004005): 3D000: database "rdmp" does not exist at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage) at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at Npgsql.Internal.NpgsqlConnector.g__OpenCore|213_1(NpgsqlConnector conn, SslMode sslMode, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken, Boolean isFirstAttempt) at Npgsql.Internal.NpgsqlConnector.Open(NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken) at Npgsql.PoolingDataSource.OpenNewConnector(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken) at Npgsql.PoolingDataSource.g__RentAsync|34_0(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlConnection.g__OpenAsync|42_0(Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlConnection.Open() at FAnsi.Implementations.PostgreSql.PostgreSqlServerHelper.ListDatabases(DbConnectionStringBuilder builder)+MoveNext() at System.Linq.Enumerable.SelectEnumerableIterator`2.ToArray() at FAnsi.Discovery.DiscoveredDatabase.Exists(IManagedTransaction transaction) at Rdmp.Core.MapsDirectlyToDatabaseTable.Versioning.MasterDatabaseScriptExecutor.CreateDatabase(Patch initialCreationPatch, ICheckNotifier notifier) in D:\a\RDMP\RDMP\Rdmp.Core\MapsDirectlyToDatabaseTable\Versioning\MasterDatabaseScriptExecutor.cs:line 53 Exception data: Severity: FATAL SqlState: 3D000 MessageText: database "rdmp" does not exist File: postinit.c Line: 945 Routine: InitPostgres --- End of inner exception stack trace --- at Rdmp.Core.ReusableLibraryCode.Checks.AcceptAllCheckNotifier.OnCheckPerformed(CheckEventArgs args) in D:\a\RDMP\RDMP\Rdmp.Core\ReusableLibraryCode\Checks\AcceptAllCheckNotifier.cs:line 35 at Rdmp.Core.MapsDirectlyToDatabaseTable.Versioning.MasterDatabaseScriptExecutor.CreateDatabase(Patch initialCreationPatch, ICheckNotifier notifier) in D:\a\RDMP\RDMP\Rdmp.Core\MapsDirectlyToDatabaseTable\Versioning\MasterDatabaseScriptExecutor.cs:line 133 at Rdmp.Core.MapsDirectlyToDatabaseTable.Versioning.MasterDatabaseScriptExecutor.CreateAndPatchDatabase(IPatcher patcher, ICheckNotifier notifier) in D:\a\RDMP\RDMP\Rdmp.Core\MapsDirectlyToDatabaseTable\Versioning\MasterDatabaseScriptExecutor.cs:line 421 at Rdmp.Core.CommandExecution.BasicActivateItems.CreateNewPlatformDatabase(ICatalogueRepository catalogueRepository, PermissableDefaults defaultToSet, IPatcher patcher, DiscoveredDatabase db) in D:\a\RDMP\RDMP\Rdmp.Core\CommandExecution\BasicActivateItems.cs:line 730 at Rdmp.Core.CommandExecution.AtomicCommands.ExecuteCommandCreateNewExternalDatabaseServer.Execute() in D:\a\RDMP\RDMP\Rdmp.Core\CommandExecution\AtomicCommands\ExecuteCommandCreateNewExternalDatabaseServer.cs:line 97 at Rdmp.Core.CommandExecution.CommandInvoker.ExecuteCommand(ConstructorInfo constructorInfo, CommandLineObjectPicker picker) in D:\a\RDMP\RDMP\Rdmp.Core\CommandExecution\CommandInvoker.cs:line 298 at Rdmp.Core.CommandLine.Runners.ExecuteCommandRunner.RunCommand(String command) in D:\a\RDMP\RDMP\Rdmp.Core\CommandLine\Runners\ExecuteCommandRunner.cs:line 104 at Rdmp.Core.CommandLine.Runners.ExecuteCommandRunner.Run(IRDMPPlatformRepositoryServiceLocator repositoryLocator, IDataLoadEventListener listener, ICheckNotifier checkNotifier, GracefulCancellationToken token) in D:\a\RDMP\RDMP\Rdmp.Core\CommandLine\Runners\ExecuteCommandRunner.cs:line 94 at Rdmp.Core.CommandLine.RdmpCommandLineBootStrapper.Run(RDMPCommandLineOptions opts, IRunner explicitRunner, IRDMPPlatformRepositoryServiceLocator existingLocator) in D:\a\RDMP\RDMP\Rdmp.Core\CommandLine\RdmpCommandLineBootStrapper.cs:line 145 at Rdmp.Core.Program.HandleArguments(String[] args, Logger logger) in D:\a\RDMP\RDMP\Tools\rdmp\Program.cs:line 83 ```

Database Engine

Postgres.

Additional context

Note that this works fine if the database is set as rdmp, as this matches the username. I'll use this for the moment.

JFriel commented 2 months ago

Suspect this may be a FansiSQL issue, as testing with MicrosoftSQLServer does not cause the issue seen. Will investigate further with FansiSQL

jas88 commented 2 months ago

In fact it's a side effect of fixing https://github.com/HicServices/FAnsiSql/issues/273 - you cannot connect to Postgres without specifying a database to use. Previously, we used postgres, which exists by default but not in @rkm 's tests, so switched that to use null - which makes it use the current username instead, which also works most of the time but not always. Reverting to postgres seems the most robust; the alternatives are template0 or template1, or trying multiple possibilities until one works.

rkm commented 2 months ago

I have specified a database to use in that example though. It's at the end of the command so you might have to scroll to see it!

JFriel commented 2 months ago

It's an under-the-hood issue with FansiSQL, there is a proposed fix for FansiSQL - https://github.com/HicServices/FAnsiSql/pull/287

Will catch up with @jas88 to get this sorted

jas88 commented 2 months ago

I have specified a database to use in that example though. It's at the end of the command so you might have to scroll to see it!

You have one in the connection string, but FAnsiSql has always overridden that specifically for the ListDatabases call, on the basis that if you're trying to list databases you might not have one in place. Originally it used the default postgres DB, I changed that to null (which turns out to map to the username) after FAnsiSql issue 273, so now we're reverting.

Alternatively perhaps the best answer is just not to override it at all and use whatever was specified in the connection string as-is?

JFriel commented 2 months ago

Leaving the connection string as-is won't work unfortunately, as trying to connect to a db that does not exist will leave us back at this issue. Wondering if the best option is to have RDMP handle this error and know that the DB we're attempting to check exists does in-fact not exist

jas88 commented 2 months ago

If the database named doesn't exist we can't connect in the first place - there's a chicken and egg problem there, we can't create a new database without having the name of an existing one. If the connection string names a database that doesn't (yet) exist, it won't work anyway and we can't create it either.

tznind commented 2 months ago

Maybe fansi could figure out the default database itself. According to ChatGPT postgres is default but should also be one called 'template1' and/or 'template0'

https://chatgpt.com/share/11a34e31-34f7-4f0e-8aa5-6877eac899df

On Wed, 10 Jul 2024, 14:35 James A Sutherland, @.***> wrote:

If the database named doesn't exist we can't connect in the first place - there's a chicken and egg problem there, we can't create a new database without having the name of an existing one.

— Reply to this email directly, view it on GitHub https://github.com/HicServices/RDMP/issues/1873#issuecomment-2220527419, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHO3C5DP6W4MYVOUFYMOJ5DZLU2ANAVCNFSM6AAAAABKSW5N7KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRQGUZDONBRHE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

jas88 commented 2 months ago

@tznind Those 3 exist by default, but the only one we can actually rely on existing is the one named in the connection string (since if it doesn't exist, the connection string won't work anyway).

tznind commented 2 months ago

Ah OK so use case is that the user had access only to 1 db, which is the one you are trying to create logging db schema into with RDMP cli

Guess the easiest 2 approaches are

Although i think that 2 is probably what you were saying already

JFriel commented 2 months ago

Could we add an optional "masterDB" parameter to the command to be used in cases like this where an existing database must be specified to be able to create a new database? We can then pass this "masterDB" down to FansiSQL to use as the default connector

jas88 commented 2 months ago

Except in the trivial isolated/CI case there's probably more setup than just creating a database anyway, we will want permissions assigned etc - better to keep it simple and use an existing database from the connection string IMO (as Ruaridh is doing in this case): if we created the database, we wouldn't know what user account to assign or what rights.

In this case the code in RDMP already checks if the database exists or not - it's actually that check that's failing - if that check returns true, we check with the user then use that existing database, so the rest is fine.

JFriel commented 2 months ago

@rkm We've got a fix for this that will be released in v8.2.1 Slight quirk with postgres: the database you're trying to create will need to exists (just as an empty database) before running the createLoggingDatabase command. But other than that, it should populate as expected.

The error message is now correct stating "<db you're trying to create> does not exist" rather than " by does not exist"

rkm commented 2 months ago

That's fantastic, thank you! The empty database will always exist beforehand, so this is no problem.