michelescarlato opened this issue 2 months ago
The core schema is inside MDR. It contains 28 tables, divided into 11 `object_*` tables, 13 `study_*` tables, 1 `data_objects`, 1 `studies`, plus `new_search_studies` and `new_search_studies_json`.
The problem is that `new_search_objects` doesn't exist.
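A quick way to confirm which search tables actually exist is to query the catalog. This is a sketch, assuming the schema is named `core` and you can connect to the MDR database:

```sql
-- List the core-schema tables whose names start with 'new_search'
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'core'
  AND table_name LIKE 'new_search%'
ORDER BY table_name;
```

If the problem is as described, this should return only `new_search_studies` and `new_search_studies_json`, with `new_search_objects` absent.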
Analyzing the code: `CoreStudyTableBuilder.cs` creates 12 `study_*` tables and 1 `studies` table. `study_object_links` is not created by `CoreStudyTableBuilder` (`CoreObjectTableBuilder` creates it).
The `CoreBuilder` calls the methods of `CoreStudyTableBuilder`.
The `CoreBuilder` class also calls `CoreObjectTableBuilder`, which creates 11 `object_*` tables, 1 `data_objects` table, and 1 `study_object_links` table. There is also a `study_search` create-table method, but it is not used (Rider marks it as an unused method by greying out the method name).
So `CoreStudyTableBuilder` and `CoreObjectTableBuilder` are responsible for the creation of 26 tables (13 tables each).
The other two tables present are `new_search_studies` and `new_search_studies_json`.
These two tables do not appear to be created by the Aggregator.
A possible solution could be to import the missing `new_search_objects` table from an older MDR version.
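If an older backup does contain the table, one option is to restore just that table rather than the whole database. A sketch, assuming a custom-format dump (the file name `mdr_backup.dump` and the database name `mdr` are illustrative):

```
# Restore only core.new_search_objects from a custom-format dump
pg_restore -d mdr -n core -t new_search_objects mdr_backup.dump
```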
Looking into the logs, it is noticeable that the table has been missing since 22 July 2024:
```
:~/Downloads/postgresql_download$ grep -i "new_search_objects" *.log
postgresql-2024-07-22_000000.log:2024-07-22 15:55:01.701 UTC [9748] ERROR: relation "core.new_search_objects" does not exist at character 76
postgresql-2024-07-22_000000.log: inner join core.new_search_objects os
postgresql-2024-07-22_000000.log:2024-07-22 15:55:23.468 UTC [6420] ERROR: relation "core.new_search_objects" does not exist at character 76
postgresql-2024-07-22_000000.log: inner join core.new_search_objects os
postgresql-2024-08-04_000000.log:2024-08-04 15:00:03.910 UTC [12980] ERROR: relation "core.new_search_objects" does not exist at character 76
postgresql-2024-08-04_000000.log: inner join core.new_search_objects os
postgresql-2024-08-11_000000.log:2024-08-11 15:00:02.695 UTC [5616] ERROR: relation "core.new_search_objects" does not exist at character 76
postgresql-2024-08-11_000000.log: inner join core.new_search_objects os
postgresql-2024-08-25_000000.log:2024-08-25 15:00:02.915 UTC [13340] ERROR: relation "core.new_search_objects" does not exist at character 76
postgresql-2024-08-25_000000.log: inner join core.new_search_objects os
postgresql-2024-08-30_000000.log:2024-08-30 19:15:20.482 UTC [16832] ERROR: relation "core.new_search_objects" does not exist at character 76
postgresql-2024-08-30_000000.log: inner join core.new_search_objects os
postgresql-2024-08-31_000000.log:2024-08-31 17:01:54.277 UTC [6644] ERROR: relation "core.new_search_objects" does not exist at character 76
postgresql-2024-08-31_000000.log: inner join core.new_search_objects os
postgresql-2024-08-31_000000.log:2024-08-31 19:31:06.280 UTC [10684] ERROR: relation "core.new_search_objects" does not exist at character 76
postgresql-2024-08-31_000000.log: inner join core.new_search_objects os
postgresql-2024-09-01_000000.log:2024-09-01 15:00:02.870 UTC [4108] ERROR: relation "core.new_search_objects" does not exist at character 76
postgresql-2024-09-01_000000.log: inner join core.new_search_objects os
```
To track the creation and deletion of tables in the future, DDL logging can be enabled in the `postgresql.conf` file. This way, any `CREATE`, `DROP`, or `ALTER` operation on tables will be logged.
To do so, this line of `postgresql.conf`:

```
#log_statement = 'none'    # none, ddl, mod, all
```

should be changed to:

```
log_statement = 'ddl'      # logs all DDL (Data Definition Language) commands, such as CREATE, ALTER, DROP
```

and the configuration reloaded. (A full restart is not required: `log_statement` takes effect on a configuration reload, e.g. `pg_ctl reload`.)
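Alternatively, the same change can be made from a superuser session without editing the file by hand; a sketch using `ALTER SYSTEM` (which persists the value in `postgresql.auto.conf`):

```sql
-- Persist the new value and reload the configuration
ALTER SYSTEM SET log_statement = 'ddl';
SELECT pg_reload_conf();

-- Confirm the active value
SHOW log_statement;
```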
@sergiocontrino Sergio, what do you think about it?
Enabling `log_statement` in PostgreSQL can provide valuable information for auditing, debugging, or tracking database activity. However, it also has potential drawbacks depending on the logging level you choose. Let's walk through the impact of each setting and discuss the potential drawbacks.

**`log_statement` options**

- `none` (default): no SQL statements are logged.
- `ddl`: only logs Data Definition Language (DDL) statements (e.g., `CREATE`, `ALTER`, `DROP`).
- `mod`: logs all `ddl` statements plus Data Modification Language (DML) statements (e.g., `INSERT`, `UPDATE`, `DELETE`, `TRUNCATE`).
- `all`: logs all SQL statements, including `SELECT` queries and DDL/DML operations.

**`log_statement = 'ddl'` (logs only DDL operations)**

- Captures all schema changes (`CREATE`, `ALTER`, and `DROP` statements) without too much noise from DML queries.
- Does not capture data modifications (`INSERT`, `UPDATE`, `DELETE`) that could affect data integrity or performance.
- Recommendation: this is generally safe for most production environments where you want to track changes to the database schema without affecting performance significantly.

**`log_statement = 'mod'` (logs DDL and DML operations)**

- Also captures data modifications (`INSERT`, `UPDATE`, `DELETE`, `TRUNCATE`).
- Adds overhead on write-heavy systems (frequent `INSERT` or `UPDATE` operations).
- Recommendation: this is suitable if you need more detailed auditing, but you must have a strategy for managing log size (e.g., log rotation and archiving).

**`log_statement = 'all'` (logs everything, including `SELECT` queries)**

- Captures every statement, including `SELECT`.
- Logging all queries (especially `SELECT` statements) can considerably affect the performance of the database, especially on read-heavy systems. Every query will be written to the log, including frequent or large `SELECT` queries.
- Log files grow quickly and make it harder to find relevant entries (e.g., a single `CREATE TABLE` statement in a sea of `SELECT` queries), so they need to be regularly rotated and archived.
- Recommendation: `log_statement = 'all'` should be used very carefully in production environments. It's best suited for debugging or development environments where performance isn't critical, or for short-term use in production if you are trying to troubleshoot a specific issue.

**Potential drawbacks of `log_statement`**

Performance overhead:

- DML (moderate impact): logging `INSERT`, `UPDATE`, and `DELETE` statements can slow down databases with frequent data modifications.
- `SELECT` (high impact): on read-heavy databases, logging all `SELECT` statements can add significant overhead and degrade performance.

Disk space usage: log rotation can be configured in `postgresql.conf` (note: these settings apply when the built-in logging collector is enabled, i.e. `logging_collector = on`):

- `log_rotation_size`: rotate logs when they reach a certain size.
- `log_rotation_age`: rotate logs after a specific time period.
- `log_truncate_on_rotation`: truncate logs when they are rotated instead of appending.

Example settings for log rotation:

```
log_rotation_size = '100MB'      # Rotate logs when they reach 100MB
log_rotation_age = '1d'          # Rotate logs daily
log_truncate_on_rotation = on    # Truncate logs on rotation
```

Other considerations: log management (rotated logs still need archiving or cleanup), security (logged statements may contain sensitive data), and audit requirements (schema-change auditing suggests `log_statement = 'ddl'`; data-change auditing suggests `log_statement = 'mod'`).

**Recommendations**

- Production environments: `log_statement = 'ddl'` is a good balance between performance and visibility, as it logs all schema changes without logging every query. `log_statement = 'mod'` is useful if you want to track data changes but still avoid the overhead of logging `SELECT` queries.
- Development/debugging: `log_statement = 'all'` can be used temporarily. Make sure to monitor performance and log file size while doing this, and disable it after you've collected enough information.

**Summary**

- `ddl`: logs only schema changes. Low overhead and generally recommended for production environments to track changes like `CREATE`, `DROP`, and `ALTER` statements.
- `mod`: logs schema changes and data modifications (e.g., `INSERT`, `UPDATE`, `DELETE`). Moderate overhead, useful for auditing changes to data.
- `all`: logs everything, including `SELECT` statements. High overhead, recommended only for temporary use in debugging scenarios.

To minimize potential performance and storage issues, consider your logging needs carefully, and implement log rotation policies to manage disk space usage. Let me know if you need more specific guidance based on your use case!
I looked into the MDR Test server, but I can see that `new_search_studies` and `new_search_studies_json` are also missing there (in addition to `new_search_objects`).
I restored a backup found in MDR_Test, and I got only the following tables:
The issue is that the code that should create `core.new_search_objects` (not hosted in the master branch) is commented out:
```csharp
if (opts.do_indexes) // `phase 5` --> `-X`
{
    // There are two aspects of setting up search data. One is to create searchable
    // tables to respond to queries and filters. The other is to set up
    // suitable JSON fields to return to the UI in response to those queries.
    // The lup schemas from context, as well as the aggs schemas, are required.

    string core_conn_string = _credentials.GetConnectionString("mdr");
    List<string> aggs_schemas = new() { "st", "ob", "nk" };
    _monDatalayer.SetUpTempFTWs(_credentials, core_conn_string, "core", "aggs", aggs_schemas);
    List<string> ctx_schemas = new() { "lup" };
    _monDatalayer.SetUpTempFTWs(_credentials, core_conn_string, "core", "context", ctx_schemas);
    _loggingHelper.LogLine("FTW tables recreated");

    // Initial task is to create JSON versions of the object data (as part of this will be
    // incorporated into study json and tables, from where it can be returned when necessary).
    // Querying and filtering is almost always done against studies rather than objects - the
    // exception being PMIDs and even then it is studies that are returned.
    // Preparing object data is therefore focused on creating the JSON required. The routine
    // below generates both a 'full' JSON image of each object plus a much smaller JSON
    // fragment that will be returned within search results.

    CoreSearchBuilder csb = new CoreSearchBuilder(core_conn_string, _loggingHelper);
    //_loggingHelper.LogHeader("Creating JSON object data");
    // csb.CreateJSONObjectData();

    // Tables are then created to hold data for querying in various ways
    _loggingHelper.LogHeader("Setting up study search tables");
    //csb.CreateIdentifierSearchDataTable();
    //csb.CreatePMIDSearchDataTable();
    //csb.CreateLexemeSearchDataTable();
    //csb.CreateCountrySearchDataTable();

    // The study data json objects are then created
    _loggingHelper.LogHeader("Creating JSON study data");
    csb.CreateJSONStudyData();
    _loggingHelper.LogHeader("Creating data in search tables");
    csb.AddStudyJsonToSearchTables();
    csb.SwitchToNewTables();

    // Drop FTW schemas.
    _monDatalayer.DropTempFTWs(core_conn_string, "aggs", aggs_schemas);
    _monDatalayer.DropTempFTWs(core_conn_string, "context", ctx_schemas);
    _loggingHelper.LogLine("FTW tables dropped");
}
```
I am trying to uncomment:

```csharp
csb.CreateJSONObjectData();
csb.CreateIdentifierSearchDataTable();
csb.CreatePMIDSearchDataTable();
csb.CreateLexemeSearchDataTable();
csb.CreateCountrySearchDataTable();
```