- strongly agree
- weakly agree (after some deliberation I changed my initial response from "strongly agree" to "weakly agree")
- weakly agree
To go slightly against my rule of no discussion 😬 (but a key point I want to highlight):
> an automatic migration from django to sqla
To a first approximation there is literally no difference in the actual postgres DB, and so no real migration is needed (except syncing the schema version value); i.e. I don't see any difference in the tables/fields of the ORM classes. I also tried:
```console
$ verdi archive import "https://archive.materialscloud.org/record/file?filename=two_dimensional_database.aiida&file_id=98f1789b-d68e-46db-9c67-4ef8dd8c1ebf&record_id=648"
$ verdi database summary -v
```
For this I had to override `SettingsManager.validate_schema_version` (because it returned the DB schema version as `None`):

```python
def validate_schema_version(self, profile):
    return True
```
but after that:

- `verdi database summary -v` (works fine)
- `verdi archive create -G 17 -- test_aiida.aiida` (works fine, i.e. re-exporting the import group)
- `verdi process list -a -l 100` (works fine)
- `verdi process show 4` (works fine, with correct incoming/outgoing links)
- `verdi node attributes 4` (works fine)

(I would encourage you to try with your own database)
I'm not sure that's the correct way to discuss issues - you give yourself the space to argue for yourself (multiple times) and not the others? :-) Anyway, I see you say "we can discuss further". Since already the preliminary answers indicate we might need further discussion, I'll go in discussion mode.
After some consideration, I might be OK with the option of dropping Django, but with some caveats (that I'd like to discuss in person in a brief meeting, but I summarise below).
Here's my reasoning (also to bring some historical perspective on why this was done, and to avoid falling back into the same pitfalls):
Two important questions for me remain: ...

In summary: ...
@giovannipizzi I am a bit disappointed that you have chosen to specifically ignore my request to get an initial consensus from the group on the issue before diving into technical discussion, but I will answer some of your points:
> I'm not sure that's the correct way to discuss issues - you give yourself the space to argue for yourself
I disagree. I did not argue; I made a statement to gauge initial support (and made that clear). It is, in my experience, fruitless to go into these long, drawn-out discussions if people already have entrenched viewpoints.
> we needed to support all features (JSONB in SQLite, flexible queries, ...), so both backends were needed... but now this reason is gone
This is exactly what I already mentioned in the initial comment. I completely understand there were historical reasons, but now it is just hindering aiida rather than helping it.
> one IMPORTANT note: removing the backend concept
I never said to remove the backend concept.
> accepting that SQLAlchemy code will most probably eventually "leak" into the whole codebase
As noted above, I never proposed this, and so this is a false equivalence.
> we have spent a significant effort modularising the code
Except the code is not well modularised, which is why we have to maintain two entirely separate implementations for database migrations and archive exports (for what is now exactly the same database), amongst other things. This is a large impediment for anyone trying to make any notable changes/improvements to aiida. If we really want to make the backend modular, then it should be a plugin.
> Thanks to our current isolation of the backend in a module, all DB-specific code is in a specific place
As well as `aiida/backends`, there is `aiida/orm/implementations` and `aiida/tools/importexport/dbimport/backends`.
> do we see in the foreseeable future that we extend AiiDA to more DBs/DB backends?
This is premature generalisation. If you want any of the features you mention, then I would suggest writing them up in an AEP and adding them to a roadmap. Besides, a key point of both sqlalchemy and django is that they are already abstraction layers for interfacing with multiple database APIs.
> Just to bring a specific example: a few years ago, most command line commands were directly accessing the DB via Django or SQLAlchemy queries, and it took quite some time to convert them all to QueryBuilder.
Again, I am not asking for the backend concept to be removed, but we have now gone to the opposite extreme of having multiple layers and implementations of abstraction. The point of backend abstraction should be to allow for plugging in multiple types of database, not having two semi-hardcoded implementations accessing exactly the same database, with exactly the same schema.
Cross-posting a comment here from https://github.com/aiidateam/aiida-restapi/issues/26#issuecomment-861421127, which I think is relevant to the backend abstraction/modularization discussion:
> TBH it's made it a bit "confusing" that you have the AiiDA ORM classes, which are not technically the same as the backend sqlalchemy/django ORM classes, but that with the QueryBuilder you specify / get back the fields of the "backend" ORM classes? (the classic being this annoying translation of `id` -> `pk`, and also `name` -> `label` on computers)
>
> basically, IMO, the `QueryBuilder` feels a little "off" at present, because it seems to expose aspects of the backend ORM, rather than the frontend ORM
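
To make the mismatch concrete, a minimal sketch (assuming the aiida-core 1.x behaviour described above, and a profile with at least one stored computer; the projected field names are the point, not the exact output):

```python
from aiida.orm import Computer, QueryBuilder

# querying projects the *backend* field names: "id" and "name"
qb = QueryBuilder().append(Computer, project=['id', 'name'])
print(qb.all())  # e.g. [[1, 'localhost']]

# whereas the frontend ORM exposes the same data as `pk` and `label`
computer = QueryBuilder().append(Computer).first()[0]
print(computer.pk, computer.label)
```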
Another possible thing to bear in mind when considering databases as a plugin is how tightly they should be tied to the object store. As well as the node repository file structure now being stored in the DB, since https://github.com/aiidateam/aiida-core/commit/87ab7e3a47fe64d642d88b9f1e31e86714be6a2c the repository is now directly initialised within the database migrations. So presumably a new DB backend would need at least one "faux" migration to initialise the repository.
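
For illustration only, such a "faux" migration might look something like this (a sketch: the alembic-style layout and the repository path are assumptions; `disk_objectstore` is the object-store package aiida-core uses):

```python
"""Hypothetical initial migration for a new backend: initialise the file repository."""
from pathlib import Path

from disk_objectstore import Container

def upgrade():
    # assumption: the backend can resolve the container path for the current profile
    container = Container(str(Path('/path/to/profile/repository/container')))
    if not container.is_initialised:
        container.init_container()

def downgrade():
    pass  # leave the repository in place on downgrade
```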
From a meeting with the team, the approach may be:

- Ensure the schemas are completely aligned, or establish what needs to change (implement a unit test to directly compare the schemas; a sketch follows this list)
- Think about what the deprecation/migration pathway would be; at what aiida-core versions do we implement the migrations and deprecations (possibly implement part in 1.x, i.e. on `develop`, to enable a drop in 2.x)
- We may well look to largely remove the current backend design (as @giovannipizzi notes, there is currently a lot of indirection in the code, particularly for the `QueryBuilder`, which makes the code difficult to understand), with direct users of the backend ORM using the `QueryBuilder` instead
- sqlalchemy considerations: ...
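
For the first point, a sketch of what such a schema-comparison test could look like (hypothetical: the DSNs are placeholders for two freshly-migrated profiles, and it uses sqlalchemy's inspection API directly rather than any aiida internals):

```python
from sqlalchemy import create_engine, inspect

# placeholder DSNs for a django-migrated and an sqlalchemy-migrated database
DJANGO_DSN = 'postgresql://user:pass@localhost/new_django'
SQLA_DSN = 'postgresql://user:pass@localhost/new_sqla'

def schema_summary(dsn):
    """Map each db_* table to its {column: (type, nullable)} signature."""
    inspector = inspect(create_engine(dsn))
    return {
        table: {
            column['name']: (str(column['type']), column['nullable'])
            for column in inspector.get_columns(table)
        }
        for table in inspector.get_table_names()
        if table.startswith('db_')  # skip auth_*, django_*, alembic_version
    }

def test_schemas_aligned():
    assert schema_summary(DJANGO_DSN) == schema_summary(SQLA_DSN)
```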
Comparison of schemas, using:

```python
import tabulate

from aiida import load_profile
load_profile()
from aiida.manage.manager import get_manager
backend = get_manager().get_backend()

# https://www.postgresql.org/docs/12/infoschema-tables.html
tables_columns = ("table_type", "is_typed")
table_rows = backend.execute_raw(
    f"SELECT table_name,{','.join(tables_columns)} FROM information_schema.tables WHERE table_schema = 'public'"
)
print(
    tabulate.tabulate(sorted([list(r) for r in table_rows]),
                      headers=('table_name', *tables_columns)))
table_dict = {row[0]: dict(zip(tables_columns, row[1:])) for row in table_rows}

# https://www.postgresql.org/docs/12/infoschema-columns.html
field_columns = ("data_type", "is_nullable", "is_identity",
                 "character_maximum_length", "column_default")
convert = {"character_maximum_length": "char_max_len"}
for table_name in sorted(table_dict):
    header = f"\nTable: {table_name}"
    print(f"{header}\n{'=' * len(header)}")
    field_rows = backend.execute_raw(
        f"SELECT column_name,{','.join(field_columns)} FROM information_schema.columns WHERE table_schema = 'public' AND table_name = '{table_name}'"
    )
    print(
        tabulate.tabulate(sorted([list(r) for r in field_rows]),
                          headers=('column_name', *(convert.get(f, f)
                                                    for f in field_columns))))
```
Django:

```
table_name table_type is_typed
---------------------- ------------ ----------
auth_group BASE TABLE NO
auth_group_permissions BASE TABLE NO
auth_permission BASE TABLE NO
db_dbauthinfo BASE TABLE NO
db_dbcomment BASE TABLE NO
db_dbcomputer BASE TABLE NO
db_dbgroup BASE TABLE NO
db_dbgroup_dbnodes BASE TABLE NO
db_dblink BASE TABLE NO
db_dblog BASE TABLE NO
db_dbnode BASE TABLE NO
db_dbsetting BASE TABLE NO
db_dbuser BASE TABLE NO
django_content_type BASE TABLE NO
django_migrations BASE TABLE NO
Table: auth_group
==================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------------- ------------- ------------- -------------- --------------------------------------
id integer NO NO nextval('auth_group_id_seq'::regclass)
name character varying NO NO 150
Table: auth_group_permissions
==============================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------- ------------- ------------- -------------- --------------------------------------------------
group_id integer NO NO
id integer NO NO nextval('auth_group_permissions_id_seq'::regclass)
permission_id integer NO NO
Table: auth_permission
=======================
column_name data_type is_nullable is_identity char_max_len column_default
--------------- ----------------- ------------- ------------- -------------- -------------------------------------------
codename character varying NO NO 100
content_type_id integer NO NO
id integer NO NO nextval('auth_permission_id_seq'::regclass)
name character varying NO NO 255
Table: db_dbauthinfo
=====================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------- ------------- ------------- -------------- -----------------------------------------
aiidauser_id integer NO NO
auth_params jsonb NO NO
dbcomputer_id integer NO NO
enabled boolean NO NO
id integer NO NO nextval('db_dbauthinfo_id_seq'::regclass)
metadata jsonb NO NO
Table: db_dbcomment
====================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ------------------------ ------------- ------------- -------------- ----------------------------------------
content text NO NO
ctime timestamp with time zone NO NO
dbnode_id integer NO NO
id integer NO NO nextval('db_dbcomment_id_seq'::regclass)
mtime timestamp with time zone NO NO
user_id integer NO NO
uuid uuid NO NO
Table: db_dbcomputer
=====================
column_name data_type is_nullable is_identity char_max_len column_default
-------------- ----------------- ------------- ------------- -------------- -----------------------------------------
description text NO NO
hostname character varying NO NO 255
id integer NO NO nextval('db_dbcomputer_id_seq'::regclass)
label character varying NO NO 255
metadata jsonb NO NO
scheduler_type character varying NO NO 255
transport_type character varying NO NO 255
uuid uuid NO NO
Table: db_dbgroup
==================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ------------------------ ------------- ------------- -------------- --------------------------------------
description text NO NO
extras jsonb NO NO
id integer NO NO nextval('db_dbgroup_id_seq'::regclass)
label character varying NO NO 255
time timestamp with time zone NO NO
type_string character varying NO NO 255
user_id integer NO NO
uuid uuid NO NO
Table: db_dbgroup_dbnodes
==========================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------- ------------- ------------- -------------- ----------------------------------------------
dbgroup_id integer NO NO
dbnode_id integer NO NO
id integer NO NO nextval('db_dbgroup_dbnodes_id_seq'::regclass)
Table: db_dblink
=================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------------- ------------- ------------- -------------- -------------------------------------
id integer NO NO nextval('db_dblink_id_seq'::regclass)
input_id integer NO NO
label character varying NO NO 255
output_id integer NO NO
type character varying NO NO 255
Table: db_dblog
================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ------------------------ ------------- ------------- -------------- ------------------------------------
dbnode_id integer NO NO
id integer NO NO nextval('db_dblog_id_seq'::regclass)
levelname character varying NO NO 50
loggername character varying NO NO 255
message text NO NO
metadata jsonb NO NO
time timestamp with time zone NO NO
uuid uuid NO NO
Table: db_dbnode
=================
column_name data_type is_nullable is_identity char_max_len column_default
------------------- ------------------------ ------------- ------------- -------------- -------------------------------------
attributes jsonb YES NO
ctime timestamp with time zone NO NO
dbcomputer_id integer YES NO
description text NO NO
extras jsonb YES NO
id integer NO NO nextval('db_dbnode_id_seq'::regclass)
label character varying NO NO 255
mtime timestamp with time zone NO NO
node_type character varying NO NO 255
process_type character varying YES NO 255
repository_metadata jsonb YES NO
user_id integer NO NO
uuid uuid NO NO
Table: db_dbsetting
====================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ------------------------ ------------- ------------- -------------- ----------------------------------------
description text NO NO
id integer NO NO nextval('db_dbsetting_id_seq'::regclass)
key character varying NO NO 1024
time timestamp with time zone NO NO
val jsonb YES NO
Table: db_dbuser
=================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------------- ------------- ------------- -------------- -------------------------------------
email character varying NO NO 254
first_name character varying NO NO 254
id integer NO NO nextval('db_dbuser_id_seq'::regclass)
institution character varying NO NO 254
last_name character varying NO NO 254
Table: django_content_type
===========================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------------- ------------- ------------- -------------- -----------------------------------------------
app_label character varying NO NO 100
id integer NO NO nextval('django_content_type_id_seq'::regclass)
model character varying NO NO 100
Table: django_migrations
=========================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ------------------------ ------------- ------------- -------------- ---------------------------------------------
app character varying NO NO 255
applied timestamp with time zone NO NO
id integer NO NO nextval('django_migrations_id_seq'::regclass)
name character varying NO NO 255
```

SQLAlchemy:

```
table_name table_type is_typed
------------------ ------------ ----------
alembic_version BASE TABLE NO
db_dbauthinfo BASE TABLE NO
db_dbcomment BASE TABLE NO
db_dbcomputer BASE TABLE NO
db_dbgroup BASE TABLE NO
db_dbgroup_dbnodes BASE TABLE NO
db_dblink BASE TABLE NO
db_dblog BASE TABLE NO
db_dbnode BASE TABLE NO
db_dbsetting BASE TABLE NO
db_dbuser BASE TABLE NO
Table: alembic_version
=======================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------------- ------------- ------------- -------------- ----------------
version_num character varying NO NO 32
Table: db_dbauthinfo
=====================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------- ------------- ------------- -------------- -----------------------------------------
aiidauser_id integer YES NO
auth_params jsonb YES NO
dbcomputer_id integer YES NO
enabled boolean YES NO
id integer NO NO nextval('db_dbauthinfo_id_seq'::regclass)
metadata jsonb YES NO
Table: db_dbcomment
====================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ------------------------ ------------- ------------- -------------- ----------------------------------------
content text YES NO
ctime timestamp with time zone YES NO
dbnode_id integer YES NO
id integer NO NO nextval('db_dbcomment_id_seq'::regclass)
mtime timestamp with time zone YES NO
user_id integer YES NO
uuid uuid YES NO
Table: db_dbcomputer
=====================
column_name data_type is_nullable is_identity char_max_len column_default
-------------- ----------------- ------------- ------------- -------------- -----------------------------------------
description text YES NO
hostname character varying YES NO 255
id integer NO NO nextval('db_dbcomputer_id_seq'::regclass)
label character varying NO NO 255
metadata jsonb YES NO
scheduler_type character varying YES NO 255
transport_type character varying YES NO 255
uuid uuid YES NO
Table: db_dbgroup
==================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ------------------------ ------------- ------------- -------------- --------------------------------------
description text YES NO
extras jsonb NO NO
id integer NO NO nextval('db_dbgroup_id_seq'::regclass)
label character varying YES NO 255
time timestamp with time zone YES NO
type_string character varying YES NO 255
user_id integer YES NO
uuid uuid YES NO
Table: db_dbgroup_dbnodes
==========================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------- ------------- ------------- -------------- ----------------------------------------------
dbgroup_id integer YES NO
dbnode_id integer YES NO
id integer NO NO nextval('db_dbgroup_dbnodes_id_seq'::regclass)
Table: db_dblink
=================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------------- ------------- ------------- -------------- -------------------------------------
id integer NO NO nextval('db_dblink_id_seq'::regclass)
input_id integer YES NO
label character varying NO NO 255
output_id integer YES NO
type character varying YES NO 255
Table: db_dblog
================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ------------------------ ------------- ------------- -------------- ------------------------------------
dbnode_id integer NO NO
id integer NO NO nextval('db_dblog_id_seq'::regclass)
levelname character varying YES NO 255
loggername character varying YES NO 255
message text YES NO
metadata jsonb YES NO
time timestamp with time zone YES NO
uuid uuid YES NO
Table: db_dbnode
=================
column_name data_type is_nullable is_identity char_max_len column_default
------------------- ------------------------ ------------- ------------- -------------- -------------------------------------
attributes jsonb YES NO
ctime timestamp with time zone YES NO
dbcomputer_id integer YES NO
description text YES NO
extras jsonb YES NO
id integer NO NO nextval('db_dbnode_id_seq'::regclass)
label character varying YES NO 255
mtime timestamp with time zone YES NO
node_type character varying YES NO 255
process_type character varying YES NO 255
repository_metadata jsonb NO NO '{}'::jsonb
user_id integer NO NO
uuid uuid YES NO
Table: db_dbsetting
====================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ------------------------ ------------- ------------- -------------- ----------------------------------------
description character varying NO NO 255
id integer NO NO nextval('db_dbsetting_id_seq'::regclass)
key character varying NO NO 255
time timestamp with time zone YES NO
val jsonb YES NO
Table: db_dbuser
=================
column_name data_type is_nullable is_identity char_max_len column_default
------------- ----------------- ------------- ------------- -------------- -------------------------------------
email character varying YES NO 254
first_name character varying YES NO 254
id integer NO NO nextval('db_dbuser_id_seq'::regclass)
institution character varying YES NO 254
last_name character varying YES NO 254
```

Principally, all the `db_*` tables have identical column names and data types (except `db_dbsetting.description`, which is "incorrectly" a character varying in sqlalchemy), then there are a few differences in `char_max_len` (`db_dbsetting.key`, `db_dblog.levelname`), and the main difference is that different fields are nullable.
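
If the Django definitions (with their stricter NOT NULL constraints) were taken as the target, aligning the sqlalchemy schema would be a migration along these lines (a sketch covering just two of the columns above; any existing NULL values would need backfilling first):

```python
import sqlalchemy as sa
from alembic import op

def upgrade():
    # tighten nullability to match the django schema (two examples of many)
    op.alter_column('db_dbcomment', 'content', existing_type=sa.Text(), nullable=False)
    op.alter_column('db_dbnode', 'ctime', existing_type=sa.DateTime(timezone=True), nullable=False)

def downgrade():
    op.alter_column('db_dbnode', 'ctime', existing_type=sa.DateTime(timezone=True), nullable=True)
    op.alter_column('db_dbcomment', 'content', existing_type=sa.Text(), nullable=True)
```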
If converting from django to sqlalchemy, I assume these tables can just be silently ignored/removed:
```console
$ verdi devel run-sql "SELECT * FROM auth_group;"
$ verdi devel run-sql "SELECT * FROM auth_group_permissions;"
$ verdi devel run-sql "SELECT * FROM auth_permission;"
(1, 'Can add permission', 1, 'add_permission')
(2, 'Can change permission', 1, 'change_permission')
(3, 'Can delete permission', 1, 'delete_permission')
(4, 'Can view permission', 1, 'view_permission')
(5, 'Can add group', 2, 'add_group')
(6, 'Can change group', 2, 'change_group')
(7, 'Can delete group', 2, 'delete_group')
(8, 'Can view group', 2, 'view_group')
(9, 'Can add content type', 3, 'add_contenttype')
(10, 'Can change content type', 3, 'change_contenttype')
(11, 'Can delete content type', 3, 'delete_contenttype')
(12, 'Can view content type', 3, 'view_contenttype')
(13, 'Can add db user', 4, 'add_dbuser')
(14, 'Can change db user', 4, 'change_dbuser')
(15, 'Can delete db user', 4, 'delete_dbuser')
(16, 'Can view db user', 4, 'view_dbuser')
(17, 'Can add db auth info', 5, 'add_dbauthinfo')
(18, 'Can change db auth info', 5, 'change_dbauthinfo')
(19, 'Can delete db auth info', 5, 'delete_dbauthinfo')
(20, 'Can view db auth info', 5, 'view_dbauthinfo')
(21, 'Can add db comment', 6, 'add_dbcomment')
(22, 'Can change db comment', 6, 'change_dbcomment')
(23, 'Can delete db comment', 6, 'delete_dbcomment')
(24, 'Can view db comment', 6, 'view_dbcomment')
(25, 'Can add db computer', 7, 'add_dbcomputer')
(26, 'Can change db computer', 7, 'change_dbcomputer')
(27, 'Can delete db computer', 7, 'delete_dbcomputer')
(28, 'Can view db computer', 7, 'view_dbcomputer')
(29, 'Can add db group', 8, 'add_dbgroup')
(30, 'Can change db group', 8, 'change_dbgroup')
(31, 'Can delete db group', 8, 'delete_dbgroup')
(32, 'Can view db group', 8, 'view_dbgroup')
(33, 'Can add db link', 9, 'add_dblink')
(34, 'Can change db link', 9, 'change_dblink')
(35, 'Can delete db link', 9, 'delete_dblink')
(36, 'Can view db link', 9, 'view_dblink')
(37, 'Can add db log', 10, 'add_dblog')
(38, 'Can change db log', 10, 'change_dblog')
(39, 'Can delete db log', 10, 'delete_dblog')
(40, 'Can view db log', 10, 'view_dblog')
(41, 'Can add db node', 11, 'add_dbnode')
(42, 'Can change db node', 11, 'change_dbnode')
(43, 'Can delete db node', 11, 'delete_dbnode')
(44, 'Can view db node', 11, 'view_dbnode')
(45, 'Can add db setting', 12, 'add_dbsetting')
(46, 'Can change db setting', 12, 'change_dbsetting')
(47, 'Can delete db setting', 12, 'delete_dbsetting')
(48, 'Can view db setting', 12, 'view_dbsetting')
```

and you could just directly write the `alembic_version` table:
```console
$ verdi -p new_sqla devel run-sql "SELECT * FROM alembic_version;"
('535039300e4a',)
```
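
i.e. that step could be as small as the following (an untested sketch; the profile name is illustrative, and it assumes the schema differences above have already been dealt with and that `535039300e4a` is still the current alembic head):

```console
$ verdi -p migrated_profile devel run-sql "INSERT INTO alembic_version (version_num) VALUES ('535039300e4a');"
```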
Comparison of indexes:
Django:

```console
$ verdi -p new_django devel run-sql "SELECT tablename,indexname,indexdef FROM pg_indexes WHERE schemaname = 'public' ORDER BY tablename,indexname;"
('db_dbauthinfo', 'db_dbauthinfo_aiidauser_id_0684fdfb', 'CREATE INDEX db_dbauthinfo_aiidauser_id_0684fdfb ON public.db_dbauthinfo USING btree (aiidauser_id)')
('db_dbauthinfo', 'db_dbauthinfo_aiidauser_id_dbcomputer_id_777cdaa8_uniq', 'CREATE UNIQUE INDEX db_dbauthinfo_aiidauser_id_dbcomputer_id_777cdaa8_uniq ON public.db_dbauthinfo USING btree (aiidauser_id, dbcomputer_id)')
('db_dbauthinfo', 'db_dbauthinfo_dbcomputer_id_424f7ac4', 'CREATE INDEX db_dbauthinfo_dbcomputer_id_424f7ac4 ON public.db_dbauthinfo USING btree (dbcomputer_id)')
('db_dbauthinfo', 'db_dbauthinfo_pkey', 'CREATE UNIQUE INDEX db_dbauthinfo_pkey ON public.db_dbauthinfo USING btree (id)')
('db_dbcomment', 'db_dbcomment_dbnode_id_3b812b6b', 'CREATE INDEX db_dbcomment_dbnode_id_3b812b6b ON public.db_dbcomment USING btree (dbnode_id)')
('db_dbcomment', 'db_dbcomment_pkey', 'CREATE UNIQUE INDEX db_dbcomment_pkey ON public.db_dbcomment USING btree (id)')
('db_dbcomment', 'db_dbcomment_user_id_8ed5e360', 'CREATE INDEX db_dbcomment_user_id_8ed5e360 ON public.db_dbcomment USING btree (user_id)')
('db_dbcomment', 'db_dbcomment_uuid_49bac08c_uniq', 'CREATE UNIQUE INDEX db_dbcomment_uuid_49bac08c_uniq ON public.db_dbcomment USING btree (uuid)')
('db_dbcomputer', 'db_dbcomputer_name_f1800b1a_like', 'CREATE INDEX db_dbcomputer_name_f1800b1a_like ON public.db_dbcomputer USING btree (label varchar_pattern_ops)')
('db_dbcomputer', 'db_dbcomputer_name_key', 'CREATE UNIQUE INDEX db_dbcomputer_name_key ON public.db_dbcomputer USING btree (label)')
('db_dbcomputer', 'db_dbcomputer_pkey', 'CREATE UNIQUE INDEX db_dbcomputer_pkey ON public.db_dbcomputer USING btree (id)')
('db_dbcomputer', 'db_dbcomputer_uuid_f35defa6_uniq', 'CREATE UNIQUE INDEX db_dbcomputer_uuid_f35defa6_uniq ON public.db_dbcomputer USING btree (uuid)')
('db_dbgroup', 'db_dbgroup_name_66c75272', 'CREATE INDEX db_dbgroup_name_66c75272 ON public.db_dbgroup USING btree (label)')
('db_dbgroup', 'db_dbgroup_name_66c75272_like', 'CREATE INDEX db_dbgroup_name_66c75272_like ON public.db_dbgroup USING btree (label varchar_pattern_ops)')
('db_dbgroup', 'db_dbgroup_name_type_12656f33_uniq', 'CREATE UNIQUE INDEX db_dbgroup_name_type_12656f33_uniq ON public.db_dbgroup USING btree (label, type_string)')
('db_dbgroup', 'db_dbgroup_pkey', 'CREATE UNIQUE INDEX db_dbgroup_pkey ON public.db_dbgroup USING btree (id)')
('db_dbgroup', 'db_dbgroup_type_23b2a748', 'CREATE INDEX db_dbgroup_type_23b2a748 ON public.db_dbgroup USING btree (type_string)')
('db_dbgroup', 'db_dbgroup_type_23b2a748_like', 'CREATE INDEX db_dbgroup_type_23b2a748_like ON public.db_dbgroup USING btree (type_string varchar_pattern_ops)')
('db_dbgroup', 'db_dbgroup_user_id_100f8a51', 'CREATE INDEX db_dbgroup_user_id_100f8a51 ON public.db_dbgroup USING btree (user_id)')
('db_dbgroup', 'db_dbgroup_uuid_af896177_uniq', 'CREATE UNIQUE INDEX db_dbgroup_uuid_af896177_uniq ON public.db_dbgroup USING btree (uuid)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbgroup_id_9d3a0f9d', 'CREATE INDEX db_dbgroup_dbnodes_dbgroup_id_9d3a0f9d ON public.db_dbgroup_dbnodes USING btree (dbgroup_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbgroup_id_dbnode_id_eee23cce_uniq', 'CREATE UNIQUE INDEX db_dbgroup_dbnodes_dbgroup_id_dbnode_id_eee23cce_uniq ON public.db_dbgroup_dbnodes USING btree (dbgroup_id, dbnode_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbnode_id_118b9439', 'CREATE INDEX db_dbgroup_dbnodes_dbnode_id_118b9439 ON public.db_dbgroup_dbnodes USING btree (dbnode_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_pkey', 'CREATE UNIQUE INDEX db_dbgroup_dbnodes_pkey ON public.db_dbgroup_dbnodes USING btree (id)')
('db_dblink', 'db_dblink_input_id_9245bd73', 'CREATE INDEX db_dblink_input_id_9245bd73 ON public.db_dblink USING btree (input_id)')
('db_dblink', 'db_dblink_label_f1343cfb', 'CREATE INDEX db_dblink_label_f1343cfb ON public.db_dblink USING btree (label)')
('db_dblink', 'db_dblink_label_f1343cfb_like', 'CREATE INDEX db_dblink_label_f1343cfb_like ON public.db_dblink USING btree (label varchar_pattern_ops)')
('db_dblink', 'db_dblink_output_id_c0167528', 'CREATE INDEX db_dblink_output_id_c0167528 ON public.db_dblink USING btree (output_id)')
('db_dblink', 'db_dblink_pkey', 'CREATE UNIQUE INDEX db_dblink_pkey ON public.db_dblink USING btree (id)')
('db_dblink', 'db_dblink_type_229f212b', 'CREATE INDEX db_dblink_type_229f212b ON public.db_dblink USING btree (type)')
('db_dblink', 'db_dblink_type_229f212b_like', 'CREATE INDEX db_dblink_type_229f212b_like ON public.db_dblink USING btree (type varchar_pattern_ops)')
('db_dblog', 'db_dblog_dbnode_id_da34b732', 'CREATE INDEX db_dblog_dbnode_id_da34b732 ON public.db_dblog USING btree (dbnode_id)')
('db_dblog', 'db_dblog_levelname_ad5dc346', 'CREATE INDEX db_dblog_levelname_ad5dc346 ON public.db_dblog USING btree (levelname)')
('db_dblog', 'db_dblog_levelname_ad5dc346_like', 'CREATE INDEX db_dblog_levelname_ad5dc346_like ON public.db_dblog USING btree (levelname varchar_pattern_ops)')
('db_dblog', 'db_dblog_loggername_00b5ba16', 'CREATE INDEX db_dblog_loggername_00b5ba16 ON public.db_dblog USING btree (loggername)')
('db_dblog', 'db_dblog_loggername_00b5ba16_like', 'CREATE INDEX db_dblog_loggername_00b5ba16_like ON public.db_dblog USING btree (loggername varchar_pattern_ops)')
('db_dblog', 'db_dblog_pkey', 'CREATE UNIQUE INDEX db_dblog_pkey ON public.db_dblog USING btree (id)')
('db_dblog', 'db_dblog_uuid_9cf77df3_uniq', 'CREATE UNIQUE INDEX db_dblog_uuid_9cf77df3_uniq ON public.db_dblog USING btree (uuid)')
('db_dbnode', 'db_dbnode_ctime_71626ef5', 'CREATE INDEX db_dbnode_ctime_71626ef5 ON public.db_dbnode USING btree (ctime)')
('db_dbnode', 'db_dbnode_dbcomputer_id_315372a3', 'CREATE INDEX db_dbnode_dbcomputer_id_315372a3 ON public.db_dbnode USING btree (dbcomputer_id)')
('db_dbnode', 'db_dbnode_label_6469539e', 'CREATE INDEX db_dbnode_label_6469539e ON public.db_dbnode USING btree (label)')
('db_dbnode', 'db_dbnode_label_6469539e_like', 'CREATE INDEX db_dbnode_label_6469539e_like ON public.db_dbnode USING btree (label varchar_pattern_ops)')
('db_dbnode', 'db_dbnode_mtime_0554ea3d', 'CREATE INDEX db_dbnode_mtime_0554ea3d ON public.db_dbnode USING btree (mtime)')
('db_dbnode', 'db_dbnode_pkey', 'CREATE UNIQUE INDEX db_dbnode_pkey ON public.db_dbnode USING btree (id)')
('db_dbnode', 'db_dbnode_process_type_df7298d0', 'CREATE INDEX db_dbnode_process_type_df7298d0 ON public.db_dbnode USING btree (process_type)')
('db_dbnode', 'db_dbnode_process_type_df7298d0_like', 'CREATE INDEX db_dbnode_process_type_df7298d0_like ON public.db_dbnode USING btree (process_type varchar_pattern_ops)')
('db_dbnode', 'db_dbnode_type_a8ce9753', 'CREATE INDEX db_dbnode_type_a8ce9753 ON public.db_dbnode USING btree (node_type)')
('db_dbnode', 'db_dbnode_type_a8ce9753_like', 'CREATE INDEX db_dbnode_type_a8ce9753_like ON public.db_dbnode USING btree (node_type varchar_pattern_ops)')
('db_dbnode', 'db_dbnode_user_id_12e7aeaf', 'CREATE INDEX db_dbnode_user_id_12e7aeaf ON public.db_dbnode USING btree (user_id)')
('db_dbnode', 'db_dbnode_uuid_62e0bf98_uniq', 'CREATE UNIQUE INDEX db_dbnode_uuid_62e0bf98_uniq ON public.db_dbnode USING btree (uuid)')
('db_dbsetting', 'db_dbsetting_key_1b84beb4_like', 'CREATE INDEX db_dbsetting_key_1b84beb4_like ON public.db_dbsetting USING btree (key varchar_pattern_ops)')
('db_dbsetting', 'db_dbsetting_key_1b84beb4_uniq', 'CREATE UNIQUE INDEX db_dbsetting_key_1b84beb4_uniq ON public.db_dbsetting USING btree (key)')
('db_dbsetting', 'db_dbsetting_pkey', 'CREATE UNIQUE INDEX db_dbsetting_pkey ON public.db_dbsetting USING btree (id)')
('db_dbuser', 'db_dbuser_email_30150b7e_like', 'CREATE INDEX db_dbuser_email_30150b7e_like ON public.db_dbuser USING btree (email varchar_pattern_ops)')
('db_dbuser', 'db_dbuser_email_30150b7e_uniq', 'CREATE UNIQUE INDEX db_dbuser_email_30150b7e_uniq ON public.db_dbuser USING btree (email)')
('db_dbuser', 'db_dbuser_pkey', 'CREATE UNIQUE INDEX db_dbuser_pkey ON public.db_dbuser USING btree (id)')
```

SQLAlchemy:

```console
$ verdi -p new_sqla devel run-sql "SELECT tablename,indexname,indexdef FROM pg_indexes WHERE schemaname = 'public' ORDER BY tablename,indexname;"
('db_dbauthinfo', 'db_dbauthinfo_aiidauser_id_dbcomputer_id_key', 'CREATE UNIQUE INDEX db_dbauthinfo_aiidauser_id_dbcomputer_id_key ON public.db_dbauthinfo USING btree (aiidauser_id, dbcomputer_id)')
('db_dbauthinfo', 'db_dbauthinfo_pkey', 'CREATE UNIQUE INDEX db_dbauthinfo_pkey ON public.db_dbauthinfo USING btree (id)')
('db_dbcomment', 'db_dbcomment_pkey', 'CREATE UNIQUE INDEX db_dbcomment_pkey ON public.db_dbcomment USING btree (id)')
('db_dbcomment', 'db_dbcomment_uuid_key', 'CREATE UNIQUE INDEX db_dbcomment_uuid_key ON public.db_dbcomment USING btree (uuid)')
('db_dbcomputer', 'db_dbcomputer_label_key', 'CREATE UNIQUE INDEX db_dbcomputer_label_key ON public.db_dbcomputer USING btree (label)')
('db_dbcomputer', 'db_dbcomputer_pkey', 'CREATE UNIQUE INDEX db_dbcomputer_pkey ON public.db_dbcomputer USING btree (id)')
('db_dbcomputer', 'db_dbcomputer_uuid_key', 'CREATE UNIQUE INDEX db_dbcomputer_uuid_key ON public.db_dbcomputer USING btree (uuid)')
('db_dbgroup', 'db_dbgroup_label_type_string_key', 'CREATE UNIQUE INDEX db_dbgroup_label_type_string_key ON public.db_dbgroup USING btree (label, type_string)')
('db_dbgroup', 'db_dbgroup_pkey', 'CREATE UNIQUE INDEX db_dbgroup_pkey ON public.db_dbgroup USING btree (id)')
('db_dbgroup', 'db_dbgroup_uuid_key', 'CREATE UNIQUE INDEX db_dbgroup_uuid_key ON public.db_dbgroup USING btree (uuid)')
('db_dbgroup', 'ix_db_dbgroup_label', 'CREATE INDEX ix_db_dbgroup_label ON public.db_dbgroup USING btree (label)')
('db_dbgroup', 'ix_db_dbgroup_type_string', 'CREATE INDEX ix_db_dbgroup_type_string ON public.db_dbgroup USING btree (type_string)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbgroup_id_dbnode_id_key', 'CREATE UNIQUE INDEX db_dbgroup_dbnodes_dbgroup_id_dbnode_id_key ON public.db_dbgroup_dbnodes USING btree (dbgroup_id, dbnode_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbgroup_id_idx', 'CREATE INDEX db_dbgroup_dbnodes_dbgroup_id_idx ON public.db_dbgroup_dbnodes USING btree (dbgroup_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbnode_id_idx', 'CREATE INDEX db_dbgroup_dbnodes_dbnode_id_idx ON public.db_dbgroup_dbnodes USING btree (dbnode_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_pkey', 'CREATE UNIQUE INDEX db_dbgroup_dbnodes_pkey ON public.db_dbgroup_dbnodes USING btree (id)')
('db_dblink', 'db_dblink_pkey', 'CREATE UNIQUE INDEX db_dblink_pkey ON public.db_dblink USING btree (id)')
('db_dblink', 'ix_db_dblink_input_id', 'CREATE INDEX ix_db_dblink_input_id ON public.db_dblink USING btree (input_id)')
('db_dblink', 'ix_db_dblink_label', 'CREATE INDEX ix_db_dblink_label ON public.db_dblink USING btree (label)')
('db_dblink', 'ix_db_dblink_output_id', 'CREATE INDEX ix_db_dblink_output_id ON public.db_dblink USING btree (output_id)')
('db_dblink', 'ix_db_dblink_type', 'CREATE INDEX ix_db_dblink_type ON public.db_dblink USING btree (type)')
('db_dblog', 'db_dblog_pkey', 'CREATE UNIQUE INDEX db_dblog_pkey ON public.db_dblog USING btree (id)')
('db_dblog', 'db_dblog_uuid_key', 'CREATE UNIQUE INDEX db_dblog_uuid_key ON public.db_dblog USING btree (uuid)')
('db_dblog', 'ix_db_dblog_levelname', 'CREATE INDEX ix_db_dblog_levelname ON public.db_dblog USING btree (levelname)')
('db_dblog', 'ix_db_dblog_loggername', 'CREATE INDEX ix_db_dblog_loggername ON public.db_dblog USING btree (loggername)')
('db_dbnode', 'db_dbnode_pkey', 'CREATE UNIQUE INDEX db_dbnode_pkey ON public.db_dbnode USING btree (id)')
('db_dbnode', 'db_dbnode_uuid_key', 'CREATE UNIQUE INDEX db_dbnode_uuid_key ON public.db_dbnode USING btree (uuid)')
('db_dbnode', 'ix_db_dbnode_label', 'CREATE INDEX ix_db_dbnode_label ON public.db_dbnode USING btree (label)')
('db_dbnode', 'ix_db_dbnode_node_type', 'CREATE INDEX ix_db_dbnode_node_type ON public.db_dbnode USING btree (node_type)')
('db_dbnode', 'ix_db_dbnode_process_type', 'CREATE INDEX ix_db_dbnode_process_type ON public.db_dbnode USING btree (process_type)')
('db_dbsetting', 'db_dbsetting_key_key', 'CREATE UNIQUE INDEX db_dbsetting_key_key ON public.db_dbsetting USING btree (key)')
('db_dbsetting', 'db_dbsetting_pkey', 'CREATE UNIQUE INDEX db_dbsetting_pkey ON public.db_dbsetting USING btree (id)')
('db_dbsetting', 'ix_db_dbsetting_key', 'CREATE INDEX ix_db_dbsetting_key ON public.db_dbsetting USING btree (key)')
('db_dbuser', 'db_dbuser_pkey', 'CREATE UNIQUE INDEX db_dbuser_pkey ON public.db_dbuser USING btree (id)')
('db_dbuser', 'ix_db_dbuser_email', 'CREATE UNIQUE INDEX ix_db_dbuser_email ON public.db_dbuser USING btree (email)')
```

So django creates an additional `varchar_pattern_ops` index for every indexed `CharField` (see https://www.postgresql.org/docs/9.4/indexes-opclass.html).
I don't know if this is desirable for every one of these; maybe just for the fields you would actually do a lot of `LIKE` queries on, e.g. `node_type`, `process_type`, `type_string`.
Ah, I see @giovannipizzi already opened an issue about this: #2303.
Adding these with alembic is noted here: https://stackoverflow.com/a/52689029/5033292
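
A minimal sketch of what that looks like in an alembic migration (the index name and choice of column are illustrative, not the final implementation):

```python
from alembic import op

def upgrade():
    # pattern-ops index so that LIKE 'prefix%' queries on node_type can use an index scan
    op.create_index(
        'ix_pat_db_dbnode_node_type',  # hypothetical index name
        'db_dbnode',
        ['node_type'],
        postgresql_ops={'node_type': 'varchar_pattern_ops'},
    )

def downgrade():
    op.drop_index('ix_pat_db_dbnode_node_type', table_name='db_dbnode')
```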
Thanks Chris for looking into this. Indeed, the `auth_*` tables are automatically generated by Django, but we don't use them (there might also be a few more, maybe?).
For the indices, indeed it would be good to unify the two, as mentioned in #2303, but also with an eye to which ones are really relevant, to avoid the slowdowns and disk space used by indices that are never really used - see e.g. the discussion in #2762, which also reports how the indices are used (those queries, however, need to be run on big production DBs, e.g. the Materials Cloud ones or those of @sphuber and @mbercx - on small DBs, PSQL might always decide not to use the index, as a scan is always faster). For this issue I would just focus on the differences between the two schemas; we can defer a more detailed analysis of common indexes to #2762, and a decision to drop some of them requires benchmarking.
yep, cheers @giovannipizzi, just figured out how to add them to the ORM classes and migrations; draft PR opened: #5097
> Indeed, the `auth_*` tables are automatically generated by Django but we don't use them (there might also be a few more, maybe?)
FYI, these `auth_*` and `django_content_type` tables are generated by https://docs.djangoproject.com/en/3.2/ref/contrib/auth/ and https://docs.djangoproject.com/en/3.2/ref/contrib/contenttypes/ respectively, set in: https://github.com/aiidateam/aiida-core/blob/4174e5de3adbeec785290a02a0fc78d4597e42e0/aiida/backends/djsite/settings.py#L103-L105
Given these lines haven't been touched for 9 years, and we don't use either, I'm guessing they are both just legacy things.
They are required for typical Django applications that are web applications, but we never used Django as such; we only used its ORM. So yeah, we can get rid of this.
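
For reference, the removal would presumably just be dropping the two contrib entries from `INSTALLED_APPS` in that settings file (a paraphrase of the linked lines, not an exact copy):

```python
INSTALLED_APPS = [
    # 'django.contrib.auth',          # generated the auth_* tables; no longer needed
    # 'django.contrib.contenttypes',  # generated django_content_type; no longer needed
    'aiida.backends.djsite.db',
]
```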
Ok, I am going to make a very unnuanced statement to gauge opinion, and would appreciate it if you simply respond with: "strongly agree", "weakly agree", "no opinion", "weakly disagree" or "strongly disagree"; no discussion, no reasoning. If the majority of responses are "strongly disagree", we will close this issue and never speak of it again; if not, we can discuss further:
There were legacy reasons for maintaining both django and sqlalchemy (e.g. JSONB), but those reasons have all now gone, and so the only thing it serves to do is effectively double the development, testing and maintenance costs in many key areas of aiida-core. We can't drop sqlalchemy, since it is the only backend for which the QueryBuilder is/can be implemented (i.e. even when you select django as the backend, you are still calling sqlalchemy for queries), so it has to be django. It has to be now or never, whilst we are moving from v1 to v2 and breaking changes are allowed. It would be relatively straightforward to do, and would have minimal impact for most developers/users.
cc @giovannipizzi @sphuber @ltalirz @ramirezfranciscof @CasperWA @mbercx @csadorf