aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io
Other
436 stars 191 forks source link

Drop Django Backend #4985

Closed chrisjsewell closed 2 years ago

chrisjsewell commented 3 years ago

Ok I am going to make a very unnuanced statement to gauge opinion, and would appreciate if you simply respond with: "strongly agree", "weakly agree", "no opinion", "weakly disagree" or "strong disagree"; no discussion, no reasoning. If the majority of responses are "strongly disagree", we will close this issue and never speak of it again, if not we can discuss further:

There were legacy reasons for maintaining both django and sqlalchemy (e.g. JSONB), but the reasons have all now gone, and so the only thing it serves to do is effectively doubles the development, testing and maintenance costs in many key areas of aiida-core. We can't drop sqlalchemy, since it is the only backend for which the QueryBuilder is/can be implemented (i.e. even when you select django as the backend, you are still calling sqlalchemy for queries), so it has to be django. It has to be now or never, whilst we are moving from v1 to v2, and so breaking changes are allowed. It would be relatively straight-foward to do, and would have minimal impact for most developers/users.

cc @giovannipizzi @sphuber @ltalirz @ramirezfranciscof @CasperWA @mbercx @csadorf

mbercx commented 3 years ago

strongly agree

Additional comment No discussion, no reasoning? I like this game 😁
csadorf commented 3 years ago

weakly agree

(after some deliberation I changed my initial response from "strongly agree" to "weakly agree")

ltalirz commented 3 years ago

weakly agree.

Additional comment To be clear, I'm assuming the question to be: do you support dropping the django backend before the v2 release if it can be done without much effort and with minimal impact for most users? "Minimal impact for most users" to me means an automatic migration from django to sqla (that would need to be demonstrated on a large django database).
chrisjsewell commented 3 years ago

To go slightly against my rule of no discussion 😬 (but a key point I want to highlight):

an automatic migration from django to sqla

To a first approximation there is literally no difference with the actual postgres DB, and so no real migration needed (except syncing the schema version value), i.e. I don't see any difference in the tables/fields of the ORM classes, and also I tried:

  1. Create a new profile (with the default "django" backend)
  2. verdi archive import "https://archive.materialscloud.org/record/file?filename=two_dimensional_database.aiida&file_id=98f1789b-d68e-46db-9c67-4ef8dd8c1ebf&record_id=648"
  3. verdi database summary -v
  4. change the backend to "sqlalchemy" in the config.json
  5. I had to override SettingsManager.validate_schema_version (because it returned the DB schema version as None):
    def validate_schema_version(self, profile):
      return True

    but after that...

  6. verdi database summary -v (works fine)
  7. verdi archive create -G 17 -- test_aiida.aiida (works fine, i.e. re-exporting the import group)
  8. verdi process list -a -l 100 (works fine)
  9. verdi process show 4 (works fine, with correct incoming/outgoing links)
  10. verdi node attributes 4 (works fine)

(I would encourage you to try with your own database)

giovannipizzi commented 3 years ago

I'm not sure that's the correct way to discuss issues - you give yourself the space to argue for yourself (multiple times) and not the others? :-) Anyway, I see you say "we can discuss further". Since already the preliminary answers indicate we might need further discussion, I'll go in discussion mode.

After some consideration, I might be OK with the option of dropping Django, but with some caveats (that I'd like to discuss in person in a brief meeting, but I summarise below).

Here's my reasoning (also to bring some historical perspective on why this was done, and avoid to fall back in the same pitfalls):

Two important questions for me remain:

  1. How many people are using SQLAlchemy in production? I've been typically using Django, so it would be good to see how many production DBs use SQLA to make sure there are no issues we didn't notice (e.g. I know the transaction management is sub-optimal (e.g. changing extras both from the cmdline and the python shell concurrently might lead to data loss), even if this is probably true in both backends)
  2. SQLAlchemy released 1.4 that already allows to use the future 2.0 syntax. Since there are parts of the code that we need to change (see also #4813), I would suggest to first migrate the codebase to support SQLA 1.4 (with 2.0 syntax). This is something we need to do anyway, and (probably?) the amount of work is the same with 1 or 2 backends). Or, in any case, I would suggest to migrate the code in a branch, where at least the SQLA backend works with SQLA 1.4. I would take the decision after this, just to avoid to commit to a solution that might give us headaches afterwards (and we had many headaches with SQLAlchemy in the past because of its peculiar design).

In summary:

chrisjsewell commented 3 years ago

@giovannipizzi I am bit disappointed that you have chosen to specifically ignore my request; to get an initial consensus from the group on the issue, before diving in to technical discussion, but I will answer some of your points:

I'm not sure that's the correct way to discuss issues - you give yourself the space to argue for yourself

I disagree. I did not argue, I made a statement to gauge initial support (and made that clear). It is, in my experience, fruitless to go into these long, drawn out discussions, if people already have entrenched viewpoints.

we needed to support all features (JSONB in SQLite, flexible queries, ...), so both backends were needed.. but now this reason is gone

This is exactly what I already mentioned in the initial comment. I completely understand there were historical reasons, but now it is just hindering aiida rather than helping it.

one IMPORTANT note: removing the backend concept

I never said to remove the backend concept

accepting that SQLAlchemy code will most probably eventually "leak" into the whole codebase

and so this is a false equivalence

we have spent a significant effort modularising the code

Except the code is not well modularised, which is why we have to maintain two entirely separate implementations for database migrations and archive exports (for what is now exactly the same database), amongst others things, which is a large impediment for anyone trying to make any notable changes/improvements to aiida If we really want to make the backend modular, then it should be a plugin.

Thanks to our current isolation of the backend in a module, all DB-specific code is in specific place

As well, as aiida/backends, there is aiida/orm/implementations and aiida/tools/importexport/dbimport/backends

do we see in the foreseeable future that we extend AiiDA to more DBs/DB backends?

This is premature generalisation. If you want any of the features you mention, then I would suggest you should be writing them in an AEP and adding them to a roadmap. Besides, a key point of both sqlalchemy and django are that they are already abstraction layers for interfacing with multiple database APIs.

Just to bring a specific example: a few years ago, most command line commands were directly accessing the DB via Django or SQLAlchemy queries, and it took quite some time to convert them all to QueryBuilder.

Again, I am not asking for the backend concept to be removed, but we have now gone to the opposite extreme of having multiple layers and implementations of abstraction

the point of backend abstraction should be to allow for plugging in multiple types of database, not having two semi-hardcoded implementations of accessing exactly the same database, with exactly the same schema

chrisjsewell commented 3 years ago

Cross-posting a comment here from https://github.com/aiidateam/aiida-restapi/issues/26#issuecomment-861421127, which I think is relevant to the backend abstraction/modularization discussion

TBH its made it a bit "confusing", that you have the AiiDA ORM classes, which are not technically the same as the backend sqlalchemy/django ORM classes, but that with the QueryBuilder you specify / get back the fields of the "backend" ORM classes? (the classic being this annoying translation of id -> pk, and also name -> label on computers)

basically, IMO, the QueryBuilder feels a little "off" at present, because it seems to expose aspects of the backend ORM, rather than the frontend ORM

chrisjsewell commented 3 years ago

Another possible thing to bear in mind when considering databases as a plugin, is how much should they be tied to the object store? As well as the node repository file structure now being stored in the DB, from https://github.com/aiidateam/aiida-core/commit/87ab7e3a47fe64d642d88b9f1e31e86714be6a2c, the repository is now directly initialised within the database migrations. So presumably a new DB backend would need at least one "faux" migration to initialise the repository.

chrisjsewell commented 3 years ago

Linking https://github.com/aiidateam/aiida-core/issues/2303

chrisjsewell commented 3 years ago

From a meeting with the team, the approach may be:

...

chrisjsewell commented 3 years ago

Comparison of schemas, using:

import tabulate
from aiida import load_profile

load_profile()

from aiida.manage.manager import get_manager

backend = get_manager().get_backend()

# https://www.postgresql.org/docs/12/infoschema-tables.html
tables_columns = ("table_type", "is_typed")
table_rows = backend.execute_raw(
    f"SELECT table_name,{','.join(tables_columns)} FROM information_schema.tables WHERE table_schema = 'public'"
)

print(
    tabulate.tabulate(sorted([list(r) for r in table_rows]),
                      headers=('table_name', *tables_columns)))

table_dict = {row[0]: dict(zip(tables_columns, row[1:])) for row in table_rows}

# https://www.postgresql.org/docs/12/infoschema-columns.html
field_columns = ("data_type", "is_nullable", "is_identity",
                 "character_maximum_length", "column_default")
convert = {"character_maximum_length": "char_max_len"}
for table_name in sorted(table_dict):
    header = f"\nTable: {table_name}"
    print(f"{header}\n{'='*len(header)}")
    field_rows = backend.execute_raw(
        f"SELECT column_name,{','.join(field_columns)} FROM information_schema.columns WHERE table_schema = 'public' AND table_name = '{table_name}'"
    )
    print(
        tabulate.tabulate(sorted([list(r) for r in field_rows]),
                          headers=('column_name', *(convert.get(f, f)
                                                    for f in field_columns))))

Django:

table_name              table_type    is_typed
----------------------  ------------  ----------
auth_group              BASE TABLE    NO
auth_group_permissions  BASE TABLE    NO
auth_permission         BASE TABLE    NO
db_dbauthinfo           BASE TABLE    NO
db_dbcomment            BASE TABLE    NO
db_dbcomputer           BASE TABLE    NO
db_dbgroup              BASE TABLE    NO
db_dbgroup_dbnodes      BASE TABLE    NO
db_dblink               BASE TABLE    NO
db_dblog                BASE TABLE    NO
db_dbnode               BASE TABLE    NO
db_dbsetting            BASE TABLE    NO
db_dbuser               BASE TABLE    NO
django_content_type     BASE TABLE    NO
django_migrations       BASE TABLE    NO

Table: auth_group
==================
column_name    data_type          is_nullable    is_identity      char_max_len  column_default
-------------  -----------------  -------------  -------------  --------------  --------------------------------------
id             integer            NO             NO                             nextval('auth_group_id_seq'::regclass)
name           character varying  NO             NO                        150

Table: auth_group_permissions
==============================
column_name    data_type    is_nullable    is_identity    char_max_len    column_default
-------------  -----------  -------------  -------------  --------------  --------------------------------------------------
group_id       integer      NO             NO
id             integer      NO             NO                             nextval('auth_group_permissions_id_seq'::regclass)
permission_id  integer      NO             NO

Table: auth_permission
=======================
column_name      data_type          is_nullable    is_identity      char_max_len  column_default
---------------  -----------------  -------------  -------------  --------------  -------------------------------------------
codename         character varying  NO             NO                        100
content_type_id  integer            NO             NO
id               integer            NO             NO                             nextval('auth_permission_id_seq'::regclass)
name             character varying  NO             NO                        255

Table: db_dbauthinfo
=====================
column_name    data_type    is_nullable    is_identity    char_max_len    column_default
-------------  -----------  -------------  -------------  --------------  -----------------------------------------
aiidauser_id   integer      NO             NO
auth_params    jsonb        NO             NO
dbcomputer_id  integer      NO             NO
enabled        boolean      NO             NO
id             integer      NO             NO                             nextval('db_dbauthinfo_id_seq'::regclass)
metadata       jsonb        NO             NO

Table: db_dbcomment
====================
column_name    data_type                 is_nullable    is_identity    char_max_len    column_default
-------------  ------------------------  -------------  -------------  --------------  ----------------------------------------
content        text                      NO             NO
ctime          timestamp with time zone  NO             NO
dbnode_id      integer                   NO             NO
id             integer                   NO             NO                             nextval('db_dbcomment_id_seq'::regclass)
mtime          timestamp with time zone  NO             NO
user_id        integer                   NO             NO
uuid           uuid                      NO             NO

Table: db_dbcomputer
=====================
column_name     data_type          is_nullable    is_identity      char_max_len  column_default
--------------  -----------------  -------------  -------------  --------------  -----------------------------------------
description     text               NO             NO
hostname        character varying  NO             NO                        255
id              integer            NO             NO                             nextval('db_dbcomputer_id_seq'::regclass)
label           character varying  NO             NO                        255
metadata        jsonb              NO             NO
scheduler_type  character varying  NO             NO                        255
transport_type  character varying  NO             NO                        255
uuid            uuid               NO             NO

Table: db_dbgroup
==================
column_name    data_type                 is_nullable    is_identity      char_max_len  column_default
-------------  ------------------------  -------------  -------------  --------------  --------------------------------------
description    text                      NO             NO
extras         jsonb                     NO             NO
id             integer                   NO             NO                             nextval('db_dbgroup_id_seq'::regclass)
label          character varying         NO             NO                        255
time           timestamp with time zone  NO             NO
type_string    character varying         NO             NO                        255
user_id        integer                   NO             NO
uuid           uuid                      NO             NO

Table: db_dbgroup_dbnodes
==========================
column_name    data_type    is_nullable    is_identity    char_max_len    column_default
-------------  -----------  -------------  -------------  --------------  ----------------------------------------------
dbgroup_id     integer      NO             NO
dbnode_id      integer      NO             NO
id             integer      NO             NO                             nextval('db_dbgroup_dbnodes_id_seq'::regclass)

Table: db_dblink
=================
column_name    data_type          is_nullable    is_identity      char_max_len  column_default
-------------  -----------------  -------------  -------------  --------------  -------------------------------------
id             integer            NO             NO                             nextval('db_dblink_id_seq'::regclass)
input_id       integer            NO             NO
label          character varying  NO             NO                        255
output_id      integer            NO             NO
type           character varying  NO             NO                        255

Table: db_dblog
================
column_name    data_type                 is_nullable    is_identity      char_max_len  column_default
-------------  ------------------------  -------------  -------------  --------------  ------------------------------------
dbnode_id      integer                   NO             NO
id             integer                   NO             NO                             nextval('db_dblog_id_seq'::regclass)
levelname      character varying         NO             NO                         50
loggername     character varying         NO             NO                        255
message        text                      NO             NO
metadata       jsonb                     NO             NO
time           timestamp with time zone  NO             NO
uuid           uuid                      NO             NO

Table: db_dbnode
=================
column_name          data_type                 is_nullable    is_identity      char_max_len  column_default
-------------------  ------------------------  -------------  -------------  --------------  -------------------------------------
attributes           jsonb                     YES            NO
ctime                timestamp with time zone  NO             NO
dbcomputer_id        integer                   YES            NO
description          text                      NO             NO
extras               jsonb                     YES            NO
id                   integer                   NO             NO                             nextval('db_dbnode_id_seq'::regclass)
label                character varying         NO             NO                        255
mtime                timestamp with time zone  NO             NO
node_type            character varying         NO             NO                        255
process_type         character varying         YES            NO                        255
repository_metadata  jsonb                     YES            NO
user_id              integer                   NO             NO
uuid                 uuid                      NO             NO

Table: db_dbsetting
====================
column_name    data_type                 is_nullable    is_identity      char_max_len  column_default
-------------  ------------------------  -------------  -------------  --------------  ----------------------------------------
description    text                      NO             NO
id             integer                   NO             NO                             nextval('db_dbsetting_id_seq'::regclass)
key            character varying         NO             NO                       1024
time           timestamp with time zone  NO             NO
val            jsonb                     YES            NO

Table: db_dbuser
=================
column_name    data_type          is_nullable    is_identity      char_max_len  column_default
-------------  -----------------  -------------  -------------  --------------  -------------------------------------
email          character varying  NO             NO                        254
first_name     character varying  NO             NO                        254
id             integer            NO             NO                             nextval('db_dbuser_id_seq'::regclass)
institution    character varying  NO             NO                        254
last_name      character varying  NO             NO                        254

Table: django_content_type
===========================
column_name    data_type          is_nullable    is_identity      char_max_len  column_default
-------------  -----------------  -------------  -------------  --------------  -----------------------------------------------
app_label      character varying  NO             NO                        100
id             integer            NO             NO                             nextval('django_content_type_id_seq'::regclass)
model          character varying  NO             NO                        100

Table: django_migrations
=========================
column_name    data_type                 is_nullable    is_identity      char_max_len  column_default
-------------  ------------------------  -------------  -------------  --------------  ---------------------------------------------
app            character varying         NO             NO                        255
applied        timestamp with time zone  NO             NO
id             integer                   NO             NO                             nextval('django_migrations_id_seq'::regclass)
name           character varying         NO             NO                        255

SQLAlchemy:

table_name          table_type    is_typed
------------------  ------------  ----------
alembic_version     BASE TABLE    NO
db_dbauthinfo       BASE TABLE    NO
db_dbcomment        BASE TABLE    NO
db_dbcomputer       BASE TABLE    NO
db_dbgroup          BASE TABLE    NO
db_dbgroup_dbnodes  BASE TABLE    NO
db_dblink           BASE TABLE    NO
db_dblog            BASE TABLE    NO
db_dbnode           BASE TABLE    NO
db_dbsetting        BASE TABLE    NO
db_dbuser           BASE TABLE    NO

Table: alembic_version
=======================
column_name    data_type          is_nullable    is_identity      char_max_len  column_default
-------------  -----------------  -------------  -------------  --------------  ----------------
version_num    character varying  NO             NO                         32

Table: db_dbauthinfo
=====================
column_name    data_type    is_nullable    is_identity    char_max_len    column_default
-------------  -----------  -------------  -------------  --------------  -----------------------------------------
aiidauser_id   integer      YES            NO
auth_params    jsonb        YES            NO
dbcomputer_id  integer      YES            NO
enabled        boolean      YES            NO
id             integer      NO             NO                             nextval('db_dbauthinfo_id_seq'::regclass)
metadata       jsonb        YES            NO

Table: db_dbcomment
====================
column_name    data_type                 is_nullable    is_identity    char_max_len    column_default
-------------  ------------------------  -------------  -------------  --------------  ----------------------------------------
content        text                      YES            NO
ctime          timestamp with time zone  YES            NO
dbnode_id      integer                   YES            NO
id             integer                   NO             NO                             nextval('db_dbcomment_id_seq'::regclass)
mtime          timestamp with time zone  YES            NO
user_id        integer                   YES            NO
uuid           uuid                      YES            NO

Table: db_dbcomputer
=====================
column_name     data_type          is_nullable    is_identity      char_max_len  column_default
--------------  -----------------  -------------  -------------  --------------  -----------------------------------------
description     text               YES            NO
hostname        character varying  YES            NO                        255
id              integer            NO             NO                             nextval('db_dbcomputer_id_seq'::regclass)
label           character varying  NO             NO                        255
metadata        jsonb              YES            NO
scheduler_type  character varying  YES            NO                        255
transport_type  character varying  YES            NO                        255
uuid            uuid               YES            NO

Table: db_dbgroup
==================
column_name    data_type                 is_nullable    is_identity      char_max_len  column_default
-------------  ------------------------  -------------  -------------  --------------  --------------------------------------
description    text                      YES            NO
extras         jsonb                     NO             NO
id             integer                   NO             NO                             nextval('db_dbgroup_id_seq'::regclass)
label          character varying         YES            NO                        255
time           timestamp with time zone  YES            NO
type_string    character varying         YES            NO                        255
user_id        integer                   YES            NO
uuid           uuid                      YES            NO

Table: db_dbgroup_dbnodes
==========================
column_name    data_type    is_nullable    is_identity    char_max_len    column_default
-------------  -----------  -------------  -------------  --------------  ----------------------------------------------
dbgroup_id     integer      YES            NO
dbnode_id      integer      YES            NO
id             integer      NO             NO                             nextval('db_dbgroup_dbnodes_id_seq'::regclass)

Table: db_dblink
=================
column_name    data_type          is_nullable    is_identity      char_max_len  column_default
-------------  -----------------  -------------  -------------  --------------  -------------------------------------
id             integer            NO             NO                             nextval('db_dblink_id_seq'::regclass)
input_id       integer            YES            NO
label          character varying  NO             NO                        255
output_id      integer            YES            NO
type           character varying  YES            NO                        255

Table: db_dblog
================
column_name    data_type                 is_nullable    is_identity      char_max_len  column_default
-------------  ------------------------  -------------  -------------  --------------  ------------------------------------
dbnode_id      integer                   NO             NO
id             integer                   NO             NO                             nextval('db_dblog_id_seq'::regclass)
levelname      character varying         YES            NO                        255
loggername     character varying         YES            NO                        255
message        text                      YES            NO
metadata       jsonb                     YES            NO
time           timestamp with time zone  YES            NO
uuid           uuid                      YES            NO

Table: db_dbnode
=================
column_name          data_type                 is_nullable    is_identity      char_max_len  column_default
-------------------  ------------------------  -------------  -------------  --------------  -------------------------------------
attributes           jsonb                     YES            NO
ctime                timestamp with time zone  YES            NO
dbcomputer_id        integer                   YES            NO
description          text                      YES            NO
extras               jsonb                     YES            NO
id                   integer                   NO             NO                             nextval('db_dbnode_id_seq'::regclass)
label                character varying         YES            NO                        255
mtime                timestamp with time zone  YES            NO
node_type            character varying         YES            NO                        255
process_type         character varying         YES            NO                        255
repository_metadata  jsonb                     NO             NO                             '{}'::jsonb
user_id              integer                   NO             NO
uuid                 uuid                      YES            NO

Table: db_dbsetting
====================
column_name    data_type                 is_nullable    is_identity      char_max_len  column_default
-------------  ------------------------  -------------  -------------  --------------  ----------------------------------------
description    character varying         NO             NO                        255
id             integer                   NO             NO                             nextval('db_dbsetting_id_seq'::regclass)
key            character varying         NO             NO                        255
time           timestamp with time zone  YES            NO
val            jsonb                     YES            NO

Table: db_dbuser
=================
column_name    data_type          is_nullable    is_identity      char_max_len  column_default
-------------  -----------------  -------------  -------------  --------------  -------------------------------------
email          character varying  YES            NO                        254
first_name     character varying  YES            NO                        254
id             integer            NO             NO                             nextval('db_dbuser_id_seq'::regclass)
institution    character varying  YES            NO                        254
last_name      character varying  YES            NO                        254
chrisjsewell commented 3 years ago

Principally all the db_* tables have identical column_names and data types (except db_dbsetting.description being "incorrectly" char in sqlalchemy), then there are a few differences in char_max_len (db_dbsetting.key, db_dblog.levelname) then the main difference is that different fields are nullable.

chrisjsewell commented 3 years ago

If converting from django to sqlalchemy, I assume these tables can just be silently ignore/removed:

$ verdi devel run-sql "SELECT * FROM auth_group;"   
$ verdi devel run-sql "SELECT * FROM auth_group_permissions;"
$ verdi devel run-sql "SELECT * FROM auth_permission;"       
(1, 'Can add permission', 1, 'add_permission')
(2, 'Can change permission', 1, 'change_permission')
(3, 'Can delete permission', 1, 'delete_permission')
(4, 'Can view permission', 1, 'view_permission')
(5, 'Can add group', 2, 'add_group')
(6, 'Can change group', 2, 'change_group')
(7, 'Can delete group', 2, 'delete_group')
(8, 'Can view group', 2, 'view_group')
(9, 'Can add content type', 3, 'add_contenttype')
(10, 'Can change content type', 3, 'change_contenttype')
(11, 'Can delete content type', 3, 'delete_contenttype')
(12, 'Can view content type', 3, 'view_contenttype')
(13, 'Can add db user', 4, 'add_dbuser')
(14, 'Can change db user', 4, 'change_dbuser')
(15, 'Can delete db user', 4, 'delete_dbuser')
(16, 'Can view db user', 4, 'view_dbuser')
(17, 'Can add db auth info', 5, 'add_dbauthinfo')
(18, 'Can change db auth info', 5, 'change_dbauthinfo')
(19, 'Can delete db auth info', 5, 'delete_dbauthinfo')
(20, 'Can view db auth info', 5, 'view_dbauthinfo')
(21, 'Can add db comment', 6, 'add_dbcomment')
(22, 'Can change db comment', 6, 'change_dbcomment')
(23, 'Can delete db comment', 6, 'delete_dbcomment')
(24, 'Can view db comment', 6, 'view_dbcomment')
(25, 'Can add db computer', 7, 'add_dbcomputer')
(26, 'Can change db computer', 7, 'change_dbcomputer')
(27, 'Can delete db computer', 7, 'delete_dbcomputer')
(28, 'Can view db computer', 7, 'view_dbcomputer')
(29, 'Can add db group', 8, 'add_dbgroup')
(30, 'Can change db group', 8, 'change_dbgroup')
(31, 'Can delete db group', 8, 'delete_dbgroup')
(32, 'Can view db group', 8, 'view_dbgroup')
(33, 'Can add db link', 9, 'add_dblink')
(34, 'Can change db link', 9, 'change_dblink')
(35, 'Can delete db link', 9, 'delete_dblink')
(36, 'Can view db link', 9, 'view_dblink')
(37, 'Can add db log', 10, 'add_dblog')
(38, 'Can change db log', 10, 'change_dblog')
(39, 'Can delete db log', 10, 'delete_dblog')
(40, 'Can view db log', 10, 'view_dblog')
(41, 'Can add db node', 11, 'add_dbnode')
(42, 'Can change db node', 11, 'change_dbnode')
(43, 'Can delete db node', 11, 'delete_dbnode')
(44, 'Can view db node', 11, 'view_dbnode')
(45, 'Can add db setting', 12, 'add_dbsetting')
(46, 'Can change db setting', 12, 'change_dbsetting')
(47, 'Can delete db setting', 12, 'delete_dbsetting')
(48, 'Can view db setting', 12, 'view_dbsetting')

and you could just directly write the alembic_version table:

$ verdi -p new_sqla devel run-sql "SELECT * FROM alembic_version;"
('535039300e4a',)
chrisjsewell commented 3 years ago

Comparison of indexes:

Django:

$ verdi -p new_django devel run-sql "SELECT tablename,indexname,indexdef FROM pg_indexes WHERE schemaname = 'public' ORDER BY tablename,indexname;"
('db_dbauthinfo', 'db_dbauthinfo_aiidauser_id_0684fdfb', 'CREATE INDEX db_dbauthinfo_aiidauser_id_0684fdfb ON public.db_dbauthinfo USING btree (aiidauser_id)')
('db_dbauthinfo', 'db_dbauthinfo_aiidauser_id_dbcomputer_id_777cdaa8_uniq', 'CREATE UNIQUE INDEX db_dbauthinfo_aiidauser_id_dbcomputer_id_777cdaa8_uniq ON public.db_dbauthinfo USING btree (aiidauser_id, dbcomputer_id)')
('db_dbauthinfo', 'db_dbauthinfo_dbcomputer_id_424f7ac4', 'CREATE INDEX db_dbauthinfo_dbcomputer_id_424f7ac4 ON public.db_dbauthinfo USING btree (dbcomputer_id)')
('db_dbauthinfo', 'db_dbauthinfo_pkey', 'CREATE UNIQUE INDEX db_dbauthinfo_pkey ON public.db_dbauthinfo USING btree (id)')
('db_dbcomment', 'db_dbcomment_dbnode_id_3b812b6b', 'CREATE INDEX db_dbcomment_dbnode_id_3b812b6b ON public.db_dbcomment USING btree (dbnode_id)')
('db_dbcomment', 'db_dbcomment_pkey', 'CREATE UNIQUE INDEX db_dbcomment_pkey ON public.db_dbcomment USING btree (id)')
('db_dbcomment', 'db_dbcomment_user_id_8ed5e360', 'CREATE INDEX db_dbcomment_user_id_8ed5e360 ON public.db_dbcomment USING btree (user_id)')
('db_dbcomment', 'db_dbcomment_uuid_49bac08c_uniq', 'CREATE UNIQUE INDEX db_dbcomment_uuid_49bac08c_uniq ON public.db_dbcomment USING btree (uuid)')
('db_dbcomputer', 'db_dbcomputer_name_f1800b1a_like', 'CREATE INDEX db_dbcomputer_name_f1800b1a_like ON public.db_dbcomputer USING btree (label varchar_pattern_ops)')
('db_dbcomputer', 'db_dbcomputer_name_key', 'CREATE UNIQUE INDEX db_dbcomputer_name_key ON public.db_dbcomputer USING btree (label)')
('db_dbcomputer', 'db_dbcomputer_pkey', 'CREATE UNIQUE INDEX db_dbcomputer_pkey ON public.db_dbcomputer USING btree (id)')
('db_dbcomputer', 'db_dbcomputer_uuid_f35defa6_uniq', 'CREATE UNIQUE INDEX db_dbcomputer_uuid_f35defa6_uniq ON public.db_dbcomputer USING btree (uuid)')
('db_dbgroup', 'db_dbgroup_name_66c75272', 'CREATE INDEX db_dbgroup_name_66c75272 ON public.db_dbgroup USING btree (label)')
('db_dbgroup', 'db_dbgroup_name_66c75272_like', 'CREATE INDEX db_dbgroup_name_66c75272_like ON public.db_dbgroup USING btree (label varchar_pattern_ops)')
('db_dbgroup', 'db_dbgroup_name_type_12656f33_uniq', 'CREATE UNIQUE INDEX db_dbgroup_name_type_12656f33_uniq ON public.db_dbgroup USING btree (label, type_string)')
('db_dbgroup', 'db_dbgroup_pkey', 'CREATE UNIQUE INDEX db_dbgroup_pkey ON public.db_dbgroup USING btree (id)')
('db_dbgroup', 'db_dbgroup_type_23b2a748', 'CREATE INDEX db_dbgroup_type_23b2a748 ON public.db_dbgroup USING btree (type_string)')
('db_dbgroup', 'db_dbgroup_type_23b2a748_like', 'CREATE INDEX db_dbgroup_type_23b2a748_like ON public.db_dbgroup USING btree (type_string varchar_pattern_ops)')
('db_dbgroup', 'db_dbgroup_user_id_100f8a51', 'CREATE INDEX db_dbgroup_user_id_100f8a51 ON public.db_dbgroup USING btree (user_id)')
('db_dbgroup', 'db_dbgroup_uuid_af896177_uniq', 'CREATE UNIQUE INDEX db_dbgroup_uuid_af896177_uniq ON public.db_dbgroup USING btree (uuid)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbgroup_id_9d3a0f9d', 'CREATE INDEX db_dbgroup_dbnodes_dbgroup_id_9d3a0f9d ON public.db_dbgroup_dbnodes USING btree (dbgroup_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbgroup_id_dbnode_id_eee23cce_uniq', 'CREATE UNIQUE INDEX db_dbgroup_dbnodes_dbgroup_id_dbnode_id_eee23cce_uniq ON public.db_dbgroup_dbnodes USING btree (dbgroup_id, dbnode_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbnode_id_118b9439', 'CREATE INDEX db_dbgroup_dbnodes_dbnode_id_118b9439 ON public.db_dbgroup_dbnodes USING btree (dbnode_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_pkey', 'CREATE UNIQUE INDEX db_dbgroup_dbnodes_pkey ON public.db_dbgroup_dbnodes USING btree (id)')
('db_dblink', 'db_dblink_input_id_9245bd73', 'CREATE INDEX db_dblink_input_id_9245bd73 ON public.db_dblink USING btree (input_id)')
('db_dblink', 'db_dblink_label_f1343cfb', 'CREATE INDEX db_dblink_label_f1343cfb ON public.db_dblink USING btree (label)')
('db_dblink', 'db_dblink_label_f1343cfb_like', 'CREATE INDEX db_dblink_label_f1343cfb_like ON public.db_dblink USING btree (label varchar_pattern_ops)')
('db_dblink', 'db_dblink_output_id_c0167528', 'CREATE INDEX db_dblink_output_id_c0167528 ON public.db_dblink USING btree (output_id)')
('db_dblink', 'db_dblink_pkey', 'CREATE UNIQUE INDEX db_dblink_pkey ON public.db_dblink USING btree (id)')
('db_dblink', 'db_dblink_type_229f212b', 'CREATE INDEX db_dblink_type_229f212b ON public.db_dblink USING btree (type)')
('db_dblink', 'db_dblink_type_229f212b_like', 'CREATE INDEX db_dblink_type_229f212b_like ON public.db_dblink USING btree (type varchar_pattern_ops)')
('db_dblog', 'db_dblog_dbnode_id_da34b732', 'CREATE INDEX db_dblog_dbnode_id_da34b732 ON public.db_dblog USING btree (dbnode_id)')
('db_dblog', 'db_dblog_levelname_ad5dc346', 'CREATE INDEX db_dblog_levelname_ad5dc346 ON public.db_dblog USING btree (levelname)')
('db_dblog', 'db_dblog_levelname_ad5dc346_like', 'CREATE INDEX db_dblog_levelname_ad5dc346_like ON public.db_dblog USING btree (levelname varchar_pattern_ops)')
('db_dblog', 'db_dblog_loggername_00b5ba16', 'CREATE INDEX db_dblog_loggername_00b5ba16 ON public.db_dblog USING btree (loggername)')
('db_dblog', 'db_dblog_loggername_00b5ba16_like', 'CREATE INDEX db_dblog_loggername_00b5ba16_like ON public.db_dblog USING btree (loggername varchar_pattern_ops)')
('db_dblog', 'db_dblog_pkey', 'CREATE UNIQUE INDEX db_dblog_pkey ON public.db_dblog USING btree (id)')
('db_dblog', 'db_dblog_uuid_9cf77df3_uniq', 'CREATE UNIQUE INDEX db_dblog_uuid_9cf77df3_uniq ON public.db_dblog USING btree (uuid)')
('db_dbnode', 'db_dbnode_ctime_71626ef5', 'CREATE INDEX db_dbnode_ctime_71626ef5 ON public.db_dbnode USING btree (ctime)')
('db_dbnode', 'db_dbnode_dbcomputer_id_315372a3', 'CREATE INDEX db_dbnode_dbcomputer_id_315372a3 ON public.db_dbnode USING btree (dbcomputer_id)')
('db_dbnode', 'db_dbnode_label_6469539e', 'CREATE INDEX db_dbnode_label_6469539e ON public.db_dbnode USING btree (label)')
('db_dbnode', 'db_dbnode_label_6469539e_like', 'CREATE INDEX db_dbnode_label_6469539e_like ON public.db_dbnode USING btree (label varchar_pattern_ops)')
('db_dbnode', 'db_dbnode_mtime_0554ea3d', 'CREATE INDEX db_dbnode_mtime_0554ea3d ON public.db_dbnode USING btree (mtime)')
('db_dbnode', 'db_dbnode_pkey', 'CREATE UNIQUE INDEX db_dbnode_pkey ON public.db_dbnode USING btree (id)')
('db_dbnode', 'db_dbnode_process_type_df7298d0', 'CREATE INDEX db_dbnode_process_type_df7298d0 ON public.db_dbnode USING btree (process_type)')
('db_dbnode', 'db_dbnode_process_type_df7298d0_like', 'CREATE INDEX db_dbnode_process_type_df7298d0_like ON public.db_dbnode USING btree (process_type varchar_pattern_ops)')
('db_dbnode', 'db_dbnode_type_a8ce9753', 'CREATE INDEX db_dbnode_type_a8ce9753 ON public.db_dbnode USING btree (node_type)')
('db_dbnode', 'db_dbnode_type_a8ce9753_like', 'CREATE INDEX db_dbnode_type_a8ce9753_like ON public.db_dbnode USING btree (node_type varchar_pattern_ops)')
('db_dbnode', 'db_dbnode_user_id_12e7aeaf', 'CREATE INDEX db_dbnode_user_id_12e7aeaf ON public.db_dbnode USING btree (user_id)')
('db_dbnode', 'db_dbnode_uuid_62e0bf98_uniq', 'CREATE UNIQUE INDEX db_dbnode_uuid_62e0bf98_uniq ON public.db_dbnode USING btree (uuid)')
('db_dbsetting', 'db_dbsetting_key_1b84beb4_like', 'CREATE INDEX db_dbsetting_key_1b84beb4_like ON public.db_dbsetting USING btree (key varchar_pattern_ops)')
('db_dbsetting', 'db_dbsetting_key_1b84beb4_uniq', 'CREATE UNIQUE INDEX db_dbsetting_key_1b84beb4_uniq ON public.db_dbsetting USING btree (key)')
('db_dbsetting', 'db_dbsetting_pkey', 'CREATE UNIQUE INDEX db_dbsetting_pkey ON public.db_dbsetting USING btree (id)')
('db_dbuser', 'db_dbuser_email_30150b7e_like', 'CREATE INDEX db_dbuser_email_30150b7e_like ON public.db_dbuser USING btree (email varchar_pattern_ops)')
('db_dbuser', 'db_dbuser_email_30150b7e_uniq', 'CREATE UNIQUE INDEX db_dbuser_email_30150b7e_uniq ON public.db_dbuser USING btree (email)')
('db_dbuser', 'db_dbuser_pkey', 'CREATE UNIQUE INDEX db_dbuser_pkey ON public.db_dbuser USING btree (id)')

SQLAlchemy:

$ verdi -p new_sqla devel run-sql "SELECT tablename,indexname,indexdef FROM pg_indexes WHERE schemaname = 'public' ORDER BY tablename,indexname;"
('db_dbauthinfo', 'db_dbauthinfo_aiidauser_id_dbcomputer_id_key', 'CREATE UNIQUE INDEX db_dbauthinfo_aiidauser_id_dbcomputer_id_key ON public.db_dbauthinfo USING btree (aiidauser_id, dbcomputer_id)')
('db_dbauthinfo', 'db_dbauthinfo_pkey', 'CREATE UNIQUE INDEX db_dbauthinfo_pkey ON public.db_dbauthinfo USING btree (id)')
('db_dbcomment', 'db_dbcomment_pkey', 'CREATE UNIQUE INDEX db_dbcomment_pkey ON public.db_dbcomment USING btree (id)')
('db_dbcomment', 'db_dbcomment_uuid_key', 'CREATE UNIQUE INDEX db_dbcomment_uuid_key ON public.db_dbcomment USING btree (uuid)')
('db_dbcomputer', 'db_dbcomputer_label_key', 'CREATE UNIQUE INDEX db_dbcomputer_label_key ON public.db_dbcomputer USING btree (label)')
('db_dbcomputer', 'db_dbcomputer_pkey', 'CREATE UNIQUE INDEX db_dbcomputer_pkey ON public.db_dbcomputer USING btree (id)')
('db_dbcomputer', 'db_dbcomputer_uuid_key', 'CREATE UNIQUE INDEX db_dbcomputer_uuid_key ON public.db_dbcomputer USING btree (uuid)')
('db_dbgroup', 'db_dbgroup_label_type_string_key', 'CREATE UNIQUE INDEX db_dbgroup_label_type_string_key ON public.db_dbgroup USING btree (label, type_string)')
('db_dbgroup', 'db_dbgroup_pkey', 'CREATE UNIQUE INDEX db_dbgroup_pkey ON public.db_dbgroup USING btree (id)')
('db_dbgroup', 'db_dbgroup_uuid_key', 'CREATE UNIQUE INDEX db_dbgroup_uuid_key ON public.db_dbgroup USING btree (uuid)')
('db_dbgroup', 'ix_db_dbgroup_label', 'CREATE INDEX ix_db_dbgroup_label ON public.db_dbgroup USING btree (label)')
('db_dbgroup', 'ix_db_dbgroup_type_string', 'CREATE INDEX ix_db_dbgroup_type_string ON public.db_dbgroup USING btree (type_string)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbgroup_id_dbnode_id_key', 'CREATE UNIQUE INDEX db_dbgroup_dbnodes_dbgroup_id_dbnode_id_key ON public.db_dbgroup_dbnodes USING btree (dbgroup_id, dbnode_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbgroup_id_idx', 'CREATE INDEX db_dbgroup_dbnodes_dbgroup_id_idx ON public.db_dbgroup_dbnodes USING btree (dbgroup_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_dbnode_id_idx', 'CREATE INDEX db_dbgroup_dbnodes_dbnode_id_idx ON public.db_dbgroup_dbnodes USING btree (dbnode_id)')
('db_dbgroup_dbnodes', 'db_dbgroup_dbnodes_pkey', 'CREATE UNIQUE INDEX db_dbgroup_dbnodes_pkey ON public.db_dbgroup_dbnodes USING btree (id)')
('db_dblink', 'db_dblink_pkey', 'CREATE UNIQUE INDEX db_dblink_pkey ON public.db_dblink USING btree (id)')
('db_dblink', 'ix_db_dblink_input_id', 'CREATE INDEX ix_db_dblink_input_id ON public.db_dblink USING btree (input_id)')
('db_dblink', 'ix_db_dblink_label', 'CREATE INDEX ix_db_dblink_label ON public.db_dblink USING btree (label)')
('db_dblink', 'ix_db_dblink_output_id', 'CREATE INDEX ix_db_dblink_output_id ON public.db_dblink USING btree (output_id)')
('db_dblink', 'ix_db_dblink_type', 'CREATE INDEX ix_db_dblink_type ON public.db_dblink USING btree (type)')
('db_dblog', 'db_dblog_pkey', 'CREATE UNIQUE INDEX db_dblog_pkey ON public.db_dblog USING btree (id)')
('db_dblog', 'db_dblog_uuid_key', 'CREATE UNIQUE INDEX db_dblog_uuid_key ON public.db_dblog USING btree (uuid)')
('db_dblog', 'ix_db_dblog_levelname', 'CREATE INDEX ix_db_dblog_levelname ON public.db_dblog USING btree (levelname)')
('db_dblog', 'ix_db_dblog_loggername', 'CREATE INDEX ix_db_dblog_loggername ON public.db_dblog USING btree (loggername)')
('db_dbnode', 'db_dbnode_pkey', 'CREATE UNIQUE INDEX db_dbnode_pkey ON public.db_dbnode USING btree (id)')
('db_dbnode', 'db_dbnode_uuid_key', 'CREATE UNIQUE INDEX db_dbnode_uuid_key ON public.db_dbnode USING btree (uuid)')
('db_dbnode', 'ix_db_dbnode_label', 'CREATE INDEX ix_db_dbnode_label ON public.db_dbnode USING btree (label)')
('db_dbnode', 'ix_db_dbnode_node_type', 'CREATE INDEX ix_db_dbnode_node_type ON public.db_dbnode USING btree (node_type)')
('db_dbnode', 'ix_db_dbnode_process_type', 'CREATE INDEX ix_db_dbnode_process_type ON public.db_dbnode USING btree (process_type)')
('db_dbsetting', 'db_dbsetting_key_key', 'CREATE UNIQUE INDEX db_dbsetting_key_key ON public.db_dbsetting USING btree (key)')
('db_dbsetting', 'db_dbsetting_pkey', 'CREATE UNIQUE INDEX db_dbsetting_pkey ON public.db_dbsetting USING btree (id)')
('db_dbsetting', 'ix_db_dbsetting_key', 'CREATE INDEX ix_db_dbsetting_key ON public.db_dbsetting USING btree (key)')
('db_dbuser', 'db_dbuser_pkey', 'CREATE UNIQUE INDEX db_dbuser_pkey ON public.db_dbuser USING btree (id)')
('db_dbuser', 'ix_db_dbuser_email', 'CREATE UNIQUE INDEX ix_db_dbuser_email ON public.db_dbuser USING btree (email)')

So django creates an additional varchar_pattern_ops index for every indexed CharField (see https://www.postgresql.org/docs/9.4/indexes-opclass.html). Don't know if this is desirable for evey one of these, maybe just fields where you would actually do a lot of like queries for, e.g. node_type, process_type, type_string

chrisjsewell commented 3 years ago

Ah I see @giovannipizzi already opened an issue about this #2303

Adding these with alembic is noted here: https://stackoverflow.com/a/52689029/5033292

giovannipizzi commented 3 years ago

Thanks Chris for looking into this. Indeed, the auth_* tables are automatically generated by Django but we don't use them (there might also be a few more, maybe?).

For the indices, indeed it would be good to uniform the two as mentioned in #2303, but also with an eye to which ones are really relevant to avoid slowdowns and used disk space with the indices never really used - see e.g. this discussion #2762 that also reports how the indices are used (those queries, however, need to be run on big production DBs, e.g. the Materials Cloud ones or the ones of @sphuber and @mbercx - in small DBs, PSQL might always decide not to use the index as a scan is always faster). For this issue I would just focus on the differences between the two schemas, and we can defer a more detailed analysis of common indexes to #2762 and a decision to drop some of them requires benchmarking

chrisjsewell commented 3 years ago

yep cheers @giovannipizzi, just figured out how to add them to the ORM classes and migrations, draft PR opened; #5097

chrisjsewell commented 3 years ago

Indeed, the auth_* tables are automatically generated by Django but we don't use them (there might also be a few more, maybe?).

FYI, these auh_* and django_content_type tables are generated by https://docs.djangoproject.com/en/3.2/ref/contrib/auth/ and https://docs.djangoproject.com/en/3.2/ref/contrib/contenttypes/, set in: https://github.com/aiidateam/aiida-core/blob/4174e5de3adbeec785290a02a0fc78d4597e42e0/aiida/backends/djsite/settings.py#L103-L105

Given these lines haven't been touched for 9 years, and we don't use either, I'm guessing they are both just legacy things

sphuber commented 3 years ago

They are required for typical Django applications that are web applications, but we never used it as such. We only used their ORM. So yeah, we can get rid of this.