aiidateam / aiida-prerequisites

Docker image that contains all prerequisites allowing to run AiiDA.

arm64 build does not work? #62

Open chrisjsewell opened 2 years ago

chrisjsewell commented 2 years ago

According to a discussion I have just had on Slack, it looks like the arm64 distribution does not really work at all at present:

  1. The RabbitMQ version is too high
  2. PostgreSQL is failing for verdi computer configure (so maybe also other things?)

The RabbitMQ version installed is v3.9.13, and the error from postgresql is:

Traceback (most recent call last):
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1089, in _commit_impl
  self.engine.dialect.do_commit(self.connection)
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 685, in do_commit
  dbapi_connection.commit()
psycopg2.errors.SyntaxError: syntax error at or near "SHARE"
LINE 1: ...r" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
                               ^
QUERY: SELECT 1 FROM ONLY "public"."db_dbuser" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
 File "/opt/conda/bin/verdi", line 8, in <module>
  sys.exit(verdi())
 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
  return self.main(*args, **kwargs)
 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1055, in main
  rv = self.invoke(ctx)
 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
  return _process_result(sub_ctx.command.invoke(sub_ctx))
 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
  return _process_result(sub_ctx.command.invoke(sub_ctx))
 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
  return _process_result(sub_ctx.command.invoke(sub_ctx))
 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
  return ctx.invoke(self.callback, **ctx.params)
 File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 760, in invoke
  return __callback(*args, **kwargs)
 File "/opt/conda/lib/python3.8/site-packages/aiida/transports/cli.py", line 164, in transport_configure_command
  configure_computer_main(computer, user, **kwargs)
 File "/opt/conda/lib/python3.8/site-packages/aiida/cmdline/utils/decorators.py", line 73, in wrapper
  return wrapped(*args, **kwargs)
 File "/opt/conda/lib/python3.8/site-packages/aiida/transports/cli.py", line 47, in configure_computer_main
  computer.configure(user=user, **kwargs)
 File "/opt/conda/lib/python3.8/site-packages/aiida/orm/computers.py", line 682, in configure
  authinfo.store()
 File "/opt/conda/lib/python3.8/site-packages/aiida/orm/entities.py", line 252, in store
  self._backend_entity.store()
 File "/opt/conda/lib/python3.8/site-packages/aiida/storage/psql_dos/orm/entities.py", line 91, in store
  self.model.save()
 File "/opt/conda/lib/python3.8/site-packages/aiida/storage/psql_dos/orm/utils.py", line 122, in save
  self.session.commit()
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1451, in commit
  self._transaction.commit(_to_root=self.future)
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 836, in commit
  trans.commit()
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2459, in commit
  self._do_commit()
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2649, in _do_commit
  self._connection_commit_impl()
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2620, in _connection_commit_impl
  self.connection._commit_impl()
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1091, in _commit_impl
  self._handle_dbapi_exception(e, None, None, None, None)
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
  util.raise_(
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
  raise exception
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1089, in _commit_impl
  self.engine.dialect.do_commit(self.connection)
 File "/opt/conda/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 685, in do_commit
  dbapi_connection.commit()
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near "SHARE"
LINE 1: ...r" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
                               ^
QUERY: SELECT 1 FROM ONLY "public"."db_dbuser" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
(Background on this error at: https://sqlalche.me/e/14/f405)

The postgresql conda environment on arm64 is:

# packages in environment at /opt/conda/envs/pgsql:
#
# Name                    Version                   Build  Channel
_openmp_mutex             4.5                       2_gnu    conda-forge
ca-certificates           2022.6.15            h4fd8a4c_0    conda-forge
icu                       70.1                 ha18d298_0    conda-forge
krb5                      1.16.4               h14de66a_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libgcc-ng                 12.1.0              h3242a24_16    conda-forge
libgomp                   12.1.0              h3242a24_16    conda-forge
libiconv                  1.16                 h6dd45c4_0    conda-forge
libpq                     10.5                 h4e4e079_2    conda-forge
libstdcxx-ng              12.1.0              hd01590b_16    conda-forge
libxml2                   2.9.14               h370961a_3    conda-forge
libzlib                   1.2.12               h4e544f5_2    conda-forge
ncurses                   6.3                  headf329_1    conda-forge
openssl                   1.1.1q               h4e544f5_0    conda-forge
postgresql                10.5                 hbeee2d4_2    conda-forge
readline                  7.0               h75b48e3_1001    conda-forge
tk                        8.6.12               hd8af866_0    conda-forge
tzcode                    2022a                h4e544f5_0    conda-forge
xz                        5.2.5                h6dd45c4_1    conda-forge
zlib                      1.2.12               h4e544f5_2    conda-forge

as opposed to on amd64:

# packages in environment at /opt/conda/envs/pgsql:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
ca-certificates           2022.6.15            ha878542_0    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
krb5                      1.16.3            hc83ff2d_1000    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
libgomp                   12.1.0              h8d9b700_16    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
libpq                     10.6              h13b8bad_1000    conda-forge
libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
libxml2                   2.9.14               h22db469_3    conda-forge
libzlib                   1.2.12               h166bdaf_2    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
openssl                   1.0.2u               h516909a_0    conda-forge
postgresql                10.6              h66cca7a_1000    conda-forge
readline                  7.0               hf8c457e_1001    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tzcode                    2022a                h166bdaf_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.12               h166bdaf_2    conda-forge
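
To make the differences between the two listings easier to spot, the comparison can be sketched in Python. This is only an illustrative helper, not part of the image or of AiiDA: the function names `parse_conda_list` and `diff_versions` are hypothetical, and the two strings contain just a few rows copied from the listings above.

```python
def parse_conda_list(text):
    """Map package name -> (version, build) from `conda list` style output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            name, version, build = fields[:3]
            packages[name] = (version, build)
    return packages


def diff_versions(left, right):
    """Return {name: (left_version, right_version)} where the versions differ.

    A package missing on one side is reported with None for that side.
    """
    diff = {}
    for name in sorted(set(left) | set(right)):
        left_version = left.get(name, (None,))[0]
        right_version = right.get(name, (None,))[0]
        if left_version != right_version:
            diff[name] = (left_version, right_version)
    return diff


# A few rows copied from the arm64 and amd64 listings above (not the full tables).
ARM64 = """
krb5           1.16.4        h14de66a_0  conda-forge
libpq           10.5         h4e4e079_2  conda-forge
openssl          1.1.1q        h4e544f5_0  conda-forge
postgresql        10.5         hbeee2d4_2  conda-forge
zlib           1.2.12        h4e544f5_2  conda-forge
"""

AMD64 = """
_libgcc_mutex             0.1                 conda_forge    conda-forge
krb5                      1.16.3            hc83ff2d_1000    conda-forge
libpq                     10.6              h13b8bad_1000    conda-forge
openssl                   1.0.2u               h516909a_0    conda-forge
postgresql                10.6              h66cca7a_1000    conda-forge
zlib                      1.2.12               h166bdaf_2    conda-forge
"""

if __name__ == "__main__":
    changes = diff_versions(parse_conda_list(ARM64), parse_conda_list(AMD64))
    for name, (arm, amd) in changes.items():
        print(f"{name}: arm64={arm} amd64={amd}")
```

Applied to the full listings, the packages whose versions differ are krb5, libpq, openssl and postgresql, and _libgcc_mutex appears only on amd64. Whether any of these differences is related to the error is not established here.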

chrisjsewell commented 2 years ago

cc @csadorf

chrisjsewell commented 2 years ago

Perhaps this will be partly fixed by #57 (for RabbitMQ), but I have no idea about the PostgreSQL bug.

As an extra question, should we not be upgrading from postgresql v10 at some point, given they will be dropping support in a few months: https://www.postgresql.org/support/versioning/? Is there anything specifically stopping this?

csadorf commented 2 years ago

@chrisjsewell Thanks for reporting. I have not really maintained or used this particular image lately, @yakutovicha can you comment on this? Are you aware of any issues regarding the arm64 build?

Regarding upgrading to a higher PostgreSQL version: this will be necessary and will also be done as part of the revised stack; however, it does require a migration for existing environments. I recommended keeping the old version for now to avoid making this a blocker for the support of AiiDA 2.x (see #44).

yakutovicha commented 2 years ago

@chrisjsewell, thanks for reporting.

  • The RabbitMQ version is too high

Does RabbitMQ not work? We had a discussion with @unkcpz and agreed to use the default version available on Ubuntu and apply the necessary patch as described in the AiiDA wiki, until #57 is fixed, of course.

As an extra question, should we not be upgrading from postgresql v10 at some point, given they will be dropping support in a few months: https://www.postgresql.org/support/versioning/? Is there anything specifically stopping this?

Yes, I will prepare the migration scripts (see https://github.com/aiidateam/aiida-prerequisites/issues/41)

@yakutovicha can you comment on this? Are you aware of any issues regarding the arm64 build?

I am not aware of this particular issue. Does it also manifest itself for the 1.6.9 release of AiiDA-core (which is based on the same prerequisites container)? @chrisjsewell, should I have a look, or are you already working on this?

unkcpz commented 2 years ago

  1. For RabbitMQ, we use the latest version available from apt install and change its configuration file with echo "consumer_timeout = 3600000" >> /etc/rabbitmq/rabbitmq.conf. As mentioned by @chrisjsewell in the AiiDA meeting, the RabbitMQ version warning can be suppressed via the AiiDA config. @chrisjsewell, can you tell me how to do that explicitly? I'll open a PR to do that.
  2. For PostgreSQL, @csadorf tested the new full-stack AiiDAlab build with PostgreSQL version 12 and there is no issue there, so I guess the issue comes from the PostgreSQL version. I'll wait for https://github.com/aiidateam/aiida-prerequisites/issues/41 and then use a higher version of PostgreSQL.
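
The echo command in point 1 appends the setting every time it runs. A minimal sketch of an idempotent variant in Python follows; the helper name `ensure_consumer_timeout` is hypothetical, and only the setting line itself is taken from the comment above. The demo writes to a temporary file rather than the real /etc/rabbitmq/rabbitmq.conf.

```python
import tempfile
from pathlib import Path

# Setting exactly as quoted in the comment above.
SETTING = "consumer_timeout = 3600000"


def ensure_consumer_timeout(conf_path):
    """Append SETTING to the file unless an identical line is already present.

    Returns True if the file was modified, False otherwise.
    """
    path = Path(conf_path)
    existing = path.read_text() if path.exists() else ""
    if any(line.strip() == SETTING for line in existing.splitlines()):
        return False
    # Make sure the new setting starts on its own line.
    prefix = "" if not existing or existing.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(f"{prefix}{SETTING}\n")
    return True


if __name__ == "__main__":
    # Demonstrate on a throwaway file, not on the real RabbitMQ configuration.
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "rabbitmq.conf"
        print(ensure_consumer_timeout(conf))  # first call appends the line
        print(ensure_consumer_timeout(conf))  # second call leaves the file alone
```

The check is a plain string comparison, so a line that sets consumer_timeout to a different value would not be detected or replaced.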

sphuber commented 2 years ago

  1. Can you tell me how to do that explicitly?

From the CLI you can run verdi config set warnings.rabbitmq_version False. It can also be done from the Python API if that is more convenient.
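
For scripted setups, the CLI command can be wrapped in a few lines of Python. This is a sketch that only shells out to the command quoted above; it does not show the Python API mentioned in the comment. The function name `suppress_rabbitmq_warning` and its `run` and `which` parameters are hypothetical, added so the behaviour can be exercised without AiiDA installed.

```python
import shutil
import subprocess

# Command exactly as given in the comment above.
VERDI_COMMAND = ["verdi", "config", "set", "warnings.rabbitmq_version", "False"]


def suppress_rabbitmq_warning(run=subprocess.run, which=shutil.which):
    """Run the verdi command if `verdi` is found on PATH.

    Returns True if the command was executed, False if verdi is not available.
    A failing command raises subprocess.CalledProcessError (because check=True).
    """
    if which(VERDI_COMMAND[0]) is None:
        return False
    run(VERDI_COMMAND, check=True)
    return True


if __name__ == "__main__":
    if not suppress_rabbitmq_warning():
        print("verdi not found on PATH; nothing was changed")
```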

unkcpz commented 2 years ago

@sphuber thanks! I opened PR https://github.com/aiidateam/aiida-core/pull/5634 to suppress the warning.

I just tested again with the image pulled directly from Docker Hub: both the 1.6.9 tag (originally mistyped here as 1.6.8) and 2.0.3 are not working. We really need a CI test that runs on arm64, which is not possible yet because GitHub does not provide arm64 runners. One solution is to deploy a self-hosted runner for a different architecture (https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners). This requires an arm64 machine to deploy and run on; it is very easy to set up with the instructions, but it needs an extra machine with the specific architecture, arm64 in our case.
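
If such an arm64 job is set up, one cheap safeguard is to have the test script confirm that it is really running on the intended architecture before testing the image. The following is a sketch only: `normalized_arch` and `require_arch` are hypothetical helper names, and the alias table covers just the common spellings of the two architectures discussed in this issue.

```python
import platform

# Common spellings of the two architectures discussed in this issue.
_ALIASES = {
    "aarch64": "arm64",
    "arm64": "arm64",
    "x86_64": "amd64",
    "amd64": "amd64",
}


def normalized_arch(machine=None):
    """Return 'arm64' or 'amd64' when recognised, else the lowercased raw value."""
    raw = (machine if machine is not None else platform.machine()).lower()
    return _ALIASES.get(raw, raw)


def require_arch(expected, machine=None):
    """Raise RuntimeError unless the (normalised) architecture equals `expected`."""
    actual = normalized_arch(machine)
    if actual != expected:
        raise RuntimeError(f"expected {expected}, but running on {actual}")
    return actual


if __name__ == "__main__":
    print(normalized_arch())
```

A job meant for arm64 would call require_arch("arm64") as its first step, so that a misconfigured runner fails early instead of silently testing the amd64 image.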

yakutovicha commented 2 years ago

I just test again with the image from dockerhub directly, both 1.6.8 tag and 2.0.3 are not working.

I think you should have tested 1.6.9. That is the only 1.x version of aiida docker image compatible with the arm64 architecture.

unkcpz commented 2 years ago

@yakutovicha thanks! You are right: I tested the 1.6.9 tag, it was a typo in my comment.

I ran a test with a self-hosted runner on my laptop, and it works well (https://github.com/unkcpz/aiida-prerequisites/runs/8236740972?check_suite_focus=true). This means that if we have an arm64 server, the CI action can be configured and tested (for aiida-prerequisites, aiida-core, and aiidalab-docker-stack).

Pinging @giovannipizzi here for a comment on whether it is possible to have an AWS arm64 cloud server for this.