aiidateam / aiida-prerequisites

Docker image that contains all prerequisites allowing to run AiiDA.

Add build for arm64 #38

Closed unkcpz closed 2 years ago

ltalirz commented 2 years ago

@unkcpz do you plan to finish this PR?

unkcpz commented 2 years ago

> @unkcpz do you plan to finish this PR?

I think we can leave this open or convert it to an open issue. If I understand correctly, @yakutovicha mentioned in the aiidalab meeting that he plans to use a more widely used base image instead of phusion.

@yakutovicha could you elaborate on the plan and write down in more detail why we needed phusion in the first place and why it is now possible to replace it?

ltalirz commented 2 years ago

> @yakutovicha could you elaborate on the plan and write down in more detail why we needed phusion in the first place and why it is now possible to replace it?

The original reason for using the phusion baseimage was that it provides an init service which is designed for the use case we have (running multiple processes inside a docker container) and can in principle prevent containers from becoming filled with zombie processes. However, zombie processes arise from bugs in application code, and to my knowledge we have never actually encountered the problem ourselves before switching (correct me if I'm wrong).

It also runs the cron daemon by default (not sure whether we're taking advantage of this).

I personally don't have a strong opinion in either direction - we should just take a decision.

ltalirz commented 2 years ago

Just mentioning that we may soon be preparing a small update of aiida-prerequisites in order to fix the rabbitmq configuration.

It would be good if you guys could get this sorted and merged before then, so that we have the arm support as well.

yakutovicha commented 2 years ago

> Just mentioning that we may soon be preparing a small update of aiida-prerequisites in order to fix the rabbitmq configuration.
>
> It would be good if you guys could get this sorted and merged before then, so that we have the arm support as well.

Hi @ltalirz - I've been thinking for quite a while about a better approach to handle containers. I came up with this: https://github.com/aiidalab/aiidalab-docker-stack/issues/243.

In principle, I am fully into it now. If I get positive feedback and a green light from the others, I can replace the thing pretty quickly. As a result, there won't be a prerequisites container anymore. Let me know if you like the idea.

yakutovicha commented 2 years ago

@unkcpz let's try to finalise this PR. The Ubuntu 22.04 based image was released 20 days ago. Would you have time to make the remaining changes?

unkcpz commented 2 years ago

@yakutovicha Thanks for the heads-up. No problem, I think there are no blockers or issues with this implementation. I will test it on my local machine again, rebase the commits and let the CI build test run once more.

unkcpz commented 2 years ago

I rebased the PR and updated the miniconda version. The image can be built, and so can the aiida-core v1.6.8 container and then aiidalab-docker-stack on top of it. But when I launch it I get the errors below, and I have no idea how to fix them. The CI build test fails here: https://github.com/aiidateam/aiida-prerequisites/runs/6789039113?check_suite_focus=true. I removed the line RUN touch /opt/conda/pkgs/urls.txt; not sure whether this causes the issue? The rabbitmq inside the aiida-prerequisites container is not installed by conda.

*** Running /etc/my_init.d/10_syslog-ng.init...
[2022-06-08T09:05:50.693268] WARNING: Configuration file format is too old, syslog-ng is running in compatibility mode. Please update it to use the syslog-ng 3.35 format at your time of convenience. To upgrade the configuration, please review the warnings about incompatible changes printed by syslog-ng, and once completed change the @version header at the top of the configuration file; config-version='3.25'
[2022-06-08T09:05:50.787845] WARNING: The internal_queue_length stat counter has been renamed to internal_source.queued. The old name will be removed in future versions; config-version='3.25'
Jun  8 09:05:50 07d54204d21c syslog-ng[476]: syslog-ng starting up; version='3.35.1'
*** Running /etc/my_init.d/20_start-rabbitmq.sh...
 * Starting RabbitMQ Messaging Server rabbitmq-server
 * FAILED - check /var/log/rabbitmq/startup_\{log, _err\}
   ...fail!
*** /etc/my_init.d/20_start-rabbitmq.sh failed with status 1

*** Killing all processes...
Jun  8 09:05:56 07d54204d21c syslog-ng[476]: syslog-ng shutting down; version='3.35.1'

ltalirz commented 2 years ago

Hi @unkcpz , did you have a look inside the rabbitmq log files pointed to by the error message?

See e.g. https://stackoverflow.com/a/65954148/1069467 on how to do this.
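
As a rough sketch of one way to get at those files (not necessarily the approach in the linked answer): start the image with a shell instead of the init system, run the failing init script by hand, and print the startup logs. The helper name `show_startup_logs` and the `$IMAGE` placeholder are assumptions for illustration; the script and log paths are taken from the error output above.

```shell
#!/bin/sh
# Hypothetical helper: print the RabbitMQ startup logs found in a directory
# (defaults to /var/log/rabbitmq, the location named in the error message).
show_startup_logs() {
    dir="${1:-/var/log/rabbitmq}"
    for name in startup_log startup_err; do
        if [ -f "$dir/$name" ]; then
            echo "== $name =="
            cat "$dir/$name"
        fi
    done
}

# Possible usage (requires docker; $IMAGE is a placeholder for the image tag):
#   docker run --rm -it "$IMAGE" /bin/bash     # overrides the default command
#   /etc/my_init.d/20_start-rabbitmq.sh        # inside the container: reproduce the failure
#   show_startup_logs                          # inside the container: read the logs
```

Overriding the command keeps the container alive after the failure, which is what makes the log files reachable.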

unkcpz commented 2 years ago

It is a segmentation fault error in /var/log/rabbitmq/startup_err.

qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault

I suspect this might be an issue with the new phusion base image.

EDIT: I messed up the architecture (a typo between arm/amd in buildx); I will check it again.

ltalirz commented 2 years ago

Just to provide some context: qemu is the component docker uses to run intel/amd64 containers on arm chips

To me this error message would seem to indicate you ran the amd64 image on the M1 Mac.
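
A quick way to check for this kind of mismatch is to compare the host architecture with the one recorded in the image. The sketch below assumes a locally available image (`$IMAGE` is a placeholder) and uses two hypothetical helper functions, `normalize_arch` and `compare_arch`, introduced here only for illustration.

```shell
#!/bin/sh
# Map a machine name as printed by `uname -m` to the name Docker uses.
normalize_arch() {
    case "$1" in
        x86_64|amd64) echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        *) echo "$1" ;;
    esac
}

# Compare a host architecture with an image architecture.
compare_arch() {
    host=$(normalize_arch "$1")
    image=$(normalize_arch "$2")
    if [ "$host" = "$image" ]; then
        echo "match"
    else
        echo "mismatch (host=$host, image=$image)"
    fi
}

# Possible usage (requires docker; $IMAGE is a placeholder):
#   compare_arch "$(uname -m)" \
#       "$(docker image inspect --format '{{.Architecture}}' "$IMAGE")"
```

A mismatch means the container runs under emulation, which is where qemu comes in. When building with buildx, the target is selected with `--platform`, e.g. `--platform linux/arm64`.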

unkcpz commented 2 years ago

@ltalirz thanks! You are absolutely right. I corrected the arch and rebuilt, but the build failed with a new issue: the GCC compiler on aarch64 fails for a couple of libraries that need to be compiled (ruamel.yaml, pymatgen ...): gcc: error: unrecognized command-line option '-n1'; did you mean '-n'? I replaced GCC with a different version and also tried the GCC installed by conda, but neither worked. There is also not much about this issue online.

ltalirz commented 2 years ago

Hi @unkcpz , to my knowledge there is no gcc option -n1, i.e. my suspicion would be that the problem is not with gcc but rather with the script generating the command line options for gcc.

Is there a minimal example to reproduce this? Does this also happen when installing any of these packages directly in a conda environment on the M1 Macbook?

unkcpz commented 2 years ago

> Is there a minimal example to reproduce this?

Yes, simply start the container I prepared

docker run -it jusong/aiida-prerequisites:arm64-02 /bin/bash 

and

pip install ruamel.yaml

> Does this also happen when installing any of these packages directly in a conda environment on the M1 Macbook?

On the Macbook it all works fine. The container's architecture is linux/arm64. I suspect the problem comes from the base image, since I could previously launch aiidalab based on this without any problem, and the only change since I paused last time is the base image (some libraries installed by apt could also cause the issue). I also updated the miniconda version; I need to double-check that.

ltalirz commented 2 years ago

I can reproduce the issue, thanks. I'll think a bit about how to figure this one out. By the way, do these packages have to be installed via pip? conda install -c conda-forge ruamel.yaml works fine.

ltalirz commented 2 years ago

One way to fix the problem (without figuring out where it came from): conda install python=3.9.13.

Now, pip install ruamel.yaml works fine. I guess you can take it from here

unkcpz commented 2 years ago

Thanks a lot! Yes, I rolled back to the old version of Miniconda and it all works fine. I think I'll keep it that way and we adapt to the new miniconda version in another PR, one small step at a time.

unkcpz commented 2 years ago

> By the way, do these packages have to be installed via pip? conda install -c conda-forge ruamel.yaml works fine.

I tried that, but there are just a lot of other packages that need to be compiled with GCC.

unkcpz commented 2 years ago

I tested it as a base image for aiidalab-docker-stack and it works fine, except that openbabel has no aarch64 build on conda-forge. I opened an issue for it at https://github.com/conda-forge/openbabel-feedstock/issues/27.

@yakutovicha if we want to use this for aiidalab, we also need to update the aiidalab install, since the newer pip version leads to the error option --use-feature: invalid choice: 'in-tree-build' (choose from '2020-resolver', 'fast-deps').

unkcpz commented 2 years ago

Moreover, I changed how the permissions of the /opt/conda folder are handled. Previously, RUN touch /opt/conda/pkgs/urls.txt was used to allow the aiida user to install into this folder, but I think it makes more sense to make aiida the owner of the folder.

  1. We only have one user in this case, so it should be safe to let the aiida user take over this folder.
  2. touch /opt/conda/pkgs/urls.txt can be one way to solve the issue https://github.com/conda/conda/issues/7267. But @mbercx and I have encountered the issue that we cannot pip install in editable mode because write permission to this folder is denied.

unkcpz commented 2 years ago

I also tested pip install -e with aiidalab-qe; the read-only exception (https://github.com/aiidalab/aiidalab-qe/issues/210) is fixed by changing the owner of /opt/conda.
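
For readers following along, the ownership change could look roughly like the sketch below. The helper names `owner_uid` and `take_ownership` are hypothetical, and the exact line used in the PR is not quoted in this thread, so treat the `chown -R` form as an assumption.

```shell
#!/bin/sh
# Numeric UID of the owner of a path (third field of `ls -ldn`).
owner_uid() {
    ls -ldn "$1" | awk '{print $3}'
}

# Recursively hand a directory tree to a user, then print the resulting owner UID.
take_ownership() {
    chown -R "$1" "$2" && owner_uid "$2"
}

# In the image build (run as root) the equivalent call would be something like:
#   take_ownership aiida /opt/conda
```

With the aiida user owning the whole tree, both conda installs and editable pip installs can write to it without a separate workaround for individual files.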

unkcpz commented 2 years ago

Hi @yakutovicha, is there anything more to change? Please feel free to approve and merge this.

unkcpz commented 2 years ago

Hi @yakutovicha, are you going to make a release with this change soon?

yakutovicha commented 2 years ago

> Hi @yakutovicha, are you going to make a release with this change soon?

yes, making it in #40