bihealth / irods-docker

iRODS Docker image for use with SODAR
MIT License

Upgrade to iRODS 4.3 #16

Open mikkonie opened 1 year ago

mikkonie commented 1 year ago

There is a lot of internal demand for this, so I need to look into it.

Spec

Tasks

Resources

mikkonie commented 1 year ago

First issue when upgrading on an already installed iCAT:

Error encountered running irods_control start:
Traceback (most recent call last):
  File "/var/lib/irods/scripts/irods/json_validation.py", line 60, in validate_dict
    jsonschema.validate(config_dict, schema, resolver=jsonschema.RefResolver(schema_uri, schema))
  File "/usr/lib/python3/dist-packages/jsonschema/validators.py", line 541, in validate
    cls(schema, *args, **kwargs).validate(instance)
  File "/usr/lib/python3/dist-packages/jsonschema/validators.py", line 130, in validate
    raise error
jsonschema.exceptions.ValidationError: {'catalog_schema_version': 1, 'commit_id': '0000000000000000000000000000000000000000', 'configuration_schema_version': 2, 'irods_version': '4.1.0', 'schema_name': 'VERSION', 'schema_version': 'v2'} is valid under each of {'type': 'object', 'properties': {'catalog_schema_version': {'type': 'integer'}, 'commit_id': {'type': 'string', 'pattern': '^[0-9a-f]{40}$'}, 'configuration_schema_version': {'type': 'integer'}, 'installation_time': {'type': 'string', 'format': 'date-time'}, 'irods_version': {'type': 'string'}, 'previous_version': {'$ref': '#/properties/previous_version/oneOf/1'}}, 'required': ['catalog_schema_version', 'commit_id', 'configuration_schema_version', 'irods_version']}, {'$ref': '#'}

Failed validating 'oneOf' in schema['properties']['previous_version']:
    {'oneOf': [{'$ref': '#'},
               {'properties': {'catalog_schema_version': {'type': 'integer'},
                               'commit_id': {'pattern': '^[0-9a-f]{40}$',
                                             'type': 'string'},
                               'configuration_schema_version': {'type': 'integer'},
                               'installation_time': {'format': 'date-time',
                                                     'type': 'string'},
                               'irods_version': {'type': 'string'},
                               'previous_version': {'$ref': '#/properties/previous_version/oneOf/1'}},
                'required': ['catalog_schema_version',
                             'commit_id',
                             'configuration_schema_version',
                             'irods_version'],
                'type': 'object'}]}

On instance['previous_version']:
    {'catalog_schema_version': 1,
     'commit_id': '0000000000000000000000000000000000000000',
     'configuration_schema_version': 2,
     'irods_version': '4.1.0',
     'schema_name': 'VERSION',
     'schema_version': 'v2'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/irods/scripts/irods_control.py", line 124, in main
    operations_dict[operation]()
  File "/var/lib/irods/scripts/irods_control.py", line 70, in <lambda>
    operations_dict['start'] = lambda: irods_controller.start(write_to_stdout=options.write_to_stdout, test_mode=options.test_mode)
  File "/var/lib/irods/scripts/irods/controller.py", line 94, in start
    self.config.validate_configuration()
  File "/var/lib/irods/scripts/irods/configuration.py", line 286, in validate_configuration
    config_file['path'])
  File "/var/lib/irods/scripts/irods/json_validation.py", line 79, in validate_dict
    sys.exc_info()[2])
  File "/var/lib/irods/scripts/irods/six.py", line 671, in reraise
    raise value.with_traceback(tb)
  File "/var/lib/irods/scripts/irods/json_validation.py", line 60, in validate_dict
    jsonschema.validate(config_dict, schema, resolver=jsonschema.RefResolver(schema_uri, schema))
  File "/usr/lib/python3/dist-packages/jsonschema/validators.py", line 541, in validate
    cls(schema, *args, **kwargs).validate(instance)
  File "/usr/lib/python3/dist-packages/jsonschema/validators.py", line 130, in validate
    raise error

mikkonie commented 1 year ago

The aforementioned crash also breaks the existing server configuration in a way which prevents downgrading. This is very bad.

Even after fixing all issues, we should back up all server configs before attempting this upgrade in production.
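
A minimal sketch of such a backup step, assuming the configs live in /etc/irods and that version.json sits in /var/lib/irods (the iRODS home seen in the tracebacks). The function name and the destination directory are hypothetical and would need adjusting to the actual deployment.

```python
import tarfile
import time
from pathlib import Path

# Assumed locations, not confirmed for every deployment.
CONFIG_PATHS = [Path("/etc/irods"), Path("/var/lib/irods/version.json")]


def backup_configs(paths, dest_dir):
    """Archive the given paths into a timestamped tar.gz and return its path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / time.strftime("irods-config-%Y%m%d-%H%M%S.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            path = Path(path)
            if path.exists():  # silently skip paths missing on this host
                tar.add(path, arcname=path.name)
    return archive


# Example call (hypothetical destination):
# backup_configs(CONFIG_PATHS, "/srv/backup/irods")
```

Running this before the upgrade and keeping the archive outside the container volumes would give us something to restore from if the downgrade path is broken again.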

mikkonie commented 1 year ago

Clean install also fails. Apparently this will need a lot of work.

irods-test_1  | Traceback (most recent call last):
irods-test_1  |   File "/var/lib/irods/scripts/setup_irods.py", line 58, in <module>
irods-test_1  |     import irods.lib
irods-test_1  |   File "/var/lib/irods/scripts/irods/lib.py", line 15, in <module>
irods-test_1  |     import distro
irods-test_1  | ImportError: No module named distro
irods_1       | Perform iRODS setup
irods_1       | Traceback (most recent call last):
irods_1       |   File "/var/lib/irods/scripts/setup_irods.py", line 58, in <module>
irods_1       |     import irods.lib
irods_1       |   File "/var/lib/irods/scripts/irods/lib.py", line 15, in <module>
irods_1       |     import distro
irods_1       | ImportError: No module named distro
irods-test_1  | Password: 
postgres_1    | 2023-01-31 12:02:36.256 UTC [91] ERROR:  database "ICAT_TEST" already exists
postgres_1    | 2023-01-31 12:02:36.256 UTC [91] STATEMENT:  CREATE DATABASE "ICAT_TEST";
irods-test_1  | createdb: database creation failed: ERROR:  database "ICAT_TEST" already exists
irods_1       | Password: 
postgres_1    | 2023-01-31 12:02:36.267 UTC [92] ERROR:  database "ICAT" already exists
postgres_1    | 2023-01-31 12:02:36.267 UTC [92] STATEMENT:  CREATE DATABASE "ICAT";
irods_1       | createdb: database creation failed: ERROR:  database "ICAT" already exists
sodar-docker-compose-dev_irods-test_1 exited with code 1
sodar-docker-compose-dev_irods_1 exited with code 1

mikkonie commented 1 year ago

Got past the previous crash. Here are some new ones.

Edit: The 1st one was fixed.

irods_1       | rsyslogd: imklog: cannot open kernel log (/proc/kmsg): Operation not permitted.
irods_1       | rsyslogd: activation of module imklog failed [v8.32.0 try http://www.rsyslog.com/e/2145 ]
irods_1       |    ...done.

This one persists at the time of writing:

irods_1       | Traceback (most recent call last):
irods_1       |   File "/var/lib/irods/scripts/setup_irods.py", line 529, in <module>
irods_1       |     sys.exit(main())
irods_1       |   File "/var/lib/irods/scripts/setup_irods.py", line 517, in main
irods_1       |     test_mode=options.test_mode)
irods_1       |   File "/var/lib/irods/scripts/setup_irods.py", line 110, in setup_server
irods_1       |     default_resource_name = json_configuration_dict['default_resource_name']
irods_1       | KeyError: 'default_resource_name'

Looks like the unattended config file template needs to be updated. I will look into the original template.
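
A small pre-flight check along these lines could catch a stale template before setup_irods.py crashes. This is only a sketch: 'default_resource_name' is the one key known from the traceback above, and any further entries in REQUIRED_KEYS would be assumptions to verify against the 4.3 template. The function names are hypothetical.

```python
import json

# Only this key is confirmed by the KeyError above; extend after checking
# the actual 4.3 unattended installation template.
REQUIRED_KEYS = ["default_resource_name"]


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required top-level keys that are absent from the config dict."""
    return [key for key in required if key not in config]


def check_unattended_config(path):
    """Load the unattended config JSON file and report missing top-level keys."""
    with open(path) as f:
        return missing_keys(json.load(f))
```

An empty list means none of the listed keys is missing. It says nothing about whether the rest of the file matches the schema.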

mikkonie commented 1 year ago

Unattended configuration file updated to match the current schema. This leads to the following error:

irods_1       | Error encountered running setup_irods:
irods_1       | Traceback (most recent call last):
irods_1       |   File "/var/lib/irods/scripts/setup_irods.py", line 517, in main
irods_1       |     test_mode=options.test_mode)
irods_1       |   File "/var/lib/irods/scripts/setup_irods.py", line 150, in setup_server
irods_1       |     test_put(irods_config)
irods_1       |   File "/var/lib/irods/scripts/setup_irods.py", line 180, in test_put
irods_1       |     raise IrodsError('Post-install test failed. Please check your configuration.')
irods_1       | irods.exceptions.IrodsError: Post-install test failed. Please check your configuration.

mikkonie commented 1 year ago

Additional info in setup_log.txt about the aforementioned crash. Looks like a PAM plugin issue. Oh great, I'm sure this will not be a pain to fix.

+---------------------------+
| Running Post-Install Test |
+---------------------------+

2023-01-31T15:36:43.765Z -   DEBUG -                     execute.py:  52 - Calling ['/usr/sbin/irodsTestPutGet'] with options:
{'shell': False, 'stderr': -1, 'stdout': -1}
2023-01-31T15:36:44.046Z -   DEBUG -                     execute.py:  37 - Command /usr/sbin/irodsTestPutGet returned with code -6.
stderr:
  Error occurred while authenticating user [rods] [PLUGIN_ERROR_MISSING_SHARED_OBJECT: [-]  /irods_source/lib/core/include/irods/irods_load_plugin.hpp:157:irods::error irods::load_plugin(PluginType *&, const std::string &, const std::string &, const std::string &, const Ts &...) [PluginType = irods::experimental::auth::authentication_base, Ts = <char [14]>] :  status [PLUGIN_ERROR_MISSING_SHARED_OBJECT]  errno [] -- message [shared library does not exist [/usr/lib/irods/plugins/auth/libirods_auth_plugin-pam_client.so]]

  ] [ec=-1827000] failed with error -1827000 PLUGIN_ERROR_MISSING_SHARED_OBJECT 
  libc++abi: terminating with uncaught exception of type std::runtime_error: client login error
2023-01-31T15:36:44.046Z -   ERROR -                 setup_irods.py: 519 - Error encountered running setup_irods:
Traceback (most recent call last):
  File "/var/lib/irods/scripts/setup_irods.py", line 517, in main
    test_mode=options.test_mode)
  File "/var/lib/irods/scripts/setup_irods.py", line 150, in setup_server
    test_put(irods_config)
  File "/var/lib/irods/scripts/setup_irods.py", line 180, in test_put
    raise IrodsError('Post-install test failed. Please check your configuration.')
irods.exceptions.IrodsError: Post-install test failed. Please check your configuration.
2023-01-31T15:36:44.047Z -    INFO -                 setup_irods.py: 520 - Exiting...

mikkonie commented 1 year ago

Just a note: the previous PAM error was fixed with the help of iRODS support. The syntax for PAM auth in configurations has changed. Instead of PAM it now expects pam_password.

The current blocker is that either the 4.3 API or the Python client used by SODAR does not work correctly with the iRODS server. I will look into that when I have time, and may also consider waiting for 4.3.1 to come out.
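
For illustration, the rename could be applied to a client environment like this. The sketch assumes the setting in question is the irods_authentication_scheme key of irods_environment.json, which the note above does not name explicitly. The helper name is hypothetical.

```python
def migrate_auth_scheme(env):
    """Return a copy of an environment dict with the legacy PAM scheme renamed.

    Assumes the scheme is stored under 'irods_authentication_scheme'.
    Other schemes (e.g. 'native') are left untouched.
    """
    updated = dict(env)
    scheme = updated.get("irods_authentication_scheme", "")
    if scheme.lower() == "pam":
        updated["irods_authentication_scheme"] = "pam_password"
    return updated
```

The function works on a plain dict, so it can be used on the parsed JSON of an existing environment file before writing it back.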

mikkonie commented 1 year ago

Server currently works with a clean install. SODAR auth via the custom PAM module is no longer working. I need to look into what has changed in the iRODS auth and attempt to update my custom module accordingly.

mikkonie commented 1 year ago

Currently the containers can be broken by a problem with version.json, which is apparently written by setup and isn't included in the volumes. Only rebuilding the entire image fixes this. I'm trying to figure out the cause.

This happens both in iRODS start and setup, so clearing the volumes and re-initializing everything will not help.

Error encountered running irods_control start:
Traceback (most recent call last):
  File "/var/lib/irods/scripts/irods/json_validation.py", line 60, in validate_dict
    jsonschema.validate(config_dict, schema, resolver=jsonschema.RefResolver(schema_uri, schema))
  File "/usr/lib/python3/dist-packages/jsonschema/validators.py", line 541, in validate
    cls(schema, *args, **kwargs).validate(instance)
  File "/usr/lib/python3/dist-packages/jsonschema/validators.py", line 130, in validate
    raise error
jsonschema.exceptions.ValidationError: {'catalog_schema_version': 1, 'commit_id': '0000000000000000000000000000000000000000', 'configuration_schema_version': 2, 'irods_version': '4.1.0', 'schema_name': 'VERSION', 'schema_version': 'v2'} is valid under each of {'type': 'object', 'properties': {'catalog_schema_version': {'type': 'integer'}, 'commit_id': {'type': 'string', 'pattern': '^[0-9a-f]{40}$'}, 'configuration_schema_version': {'type': 'integer'}, 'installation_time': {'type': 'string', 'format': 'date-time'}, 'irods_version': {'type': 'string'}, 'previous_version': {'$ref': '#/properties/previous_version/oneOf/1'}}, 'required': ['catalog_schema_version', 'commit_id', 'configuration_schema_version', 'irods_version']}, {'$ref': '#'}

Failed validating 'oneOf' in schema['properties']['previous_version']:
    {'oneOf': [{'$ref': '#'},
               {'properties': {'catalog_schema_version': {'type': 'integer'},
                               'commit_id': {'pattern': '^[0-9a-f]{40}$',
                                             'type': 'string'},
                               'configuration_schema_version': {'type': 'integer'},
                               'installation_time': {'format': 'date-time',
                                                     'type': 'string'},
                               'irods_version': {'type': 'string'},
                               'previous_version': {'$ref': '#/properties/previous_version/oneOf/1'}},
                'required': ['catalog_schema_version',
                             'commit_id',
                             'configuration_schema_version',
                             'irods_version'],
                'type': 'object'}]}

On instance['previous_version']:
    {'catalog_schema_version': 1,
     'commit_id': '0000000000000000000000000000000000000000',
     'configuration_schema_version': 2,
     'irods_version': '4.1.0',
     'schema_name': 'VERSION',
     'schema_version': 'v2'}

Update: This error occurs (at least) when we recreate the image on an already provisioned environment. It seems we need to add some more directories to persistent storage via config/volumes. It's possible this same problem also exists in the 4.2 branch, but in any case we should be able to handle an image update on a provisioned server.

mikkonie commented 9 months ago

Fixed the problem with version.json: we just have to copy it to /etc/irods after provisioning and copy it back to /var/lib/irods if running on a provisioned server.
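
The two copy steps can be sketched as follows. This is not the actual entrypoint code. It assumes /var/lib/irods is the iRODS home (as in the tracebacks above) and that /etc/irods is on a persistent volume. The function names are hypothetical.

```python
import shutil
from pathlib import Path

VERSION_FILE = "version.json"


def save_version_file(irods_home="/var/lib/irods", persistent_dir="/etc/irods"):
    """After provisioning: copy version.json from the iRODS home to the volume."""
    shutil.copy2(Path(irods_home) / VERSION_FILE, Path(persistent_dir) / VERSION_FILE)


def restore_version_file(irods_home="/var/lib/irods", persistent_dir="/etc/irods"):
    """On an already provisioned server: copy the saved version.json back.

    Returns False when no saved copy exists, i.e. nothing was provisioned yet.
    """
    saved = Path(persistent_dir) / VERSION_FILE
    if not saved.exists():
        return False
    shutil.copy2(saved, Path(irods_home) / VERSION_FILE)
    return True
```

With this split, a recreated image on a provisioned environment would call restore_version_file() before starting iRODS, and a fresh install would call save_version_file() once setup has finished.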

mikkonie commented 8 months ago

iRODS 4.3 uses rsyslog for logging. Hence syslog logging needs to be set up. One example of how to do this is here.

mikkonie commented 4 months ago

Starting to look into this again to hopefully finalize this image soon and work towards getting it deployed with SODAR.

While I was on sick leave, iRODS v4.3.2 was released. First thing is to upgrade to that and see if previously working things are still OK.

mikkonie commented 2 months ago

As I kind of expected, upgrading the target iRODS version from 4.3.1 to 4.3.2 does not work on the fly. The server stays up for a short while and performs actions successfully, but then it dies. Same thing after restart.

I need to get logging up and try to see what could be causing this. 4.3.1 was working just fine for me locally.

This may have something to do with the python-irodsclient version in use: maybe a bad request breaks the server. This is just a hunch, though. Upgrading to a newer version has its issues as well, see bihealth/sodar-server#1955.

mikkonie commented 1 month ago

Back at it again. It seems installing iRODS itself has changed at some point.

mikkonie commented 1 month ago

After fixing build issues, iRODS startup fails when running the container:

irods-1       | Start iRODS
irods-1       | Test iinit
irods-1       | /irods_login.sh: line 3: iinit: command not found
irods-1       | iinit failed

Problem with the irods-icommands setup, I guess? Again, this didn't happen just a while ago with 4.3.1.

Update: Fixed by explicitly adding irods-icommands in dependencies to be installed.
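
A build-time check like the one below could catch this kind of regression earlier than the container startup. It is a sketch only: iinit comes from the log above, while the helper name and the idea of running it as part of the image build are assumptions.

```python
import shutil


def missing_commands(commands, path=None):
    """Return the commands that cannot be found on PATH (or on the given path)."""
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


# Example: fail early if the icommands are not installed.
# if missing_commands(["iinit"]):
#     raise SystemExit("icommands missing, is irods-icommands installed?")
```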

mikkonie commented 3 weeks ago

Looking into the custom PAM module issue. /var/log/auth.log says the following:

Oct  1 09:25:59 irods /usr/local/lib/pam-sodar/pam_sodar.py[1030]: Traceback (most recent call last):
Oct  1 09:25:59 irods /usr/local/lib/pam-sodar/pam_sodar.py[1030]:   File "/usr/local/lib/pam-sodar/pam_sodar.py", line 8, in <module>
Oct  1 09:25:59 irods /usr/local/lib/pam-sodar/pam_sodar.py[1030]:     import requests
Oct  1 09:25:59 irods /usr/local/lib/pam-sodar/pam_sodar.py[1030]: ImportError: No module named requests
Oct  1 09:25:59 irods irodsPamAuthCheck[1030]: pam_unix(irods:auth): check pass; user unknown
Oct  1 09:25:59 irods irodsPamAuthCheck[1030]: pam_unix(irods:auth): authentication failure; logname= uid=1000 euid=0 tty= ruser= rhost=

Seems simple enough. However, adding pip3 install requests to the Dockerfile does not help. I guess pam_python runs its own (Python 2?) libraries or something? However, this did work in the 4.2 version of this image. Looking into it.
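
One detail supports the Python 2 guess: the log says "ImportError: No module named requests" without quotes, whereas Python 3.6+ would report ModuleNotFoundError: No module named 'requests'. To confirm which interpreter actually loads the module, a diagnostic along these lines could be called from the same entry point as pam_sodar.py. It is a sketch with hypothetical function names, written to run on both Python 2.7 and 3.

```python
import sys


def describe_import(module_name):
    """Report whether a module can be imported by the running interpreter."""
    try:
        module = __import__(module_name)
    except ImportError as exc:
        return "{}: NOT importable ({})".format(module_name, exc)
    location = getattr(module, "__file__", "built-in")
    return "{}: importable from {}".format(module_name, location)


def interpreter_report(module_name="requests"):
    """Collect interpreter path, version and import status as text lines."""
    return [
        "executable: {}".format(sys.executable),
        "version: {}".format(sys.version.split()[0]),
        describe_import(module_name),
    ]
```

If the reported version turns out to be 2.x, pip3 installs into the wrong site-packages, which would explain why the Dockerfile change had no effect.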

mikkonie commented 3 weeks ago

Custom PAM auth issues fixed, albeit with an ugly hack. I will add a separate issue for making it prettier.