aimhubio / aim

Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
https://aimstack.io
Apache License 2.0
5.23k stars 321 forks source link

[fix] Aimstack crashing on FIPS enabled servers. #3217

Open dushyantbehl opened 2 months ago

dushyantbehl commented 2 months ago

This PR tries to solve the issue https://github.com/aimhubio/aim/issues/3143.

Using Aim on a FIPS compatible server, RHEL 9 FIPS enabled server in our case results in error due to lack of flexibility in the FIPS compatible version of blake2 hash algorithm used in aim currently.

Post further investigation we found out that on our FIPS servers the hashlib library used openssl version of the constructors for blake2 which doesn't provide digest_size argument which is used in Aim to customize the size of hash digest to 8 bytes.

In this patch we introduce use of a FIPS compatible hashing algorithm shake_256 which supports variable lengths digests and is available in FIPS mode under the SHA3 algorithms. Currently the code is written to keep using blake2 in normal execution mode but if FIPS mode is detected it switches to shake_256.

dushyantbehl commented 2 months ago

Code is tested on both FIPS and non FIPS servers and below are the unit test results.

On FIPS Server -

Before -

ERROR tests/storage/test_query.py::TestQuery::test_syntax_error_handling - TypeError: 'digest_size' is an invalid keyword argument for openssl_blake2b()
ERROR tests/storage/test_query.py::TestQueryDefaultExpression::test_default_query_metric_results - TypeError: 'digest_size' is an invalid keyword argument for openssl_blake2b()
ERROR tests/storage/test_query.py::TestQueryDefaultExpression::test_default_query_run_results - TypeError: 'digest_size' is an invalid keyword argument for openssl_blake2b()
ERROR tests/storage/test_query.py::TestQueryDefaultExpression::test_query_with_archived_expression_run_results - TypeError: 'digest_size' is an invalid keyword argument for openssl_blake2b()
ERROR tests/storage/test_query.py::TestQueryDefaultExpression::test_query_without_archived_expression_metric_results - TypeError: 'digest_size' is an invalid keyword argument for openssl_blake2b()
ERROR tests/storage/test_structured_db.py::TestStructuredDatabase::test_context_manager_nesting - TypeError: 'digest_size' is an invalid keyword argument for openssl_blake2b()
ERROR tests/storage/test_structured_db.py::TestStructuredDatabase::test_entity_chaining_syntax - TypeError: 'digest_size' is an invalid keyword argument for openssl_blake2b()
ERROR tests/storage/test_structured_db.py::TestStructuredDatabase::test_entity_relations - TypeError: 'digest_size' is an invalid keyword argument for openssl_blake2b()
============================================================================================== 34 failed, 25 passed, 3 skipped, 7 warnings, 89 errors in 24.49s ===============================================================================================

After applying the patch

=================================================================================================================== short test summary info ===================================================================================================================
FAILED tests/api/test_run_images_api.py::RunImagesURIBulkLoadApi::test_images_uri_bulk_load_api_0 - TypeError: Population must be a sequence.  For dicts or sets, use sorted(d).
FAILED tests/api/test_run_images_api.py::RunImagesURIBulkLoadApi::test_images_uri_bulk_load_api_1 - TypeError: Population must be a sequence.  For dicts or sets, use sorted(d).
FAILED tests/api/test_run_images_api.py::RunImagesURIBulkLoadApi::test_images_uri_bulk_load_api_2 - TypeError: Population must be a sequence.  For dicts or sets, use sorted(d).
FAILED tests/api/test_run_images_api.py::TestRunInfoApi::test_run_info_get_all_sequences_api_0 - AssertionError: 'floats' != 'integers'
FAILED tests/api/test_run_images_api.py::TestRunInfoApi::test_run_info_get_all_sequences_api_1 - AssertionError: 'floats' != 'integers'
=============================================================================================== 5 failed, 143 passed, 3 skipped, 6 warnings in 75.84s (0:01:15)

I noticed a few tests failing at the end but am not sure of what exactly is behind those and am looking into if ths was introduced by our patch as I have no way of testing this without our patch on FIPS servers.

I am open to any suggestions or fixes related to the patch.

dushyantbehl commented 2 months ago

I noticed that the test data was in a different order in case of different python versions hence added some changes to the test here https://github.com/aimhubio/aim/pull/3217/commits/891cbd3b72adf29f0bf97698652dbebf94a593a6 ..if you appreciate this as a separate PR please let me know and I am happy to move it separately.

dushyantbehl commented 2 months ago

Hi @SGevorg @alberttorosyan @mihran113 a friendly bump.

Can I request you to please review these changes. Thanks!