easybuilders / easybuild-framework

EasyBuild is a software installation framework in Python that allows you to install software in a structured and robust way.
https://easybuild.io
GNU General Public License v2.0

Simplify specifying sources and their checksums #3744

Open Flamefire opened 3 years ago

Flamefire commented 3 years ago

We currently have two formats for a source entry: a plain string, or a dictionary with special keys (see https://docs.easybuild.io/en/latest/Writing_easyconfig_files.html#common-easyconfig-param-sources-alt).

The sources key itself is a list of such entries. In addition, we have a list of checksums, where each entry can be either a checksum or a dict of filename->checksum, and a checksum can be a string or a tuple of checksum type and value.

Most of our ECs then look like this:

sources = [
    'v%(version)s.tar.gz',  # PyTorch
    {
        'source_urls': ['https://github.com/intel/tbb/archive'],
        'download_filename': 'a51a90bc609bb73db8ea13841b5cf7aa4344d4a9.tar.gz',
        'filename': 'tbb-20181009.tar.gz',
        'extract_cmd': local_extract_cmd_pattern % (local_pytorchthirdpartydir, 'tbb'),
    },
    {
        'source_urls': ['https://github.com/intel/mkl-dnn/archive'],
        'download_filename': '0125f28c61c1f822fd48570b4c1066f96fcb9b2e.tar.gz',
        'filename': 'mkl-dnn-20190905.tar.gz',
        'extract_cmd': local_extract_cmd_pattern % (local_pytorchthirdpartydir, 'ideep/mkl-dnn'),
    },
]
patches = [
    '%(name)s-1.2.0_fix-findAVX.patch',
    '%(name)s-1.2.0_disable-tests-ppc64le.patch',
    '%(name)s-1.2.0_add-cuda11-support.patch',
    '%(name)s-%(version)s_fix-missing-sleef_h.patch',
    '%(name)s-1.4.0_fix-missing-source-dir.patch',
]
checksums = [
    'ab6feb5044f7d36f6e93dce4668d8c593e89d34aca7023fd99a38d215ca9dfc0',  # v1.3.1.tar.gz
    ('dc0a8d8d96cb8765782aa6ac1b509ad4db955d9bbb58fa5cc2265f0292756d72',
     'be111cf161b587812fa3b106fe550efb6f129b8b0b702fef32fac23af9580e5e'),  # tbb-20181009.tar.gz
    ('d16c64ab2ce654f0a21e51f933ae9ee480a8873717d0bd10e0f2a2f658a7095b',
     'bf096e6b3f17925ebe7802e0fa7dcc246319210b6ea3645b3ed52899a474fafc'),  # mkl-dnn-20190905.tar.gz
    '001c9bf604aebe4b39ccad15332a71130b07b780c539ceca84d6c64cd6fc8a68',  # PyTorch-1.2.0_fix-findAVX.patch
    'c4183bcb29a8bcbadea0341e93a3a32afdf860aa31331b768e787d899183da92',  # PyTorch-1.2.0_disable-tests-ppc64le.patch
    '5a8289ced3ea448c61b2c417bb6118cb73da67eb6b9a58ac14376c65f7151906',  # PyTorch-1.2.0_add-cuda11-support.patch
    '1337647ff64a1208d1e401fc84052d0bc6174b133cec2f3521319cb593f524fa',  # PyTorch-1.3.1_fix-missing-sleef_h.patch
    '797987fb9c9bf9f1d75a1be878ddf9f418f9524006b0985ca8e6d65d4e2b6998',  # PyTorch-1.4.0_fix-missing-source-dir.patch
]

Most vexingly, we duplicate the name of each file/patch as an (unchecked) comment.

Proposal:

I'd even propose a string-based tool to move the old checksums to the new place in existing ECs. The format is pretty much fixed by convention, so I'd guess that for 95% of the official ECs we could move the checksums without any problems; the others we would just leave as-is.
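A minimal sketch of what the string-based step could look like, assuming the conventional layout shown in the example above (one quoted checksum per line, with a trailing `# filename` comment, and tuples spread over several lines). The name `checksums_by_filename` and both regular expressions are hypothetical and not part of EasyBuild; where the checksums are written afterwards is left open.

```python
import re

# Assumption: a checksum is a quoted hex string of 32 to 128 characters.
HEX = r"['\"]([0-9a-f]{32,128})['\"]"
# Assumption: the last checksum of an entry is followed by "# <filename>".
COMMENTED = re.compile(HEX + r"\)?,\s*#\s*(\S+)\s*$")


def checksums_by_filename(block):
    """Map each filename named in a comment to the checksum(s) listed for it."""
    result = {}
    pending = []  # checksums from tuple lines that carry no comment yet
    for line in block.splitlines():
        match = COMMENTED.search(line)
        if match:
            checksum, filename = match.groups()
            result[filename] = pending + [checksum]
            pending = []
        else:
            pending.extend(re.findall(HEX, line))
    return result


SAMPLE = """checksums = [
    'ab6feb5044f7d36f6e93dce4668d8c593e89d34aca7023fd99a38d215ca9dfc0',  # v1.3.1.tar.gz
    ('dc0a8d8d96cb8765782aa6ac1b509ad4db955d9bbb58fa5cc2265f0292756d72',
     'be111cf161b587812fa3b106fe550efb6f129b8b0b702fef32fac23af9580e5e'),  # tbb-20181009.tar.gz
    '001c9bf604aebe4b39ccad15332a71130b07b780c539ceca84d6c64cd6fc8a68',  # PyTorch-1.2.0_fix-findAVX.patch
]"""

for filename, values in checksums_by_filename(SAMPLE).items():
    print(filename, len(values))
```

ECs that do not follow the convention would simply yield fewer entries than expected, which is a cheap signal to leave them as-is.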

mboisson commented 3 years ago

If we are going to overhaul the way checksums are specified, I would like the new way to support specifying checksums for alternate versions. This would be trivial if we adopted a dictionary form:

checksums = {
   'filename1': 'checksum1',
   'filename2': 'checksum2',
}

where the dictionary could be specified elsewhere (either a separate file, a global config, etc.)

Specifying the checksum inside the sources or patches entries will only make that kind of thing more complicated.

What I'm trying to eliminate here is the need to repeat checksums in every single copy of every recipe. There just is no reason to do that.
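To illustrate, here is a rough sketch of how a lookup against such a shared dictionary might work. It assumes SHA-256 checksums and keys that are plain file names; `sha256_of` and `verify` are hypothetical helpers, not existing EasyBuild functions.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, known_checksums):
    """Check a downloaded file against a filename -> checksum table.

    The table is passed in, so it can come from a separate file or a
    global config rather than from the recipe itself (assumption).
    """
    filename = os.path.basename(path)
    if filename not in known_checksums:
        raise KeyError("no checksum known for %s" % filename)
    return sha256_of(path) == known_checksums[filename]
```

Because the table is keyed by file name only, any recipe that uses the same source file gets the same checksum without repeating it.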

ocaisa commented 3 years ago

That would require the file (or patch) names to be unique across all shipped easyconfigs. Maybe that is the case, but I can at least imagine cases where it may not be true. You could make the key a tuple that includes the software name (and version?) to make clashes less likely.
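A small sketch of the tuple-key idea, with placeholder checksum values. The key layout `(software name, filename)` and the names `OtherSoftware` and `lookup` are assumptions made purely for illustration.

```python
# Hypothetical table: keys are (software name, filename) tuples, so the
# same file name can appear for two different software packages.
checksums = {
    ('PyTorch', 'v1.3.1.tar.gz'): 'checksum1',
    ('OtherSoftware', 'v1.3.1.tar.gz'): 'checksum2',
}


def lookup(table, name, filename):
    """Return the checksum for a file of a given software, or None if unknown."""
    return table.get((name, filename))


print(lookup(checksums, 'PyTorch', 'v1.3.1.tar.gz'))
```

Generic archive names such as `v%(version)s.tar.gz` are the obvious candidates for clashes that this kind of key would avoid.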

mboisson commented 3 years ago

Given that source files are all stored in the same location (easybuild/sources/...), name clashes do not happen (at least for a given software); otherwise we would already be seeing issues.

ocaisa commented 3 years ago

Well, no, sources are stored in subdirectories under that location. So you mean that it would not be a global YAML file, but a software-specific one?

mboisson commented 3 years ago

Yes, I would make it software-specific. See issue https://github.com/easybuilders/easybuild-framework/issues/3746 for the more detailed proposal.