EESSI / eessi-bot-software-layer

Bot to help with requests to add software installations to the EESSI software layer
GNU General Public License v2.0

upload tarball and metadata file to different directories in S3 bucket #241

Open trz42 opened 9 months ago

trz42 commented 9 months ago

Currently (in EESSI), the tarball for built software and a metadata file describing the contents of that tarball are uploaded to the same directory in the S3 bucket, and they stay in that directory during ingestion. The ingestion procedure adds only the metadata file to a staging repository; when the state of an ingestion changes, the metadata file is moved to the corresponding top-level directory in that repository. For example, it is first created under new/some_path/TARBALL.meta.txt and moved to staged/some_path/TARBALL.meta.txt once the tarball has been staged from the S3 bucket to the Stratum-0 server.
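The state transition described above amounts to rewriting the first path component of the metadata file. A minimal bash sketch, assuming paths of the form `<state>/<some_path>/<TARBALL>.meta.txt`; the function name `metadata_path_for_state` is hypothetical and is not taken from the EESSI ingestion scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EESSI): compute the new location of a
# metadata file when the ingestion state changes.
# Assumes paths of the form <state>/<some_path>/<TARBALL>.meta.txt.
metadata_path_for_state() {
    local current_path="$1"   # e.g. new/some_path/TARBALL.meta.txt
    local new_state="$2"      # e.g. staged
    # "${current_path#*/}" drops everything up to and including the first
    # "/", i.e. the old state directory; then prepend the new state.
    echo "${new_state}/${current_path#*/}"
}

metadata_path_for_state "new/some_path/TARBALL.meta.txt" staged
# prints: staged/some_path/TARBALL.meta.txt
```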

The current procedure may lead to many requests to the GitHub API, which is subject to an hourly limit of 5000 requests. Hitting that limit causes ingestion to fail or to progress more slowly.

In NESSI, we use a slightly different approach. Tarballs are always put under tarballs/some_path/TARBALL in the S3 bucket (a separate top-level directory) and are never moved (same as in EESSI). Metadata files are initially created under new/some_path/TARBALL.meta.txt in the S3 bucket (another separate top-level directory). The ingestion procedure moves the metadata file within the S3 bucket to the top-level directory that corresponds to the state of the ingestion (differs from the EESSI approach). The metadata file is not moved between directories in the staging repository on GitHub (differs from the EESSI approach).
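The NESSI layout can be sketched as a few bash helpers that compute the S3 keys and the command for moving the metadata file between state directories. This is only an illustration under stated assumptions: the function names, bucket name and paths are hypothetical placeholders, and the actual NESSI ingestion scripts may use a different tool than the AWS CLI. The command is printed rather than executed so the sketch can be run without credentials:

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the NESSI-style S3 layout.

# Tarballs live under the fixed top-level directory "tarballs/".
tarball_key() {
    # $1 = some_path, $2 = TARBALL
    echo "tarballs/$1/$2"
}

# Metadata files live under a top-level directory named after the state.
metadata_key() {
    # $1 = state (e.g. new, staged), $2 = some_path, $3 = TARBALL
    echo "$1/$2/$3.meta.txt"
}

# Print (do not run) the AWS CLI command that would move the metadata
# file from one state directory to another within the same bucket.
move_metadata_cmd() {
    local bucket="$1" from_state="$2" to_state="$3" path="$4" tarball="$5"
    echo aws s3 mv \
        "s3://${bucket}/$(metadata_key "${from_state}" "${path}" "${tarball}")" \
        "s3://${bucket}/$(metadata_key "${to_state}" "${path}" "${tarball}")"
}

move_metadata_cmd my-bucket new staged some_path TARBALL
```

For an S3 service other than AWS, the AWS CLI additionally needs its `--endpoint-url` option, analogous to the `${endpoint_url}` argument that the upload script passes to `upload_to_staging_bucket`.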

In NESSI, we have modified the script eessi-upload-to-staging such that the tarball and the metadata file are uploaded to different top-level directories. After the change, the code looks like this:

        echo Uploading to "${url}"
        echo "  store tarball at tarballs/${aws_path}/${aws_file}"
        upload_to_staging_bucket \
                "${file}" \
                "${bucket_name}" \
                "tarballs/${aws_path}/${aws_file}" \
                "${endpoint_url}"
        echo "  store metadata file at new/${aws_path}/${aws_file}.meta.txt"
        upload_to_staging_bucket \
                "${metadata_file}" \
                "${bucket_name}" \
                "new/${aws_path}/${aws_file}.meta.txt" \
                "${endpoint_url}"

The corresponding code in EESSI is:

        echo Uploading to "${url}"
        upload_to_staging_bucket \
                "${file}" \
                "${bucket_name}" \
                "${aws_path}/${aws_file}" \
                "${endpoint_url}"
        upload_to_staging_bucket \
                "${metadata_file}" \
                "${bucket_name}" \
                "${aws_path}/${aws_file}.meta.txt" \
                "${endpoint_url}"

Instead of hardcoding the destinations for the uploads, it might be better to make these locations configurable. This would also allow for a smoother migration, because using different locations in the S3 bucket also requires changes to the ingestion scripts that run as cron jobs on the Stratum-0.
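One possible shape for such a configuration is a pair of optional prefixes, sketched below in bash. The environment variables `TARBALL_PREFIX` and `METADATA_PREFIX` and the helper `with_prefix` are hypothetical; they are not existing options of eessi-upload-to-staging, and the values could just as well come from the bot's configuration. Empty prefixes reproduce the current EESSI layout, while `tarballs` and `new` reproduce the NESSI layout:

```shell
#!/usr/bin/env bash
# Hypothetical configuration knobs (not existing options of
# eessi-upload-to-staging). Defaults are empty, i.e. the EESSI layout.
tarball_prefix="${TARBALL_PREFIX:-}"
metadata_prefix="${METADATA_PREFIX:-}"

# Join an optional prefix with a relative S3 key, avoiding a leading "/"
# when the prefix is empty.
with_prefix() {
    local prefix="$1" key="$2"
    if [ -n "${prefix}" ]; then
        echo "${prefix}/${key}"
    else
        echo "${key}"
    fi
}

# In the upload script, the hardcoded destination keys would become:
#   "$(with_prefix "${tarball_prefix}" "${aws_path}/${aws_file}")"
#   "$(with_prefix "${metadata_prefix}" "${aws_path}/${aws_file}.meta.txt")"

with_prefix "" some_path/TARBALL          # prints: some_path/TARBALL
with_prefix tarballs some_path/TARBALL    # prints: tarballs/some_path/TARBALL
```

Defaulting to empty prefixes keeps the current behaviour, so the upload script could be changed first and the new locations switched on only once the ingestion scripts on the Stratum-0 have been updated.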