Currently (in EESSI), the tarball for built software and a metadata file describing the contents of that tarball are uploaded to the same directory in the S3 bucket. During ingestion they stay in the same directory. The ingestion procedure puts only the metadata file into a staging repository and, when the state of an ingestion changes, moves the metadata file to a corresponding top-level directory. For example, it is first created under new/some_path/TARBALL.meta.txt and moved to staged/some_path/TARBALL.meta.txt when the tarball has been staged from the S3 bucket to the Stratum-0 server.
Because every state change moves the metadata file in the staging repository, the current procedure may lead to many GitHub API requests, which are subject to an hourly limit of 5000. Hitting that limit causes ingestion to fail or slow down.
In NESSI, we use a slightly different approach. Tarballs are always put under tarballs/some_path/TARBALL in the S3 bucket, in a top-level directory of their own, and are never moved (same as in EESSI). Metadata files are initially created under new/some_path/TARBALL.meta.txt in the S3 bucket, again in a separate top-level directory. The ingestion procedure moves the metadata file within the S3 bucket to the top-level directory corresponding to the state of the ingestion (differs from the EESSI approach). The metadata file is not moved between directories in the staging repository on GitHub (also differs from the EESSI approach).
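The state-dependent move of the metadata file inside the S3 bucket could look roughly like the sketch below. This is not the actual ingestion code: the function names metadata_key_for_state and move_metadata_to_state are hypothetical, and the sketch assumes that the AWS CLI is available and that the first path component of the key always names the current state.

```shell
# Hypothetical helper: replace the top-level directory of a metadata key
# with the directory that names the new ingestion state.
metadata_key_for_state() {
    local key="$1"        # e.g. new/some_path/TARBALL.meta.txt
    local new_state="$2"  # e.g. staged
    # Strip everything up to and including the first slash,
    # then prepend the new state directory.
    printf '%s/%s\n' "${new_state}" "${key#*/}"
}

# Hypothetical wrapper: move the metadata object within the bucket.
# Assumes the AWS CLI is installed and credentials are configured.
move_metadata_to_state() {
    local bucket_name="$1"
    local key="$2"
    local new_state="$3"
    local endpoint_url="${4:-}"
    local new_key
    new_key="$(metadata_key_for_state "${key}" "${new_state}")"
    # Only pass --endpoint-url when a custom endpoint was given.
    if [ -n "${endpoint_url}" ]; then
        aws s3 mv "s3://${bucket_name}/${key}" "s3://${bucket_name}/${new_key}" \
            --endpoint-url "${endpoint_url}"
    else
        aws s3 mv "s3://${bucket_name}/${key}" "s3://${bucket_name}/${new_key}"
    fi
}
```

With this layout, a state change costs one S3 operation and no request against the GitHub API, while the tarball under tarballs/ is left untouched.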
In NESSI, we have modified the script eessi-upload-to-staging such that the tarball and the metadata file are uploaded to different top-level directories. After the change, the code looks like this:
    echo Uploading to "${url}"
    echo "  store tarball at tarballs/${aws_path}/${aws_file}"
    upload_to_staging_bucket \
        "${file}" \
        "${bucket_name}" \
        "tarballs/${aws_path}/${aws_file}" \
        "${endpoint_url}"
    echo "  store metadata file at new/${aws_path}/${aws_file}.meta.txt"
    upload_to_staging_bucket \
        "${metadata_file}" \
        "${bucket_name}" \
        "new/${aws_path}/${aws_file}.meta.txt" \
        "${endpoint_url}"
Instead of hardcoding the destination for the uploads, it might be better to make that location configurable. This would also allow for a smoother migration, because using different locations in the S3 bucket will also require changes to the ingestion scripts running as cron jobs on the Stratum-0.
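One way to make the destinations configurable is sketched below, assuming the prefixes are read from environment variables. The variable names UPLOAD_TARBALL_PREFIX and UPLOAD_METADATA_PREFIX and the helper remote_key are hypothetical and not part of the existing script. The defaults reproduce the NESSI layout.

```shell
# Hypothetical settings: top-level directories for uploads, overridable
# via the environment; the defaults match the NESSI layout shown above.
tarball_prefix="${UPLOAD_TARBALL_PREFIX:-tarballs}"
metadata_prefix="${UPLOAD_METADATA_PREFIX:-new}"

# Hypothetical helper: build the object key below a given prefix.
remote_key() {
    local prefix="$1"
    local path="$2"
    local file="$3"
    printf '%s/%s/%s\n' "${prefix}" "${path}" "${file}"
}
```

The upload calls would then pass "$(remote_key "${tarball_prefix}" "${aws_path}" "${aws_file}")" and "$(remote_key "${metadata_prefix}" "${aws_path}" "${aws_file}.meta.txt")" as the third argument of upload_to_staging_bucket. Setting both variables to the same value puts the tarball and the metadata file into the same directory, so the ingestion scripts on the Stratum-0 could be migrated before the upload locations are switched.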