We back up our repository (files) in S3 buckets and tell people to download the repo from there. However, it's difficult to get the repo via S3 because it's a directory of millions of files, and downloading everything takes a long time.
Even though the main way to retrieve Sourcify's repository will be the DB exports, we should still provide an easy way to get the files. This could be something like a periodically generated zip of the whole repo.
We can create a regular job that creates a zip of the repository and uploads it to a public bucket. Cloudflare R2 would be useful here because there are no download costs. We already use it for the VerA parquet export. Let's create a Sourcify account and upload there. We should name the .zip with a date to show when it was uploaded.
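As a rough illustration of the zipping step, here is a minimal Python sketch. It assumes the repository is available as a local directory and that the archive name follows the `sourcify-repository-<ISO timestamp>.zip` pattern from the example file name in this issue. The function names `dated_zip_name` and `zip_repository` are hypothetical, and the upload to the bucket is left out on purpose.

```python
import zipfile
from datetime import datetime, timezone
from pathlib import Path


def dated_zip_name(now: datetime) -> str:
    """Build the archive name from a UTC timestamp (naming pattern is an assumption)."""
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return f"sourcify-repository-{stamp}.zip"


def zip_repository(repo_dir: Path, out_dir: Path, now: datetime) -> Path:
    """Zip every file under repo_dir into a date-stamped archive in out_dir."""
    zip_path = out_dir / dated_zip_name(now)
    # allowZip64 is needed because the archive is expected to exceed 4 GiB.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        for file in sorted(repo_dir.rglob("*")):
            if file.is_file():
                # Store paths relative to the repository root, with forward slashes.
                zf.write(file, arcname=file.relative_to(repo_dir).as_posix())
    return zip_path
```

Note that the colons in the ISO timestamp are not valid in file names on Windows, so a consumer-friendly name might prefer a date-only or colon-free format.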
In this case, however, we need an additional manifest that records when the repo was uploaded, because the current manifest.json records when stats.json was created. Should we add a description field to both manifests so that people can understand the difference? The new manifest.json can sit next to the .zip file, while the other stays next to the contracts/ folder. Similar to the VerA manifest.json, this should include the file(s) and their sizes (see https://github.com/verifier-alliance/parquet-export/issues/4).
Something like:
{
  "description": "Manifest file for when the Sourcify file repository was uploaded",
  "timestamp": 1723737024141,
  "dateStr": "2024-08-15T15:50:24.141998Z",
  "files": [
    {
      "path": "sourcify-repository-2024-08-15T15:50:24.141998Z.zip",
      "sizeInBytes": 26875277986
    }
  ]
}
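A manifest of this shape could be generated right after the zip is created. The following is a minimal Python sketch, assuming the field names proposed above. The function name `build_manifest` is hypothetical, and the size is read from the local zip file before upload.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def build_manifest(zip_path: Path, uploaded_at: datetime) -> dict:
    """Describe the uploaded archive using the fields proposed in this issue."""
    uploaded_at = uploaded_at.astimezone(timezone.utc)
    return {
        "description": "Manifest file for when the Sourcify file repository was uploaded",
        # Milliseconds since the Unix epoch, computed with integer arithmetic.
        "timestamp": (uploaded_at - EPOCH) // timedelta(milliseconds=1),
        "dateStr": uploaded_at.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "files": [
            {"path": zip_path.name, "sizeInBytes": zip_path.stat().st_size},
        ],
    }


def write_manifest(manifest: dict, out_path: Path) -> None:
    """Write the manifest as pretty-printed JSON next to the zip file."""
    out_path.write_text(json.dumps(manifest, indent=2) + "\n")
```

Deriving both `timestamp` and `dateStr` from the same datetime keeps the two fields consistent with each other.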
Questions
RepositoryV1 or RepositoryV2? I'd go for V2 since V1 is supposed to be legacy, and V2 is also what we're uploading to IPFS.