culturecreates / artsdata-orion

Collection of data sources loaded into Artsdata by Culture Creates

Migrate to GitHub artifacts #70

Closed: saumier closed this issue 2 months ago

saumier commented 2 months ago

When running workflows using code in the Orion repo, the practice has been to commit artifacts as files to the repo and then use the commit hash to generate the downloadURL sent to the Artsdata Databus.
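The current practice can be sketched as follows, assuming the downloadURL points at GitHub's raw.githubusercontent.com host (the exact URL format Orion uses is not shown in this thread). The function name `commit_download_url` and the example path are hypothetical.

```shell
# Hypothetical helper: build a download URL pinned to a specific commit.
# Assumes files are served from raw.githubusercontent.com, which accepts a
# commit SHA in place of a branch name.
commit_download_url() {
  local owner="$1" repo="$2" sha="$3" path="$4"
  printf 'https://raw.githubusercontent.com/%s/%s/%s/%s\n' "$owner" "$repo" "$sha" "$path"
}

# In a workflow, the SHA of the commit that added the files could come from:
#   sha="$(git rev-parse HEAD)"
```

Pinning the URL to a SHA rather than a branch means it keeps pointing at that exact version of the file even after later commits.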

This issue proposes replacing the current practice with a GitHub feature called Artifacts. This will make workflows easier to reuse, and it will provide dedicated URLs to use as download URLs for the Artsdata Databus. Another advantage of artifacts is that they are deleted automatically after a retention period that we can configure. The default should be 1 week. Workflows should also be able to override the retention period with an optional parameter. For example, a workflow that needs to keep its artifact for 6 months should be able to pass in 180 days.
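A minimal sketch of the proposed retention logic, assuming the resolved value is eventually passed to the `retention-days` input of the `actions/upload-artifact` action. The function name `resolve_retention_days` is hypothetical. GitHub documents a maximum retention of 90 days for artifacts in public repositories (private repositories allow longer), so a 180-day request would have to be capped or rejected; this sketch rejects it.

```shell
# Hypothetical helper: resolve the artifact retention period in days.
# With no argument it returns the proposed default of 7 days (1 week).
# Assumes a public repository, where GitHub caps retention at 90 days.
resolve_retention_days() {
  local requested="${1:-7}"
  local max_days=90
  # Accept only whole numbers of days.
  case "$requested" in
    ''|*[!0-9]*) echo "retention must be a whole number of days" >&2; return 1 ;;
  esac
  if [ "$requested" -lt 1 ] || [ "$requested" -gt "$max_days" ]; then
    echo "retention must be between 1 and $max_days days, got $requested" >&2
    return 1
  fi
  echo "$requested"
}
```

In a workflow step, the resolved value could be written to `$GITHUB_OUTPUT` and referenced as `retention-days` when calling `actions/upload-artifact`.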

dev-aravind commented 2 months ago

@saumier Please update the Artsdata Databus API to accept GitHub artifact URLs (these download as zip files) for loading data. It would be great if you could also include an example curl command that I can run to call this API. Thanks!

saumier commented 2 months ago

@dev-aravind I ran into a blocker. While testing the Artsdata Databus with zip files, I was unable to download a GitHub artifact using curl. I get the following message: "You must have the actions scope to download artifacts.", even though the GitHub documentation says "This endpoint can be used without authentication or the aforementioned permissions if only public resources are requested." I didn't find a way to check whether an artifact is public, so I can only assume that artifacts in a public repo are public.

```
% curl -L "https://api.github.com/repos/culturecreates/artsdata-planet-spektrix/actions/artifacts/1939841946/zip"
{
  "message": "You must have the actions scope to download artifacts.",
  "documentation_url": "https://docs.github.com/rest/actions/artifacts#download-an-artifact",
  "status": "403"
}
```

But this command, which fetches the artifact's metadata, works:

```
% curl https://api.github.com/repos/culturecreates/artsdata-planet-spektrix/actions/artifacts/1939841946
{
  "id": 1939841946,
  "node_id": "MDg6QXJ0aWZhY3QxOTM5ODQxOTQ2",
  "name": "all-json-files",
  "size_in_bytes": 24741,
  "url": "https://api.github.com/repos/culturecreates/artsdata-planet-spektrix/actions/artifacts/1939841946",
  "archive_download_url": "https://api.github.com/repos/culturecreates/artsdata-planet-spektrix/actions/artifacts/1939841946/zip",
  "expired": false,
  "created_at": "2024-09-16T21:44:46Z",
  "updated_at": "2024-09-16T21:44:46Z",
  "expires_at": "2024-12-15T21:44:28Z",
  "workflow_run": {
    "id": 10892504521,
    "repository_id": 839243027,
    "head_repository_id": 839243027,
    "head_branch": "main",
    "head_sha": "8941de0fd275263fcfbfb6090b6f80a4fe75c932"
  }
}
```

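Because the metadata endpoint works without a token, a workflow could at least read an artifact's download URL and expiry status from it. A minimal sketch that reads JSON like the response above from standard input; `artifact_field` is a hypothetical name, and it uses `grep` and `sed` on the pretty-printed output only to avoid an extra dependency such as `jq`. (The artifact above expires about 90 days after `created_at`, which matches GitHub's default retention period.)

```shell
# Hypothetical helper: print the value of a top-level string, number, or
# boolean field from artifact metadata JSON read on stdin.
# Assumes pretty-printed JSON with one "key": value pair per line and a
# two-space indent for top-level keys, as in the API response above.
# A real JSON parser such as jq is more robust.
artifact_field() {
  local key="$1"
  # Keep only the top-level line for this key, then strip the key,
  # the trailing comma, and any surrounding quotes.
  grep "^  \"$key\":" | sed -e 's/^[^:]*: *//' -e 's/,$//' -e 's/^"//' -e 's/"$//'
}

# Example (requires network access):
#   curl -s https://api.github.com/repos/culturecreates/artsdata-planet-spektrix/actions/artifacts/1939841946 \
#     | artifact_field archive_download_url
```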
saumier commented 2 months ago

@dev-aravind Given the blocker above, we may have to go back to checking in the files until GitHub fixes the problem.

dev-aravind commented 2 months ago

@saumier You also need to pass a GitHub token to download the artifact. For example:

```
curl -L -H "Authorization: token [token-value]" -o artifact.zip "https://api.github.com/repos/culturecreates/artsdata-planet-spektrix/actions/artifacts/1946145220/zip"
```

Let me know if this works for you.
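A slightly more defensive variant of that command, as a sketch: it reads the token from a `GITHUB_TOKEN` environment variable instead of pasting it into the command line, and uses `--fail` so that a 403 gives a non-zero exit status instead of saving the JSON error body as the zip file. The function name `download_artifact` is hypothetical; the headers follow GitHub's REST API documentation, where this endpoint answers with a redirect to a short-lived download URL that `-L` follows.

```shell
# Hypothetical helper: download a GitHub Actions artifact as a zip file.
# Assumes GITHUB_TOKEN holds a token allowed to read the repository's
# Actions artifacts (the "actions scope" mentioned in the 403 error).
download_artifact() {
  local owner="$1" repo="$2" artifact_id="$3" out="${4:-artifact.zip}"
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set; unauthenticated artifact downloads return 403" >&2
    return 1
  fi
  # -f: fail on HTTP errors instead of writing the error body to $out
  # -L: follow the redirect to the short-lived download URL
  curl -fsSL \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    -o "$out" \
    "https://api.github.com/repos/${owner}/${repo}/actions/artifacts/${artifact_id}/zip"
}

# Example (requires network access and a valid token):
#   GITHUB_TOKEN=... download_artifact culturecreates artsdata-planet-spektrix 1946145220
```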

saumier commented 2 months ago

@dev-aravind Thanks for the example curl command with the token. In our scenario, the Artsdata Databus will not be able to get a token (which is very GitHub-specific). The Databus expects a download_url that does not require authentication, so for now we should go back to committing files to the repo.
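The Databus requirement can be expressed as a simple preflight check: a URL is only usable if an anonymous request succeeds. A sketch, where `is_publicly_downloadable` is a hypothetical helper; it sends no credentials and treats any non-2xx final status (such as the 403 above) or a connection failure as unusable.

```shell
# Hypothetical preflight check: succeed only if the URL can be fetched
# without any authentication, which is what the Databus expects of a
# download URL.
is_publicly_downloadable() {
  local url="$1" status
  # -L follows redirects and -w prints only the final HTTP status code.
  # A GET with the body discarded is used because some hosts reject HEAD.
  status="$(curl -sL -o /dev/null -w '%{http_code}' --max-time 30 "$url")" || return 1
  [ "$status" -ge 200 ] && [ "$status" -lt 300 ]
}
```

Given the 403 reported above, running this against the artifact's `archive_download_url` is expected to fail, while a commit-pinned file URL in a public repo should pass.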

In the meantime, I will open a bug report with GitHub asking them not to require a token for downloading public artifacts, as their documentation states.

I am closing this issue as "not planned".