NYCPlanning / db-data-library

📚 Data Library
https://nycplanning.github.io/db-data-library/library/index.html
MIT License

Add execution details to config outputs #385

Closed fvankrieken closed 1 year ago

fvankrieken commented 1 year ago

Draft for now because of the issue described below, but I would like review and input.

Closes #384

Outputs currently look like this:

action: https://nyc3.digitaloceanspaces.com/edm-recipes/datasets/doitt_buildingfootprints/20230414/config.json

"execution_details": {
    "type": "ci",
    "dispatch_event": "workflow_dispatch",
    "url": "https://github.com/NYCPlanning/db-data-library/actions/runs/4702034994/",
    "job": "dataloading",
    "timestamp": "2023-04-14 12:13:14"
}

"job" refers to the specific job within the workflow run. In theory each job has its own URL (like https://github.com/NYCPlanning/db-data-library/actions/runs/4702034994/jobs/12345), which I originally wanted instead of the overall run URL. But the job_id is not actually exposed as a variable, only the job name, so I went with the above.
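For illustration, a minimal sketch of how the CI variant could be assembled from GitHub Actions' default environment variables (GITHUB_SERVER_URL, GITHUB_REPOSITORY, GITHUB_RUN_ID, GITHUB_EVENT_NAME, GITHUB_JOB). The function name ci_execution_details and the choice of UTC for the timestamp are assumptions made for this example, not necessarily what the PR does.

```python
import os
from datetime import datetime, timezone


def ci_execution_details(env=None):
    """Build an execution_details dict from GitHub Actions default variables.

    Hypothetical helper: the name and the UTC timestamp are assumptions.
    """
    env = os.environ if env is None else env
    # Only the run id is available, so the URL points at the whole run,
    # not at the individual job.
    run_url = "{server}/{repo}/actions/runs/{run_id}/".format(
        server=env["GITHUB_SERVER_URL"],
        repo=env["GITHUB_REPOSITORY"],
        run_id=env["GITHUB_RUN_ID"],
    )
    return {
        "type": "ci",
        "dispatch_event": env["GITHUB_EVENT_NAME"],
        "url": run_url,
        # GITHUB_JOB holds the job key from the workflow file (the job name).
        "job": env["GITHUB_JOB"],
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
    }
```

Passing a plain dict as env makes the helper easy to exercise outside of a workflow run.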

manual

"execution_details": {
    "type": "manual",
    "user": "Finn van Krieken",
    "timestamp": "2023-04-14 12:13:14"
}

This "manual" one grabs the git username. Since the dev container requires that we add an ssh key to authenticate with git, I think this shouldn't fail. It also turned out to be fairly non-trivial to get any other sort of username for the actual user in the dev container, so this seemed like the simplest solution. But now that I write this, maybe I should add a try/except, so that on a failure to find any of this info we either fail the job or save the execution_details as "could not resolve"/"unknown" or something like that.
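A rough sketch of the fallback idea, assuming the username is read with git config user.name and that "unknown" is the placeholder chosen when it cannot be resolved. The helper names git_user_name and manual_execution_details are hypothetical, and the UTC timestamp is likewise an assumption.

```python
import subprocess
from datetime import datetime, timezone


def git_user_name(default="unknown"):
    """Return the configured git user.name, or `default` if it can't be resolved."""
    try:
        result = subprocess.run(
            ["git", "config", "user.name"],
            capture_output=True,
            text=True,
            check=True,  # git exits non-zero when user.name is not set
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError covers git not being installed at all.
        return default
    return result.stdout.strip() or default


def manual_execution_details():
    """Hypothetical builder for the "manual" execution_details variant."""
    return {
        "type": "manual",
        "user": git_user_name(),
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
    }
```

To fail the job instead of recording a placeholder, the except branch could re-raise rather than return the default.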

I currently have an edit to single-runner.yml so that it uses the scripts in this repo to do archiving rather than using action-library-archive. I don't know if I should merge it as is, but we certainly need to discuss. My gut is that this project should never use the published action, since the current repo is (1) the most up to date on master and (2) the only way to test changes in development.

My vote would be that actions in this repo never use the published action, but that each commit/push to main re-publishes the image so the two stay in sync.

So to summarize, I would love input on:

  1. format/naming of fields being added to config files
  2. thoughts around the ci workflows and where they're pulling their logic from

fvankrieken commented 1 year ago

There's also a bit of ugliness in the single-runner file as written: it duplicates logic from both the Dockerfile and the entrypoint script in https://github.com/NYCPlanning/action-library-archive. I need to think about a more elegant solution.

fvankrieken commented 1 year ago

And before merging I'm gonna go ahead and fix these test failures, though I have not touched anything related to them

fvankrieken commented 1 year ago

LGTM - one note/thought is whether it is worthwhile to update some of our regularly used actions?

Makes sense to me