CouncilDataProject / cookiecutter-cdp-deployment

Cookiecutter template for creating new CDP instances.
Mozilla Public License 2.0

Store content hash string on the session db model #75

Closed evamaxfield closed 2 years ago

evamaxfield commented 2 years ago

Feature Description

Currently, if we want to look up the content hash for a session, we need to go through any transcript for the session, then the transcript's file ref, then split the file name on "-" and take the first part of the split.
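For illustration, the split step described above could look roughly like the sketch below. The function name, the example URI, and the assumption that the content hash is the first "-"-separated piece of the file name (not of the whole path) are hypothetical and not taken from cdp-backend.

```python
def content_hash_from_file_ref(file_uri: str) -> str:
    """Return the leading "-"-separated piece of a transcript file name.

    Assumption: the file name starts with the content hash followed by "-".
    """
    # Drop any directory or bucket prefix so hyphens there do not interfere.
    file_name = file_uri.rsplit("/", 1)[-1]
    # Split once on "-" and keep the first part.
    return file_name.split("-", 1)[0]


if __name__ == "__main__":
    # Hypothetical file ref, for demonstration only.
    print(content_hash_from_file_ref("gs://some-bucket/abc123-transcript.json"))
```

Storing the value on the session itself would remove the need for this transcript and file ref lookup on every read.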

Use Case

It would be much faster to get the audio file for a session if we stored the content hash on the session model itself.

Solution

Update the db model and the pipeline to store the content hash. Also write a script that backfills old sessions with their content hashes, using the transcript file ref split described above.

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them.

isaacna commented 2 years ago

Linking CouncilDataProject/cdp-backend#159, which created a script to update an old db instance. The last remaining part is to create a GitHub Action to run the script.

evamaxfield commented 2 years ago

Real quick: I thought about this recently. Maybe we simply make a GitHub Action that lets maintainers run any arbitrary bin script?

In this case, the PR you just linked added a bin script to the package called: add_content_hash_to_sessions

If we had a github action that simply installed cdp-backend, loaded the secrets / env, and was parametrized with a bin script, we could make the github action just accept the bin script name as a param.

evamaxfield commented 2 years ago

Basically, my thinking here is instead of making a complicated "checker" for version, we leave it to the maintainer to run the script?

isaacna commented 2 years ago

> If we had a github action that simply installed cdp-backend, loaded the secrets / env, and was parametrized with a bin script, we could make the github action just accept the bin script name as a param.

I'm good with this idea, but is there an easy way to handle scripts with different arguments? I guess we could always use env variables, since I didn't see anything in the GitHub Actions documentation for passing in params without hardcoding them.

> Basically, my thinking here is instead of making a complicated "checker" for version, we leave it to the maintainer to run the script?

Yeah, I don't think there's necessarily a need for a version checker, but maybe there's some other way we can handle a scenario where some script/fix needs to be run before updates can be pulled in. Ideally we shouldn't need to do this unless we have another data backfill situation like the hashing bug.

evamaxfield commented 2 years ago

We could simply allow the full command to be passed through:

some_fake_cdp_bin_script --arg1=2 --arg2="hello"
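
As a rough sketch of what passing the full command through could mean on the receiving side, the snippet below splits a single command string into arguments and runs it. The function names are hypothetical, and this is not the GitHub Action or script that was eventually added; it only illustrates the idea using the Python standard library.

```python
import shlex
import subprocess


def parse_command(command: str) -> list[str]:
    """Split a full command string into an argument list, honoring quotes."""
    args = shlex.split(command)
    if not args:
        raise ValueError("No command was provided.")
    return args


def run_full_command(command: str) -> int:
    """Run the command without a shell and return its exit code."""
    completed = subprocess.run(parse_command(command), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Only parse the fake command from the comment above; it is not a real script.
    print(parse_command('some_fake_cdp_bin_script --arg1=2 --arg2="hello"'))
```

Splitting with `shlex` and avoiding `shell=True` keeps quoted arguments intact while limiting what a pasted command string can do.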
evamaxfield commented 2 years ago

Going to move this issue over to cookiecutter for tracking there.

evamaxfield commented 2 years ago

Closing as I just added this script and documentation: https://github.com/CouncilDataProject/cookiecutter-cdp-deployment/blob/main/%7B%7B%20cookiecutter.hosting_github_repo_name%20%7D%7D/admin-docs/running-extra-scripts.md

evamaxfield commented 2 years ago

Thanks @isaacna, looks like all the script and database changes rolled out totally fine: https://github.com/JacksonMaxfield/cdp-dev/runs/5177792637?check_suite_focus=true

isaacna commented 2 years ago

> Thanks @isaacna, looks like all the script and database changes rolled out totally fine: https://github.com/JacksonMaxfield/cdp-dev/runs/5177792637?check_suite_focus=true

Awesome, glad to hear!