WordPress / openverse-api

The Openverse API allows programmatic access to search for CC-licensed and public domain digital media.
https://api.openverse.engineering/v1
MIT License

Add zero-downtime deployments & data transformations guide #1082

Closed sarayourfriend closed 1 year ago

sarayourfriend commented 1 year ago

Fixes

Fixes #1030 by @sarayourfriend

Description

I am still working on this, and significant sections and details that I want to add before undrafting are still missing. I'll update the PR description when I undraft the PR.

Testing Instructions

Checklist


Developer Certificate of Origin

```
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.
```
github-actions[bot] commented 1 year ago

API Developer Docs Preview: Ready

https://wordpress.github.io/openverse-api/_preview/1082

Please note that GitHub Pages takes a little time to deploy newly pushed code. If the link above doesn't work or you see an old version, wait 5 minutes and try again.

You can check the GitHub Pages deployment action list to see the current status of the deployments.

sarayourfriend commented 1 year ago

This is based on previous experience. I haven't read any other documents describing this process for Django specifically, but the difficulty of database schema changes is a well-known problem for automated zero-downtime deployments. Plenty of resources online discuss it and describe scenarios similar to the column name change example in the document.

After a quick search, I couldn't find anyone describing the process used here, with management commands. The only tool Django has for dealing with part of this issue (data loss caused by an unexpected error during a long-running data transformation) is making migrations non-atomic: https://docs.djangoproject.com/en/4.1/howto/writing-migrations/#non-atomic-migrations
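To make the data-loss concern concrete: when a long-running transformation runs inside a single transaction, an error near the end rolls back everything that was already done. Django's documented escape hatch is setting `atomic = False` on the `Migration` class and managing transactions yourself. The sketch below is not Django code; it uses the standard-library `sqlite3` module with a hypothetical `image` table, a hypothetical `normalize` transformation and an injected failure to compare one big transaction against committing each batch separately.

```python
import sqlite3
from typing import Optional

# Stand-in for a long-running data transformation. The table, the transformation
# and the injected failure are all hypothetical.
BATCH_SIZE = 2  # tiny for illustration; real batches would be far larger


def make_db(n: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image (id INTEGER PRIMARY KEY, title TEXT, done INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO image (title) VALUES (?)", [(f" Title {i} ",) for i in range(n)]
    )
    conn.commit()
    return conn


def normalize(conn: sqlite3.Connection, image_id: int, fail_on: Optional[int]) -> None:
    """Trim a title and mark the row done; raise on `fail_on` to simulate a crash."""
    if image_id == fail_on:
        raise RuntimeError(f"unexpected error on row {image_id}")
    conn.execute("UPDATE image SET title = TRIM(title), done = 1 WHERE id = ?", (image_id,))


def transform_atomic(conn: sqlite3.Connection, fail_on: Optional[int] = None) -> None:
    """One transaction for everything: an error rolls back all completed work."""
    ids = [row[0] for row in conn.execute("SELECT id FROM image ORDER BY id")]
    try:
        for image_id in ids:
            normalize(conn, image_id, fail_on)
        conn.commit()
    except RuntimeError:
        conn.rollback()


def transform_batched(conn: sqlite3.Connection, fail_on: Optional[int] = None) -> None:
    """One transaction per batch: an error only rolls back the current batch,
    and rerunning picks up from the first row not yet marked done."""
    while True:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM image WHERE done = 0 ORDER BY id LIMIT ?", (BATCH_SIZE,)
            )
        ]
        if not ids:
            return
        try:
            for image_id in ids:
                normalize(conn, image_id, fail_on)
            conn.commit()
        except RuntimeError:
            conn.rollback()  # a real command would log the error before stopping
            return


def done_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM image WHERE done = 1").fetchone()[0]
```

With five rows and a failure on row 4, the atomic version ends with nothing transformed, while the batched version keeps the first committed batch and finishes the rest when rerun.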

Here's a guide about zero-downtime migrations in Django, but focused exclusively on adding/removing tables and columns, rather than things like long-running data transformations that you can put into Django SQL migrations: https://gist.github.com/majackson/493c3d6d4476914ca9da63f84247407b

It's still a useful resource, though, because it lists clear steps for several very common situations, so I'll include a link to it as well.

sarayourfriend commented 1 year ago

@krysal In response to your questions, I realised that the document's motivation might be clearer if I reframed it as a general document about zero-downtime deployments, with data transformations as a special case. I changed the language to distinguish more clearly between a "Django migration based data transformation" and a "management command based data transformation", mainly by using "data transformation" rather than "data migration" as the generic term. Hopefully this clarifies what was already there. I am undrafting now and am eager to hear further thoughts on whether the suggested guidelines for data transformations make more sense, and on what I need to clarify further (or scrap entirely :sweat_smile:).

sarayourfriend commented 1 year ago

> I think one other piece that could be helpful here is an example or template Django management command which performs a data transformation as described in the document.

That's a great idea and something I'd love to do. I think there are some clever ways to make some generic tools or a base class that gives the outline.

That said, I think it's complex enough to warrant a separate issue, if other reviewers are okay with that.
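As a rough, hypothetical illustration of the "base class that gives the outline" idea: the class name `DataTransformation` and its `get_pending`/`transform`/`save` hooks are invented here and are not an existing Openverse or Django API. A real version would presumably be driven from a Django management command's `handle()` method; this sketch is plain Python so it runs on its own, with a toy in-memory subclass standing in for a database table.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, TypeVar

T = TypeVar("T")


class DataTransformation(ABC, Generic[T]):
    """Hypothetical base class: subclasses say what to fetch and how to change it;
    the base class owns batching, dry runs and progress counting."""

    batch_size = 100

    @abstractmethod
    def get_pending(self, limit: int) -> List[T]:
        """Return up to `limit` items not yet transformed. Must stop returning
        items once they are saved, otherwise run() never terminates."""

    @abstractmethod
    def transform(self, item: T) -> None:
        """Change one item in memory; no persistence here."""

    @abstractmethod
    def save(self, items: List[T]) -> None:
        """Persist one batch, ideally in its own transaction."""

    def run(self, dry_run: bool = False) -> int:
        """Process pending items batch by batch and return how many were saved.
        With dry_run=True, transform only the first batch, save nothing,
        and return that batch's size."""
        processed = 0
        while True:
            batch = self.get_pending(self.batch_size)
            if not batch:
                return processed
            for item in batch:
                self.transform(item)
            if dry_run:
                return len(batch)
            self.save(batch)
            processed += len(batch)


class UppercaseTitles(DataTransformation[dict]):
    """Toy in-memory subclass; `rows` stands in for a database table."""

    batch_size = 2

    def __init__(self, rows: Dict[int, dict]):
        self.rows = rows

    def get_pending(self, limit):
        # Return copies so a dry run never touches the "stored" rows.
        return [dict(r) for r in self.rows.values() if not r["done"]][:limit]

    def transform(self, item):
        item["title"] = item["title"].upper()

    def save(self, items):
        for item in items:
            self.rows[item["id"]] = {**item, "done": True}
```

Keeping `transform` free of I/O makes dry runs cheap, and pushing all persistence into `save` is what allows one transaction per batch.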

sarayourfriend commented 1 year ago

Thanks @krysal. I added the two additional benefits to the document; I'm not sure why I forgot to include them, because they are important ones. I also added a clarification about the "automatic" migration running that should be more timeless than either of our initial suggestions. At some point we can remove it, but the information will never become inaccurate (unless we stop doing zero-downtime deployments :sweat_smile:).