Open ckolderup opened 1 year ago
I'm going to claim this one if you don't mind!
@johnholdun I'm okay with that, but I do think it will require some changes to the activitypub.db schema, and we need to figure out a strategy for not only ongoing migrations from this point forward but also an upgrade path for people who have existing installs and need new schema updates applied to their existing db. Does that sound right?
Timing-wise I'm about to head into a 2-week period where my time availability is pretty low; I'll be helping out with an in-law family event from 9/27-10/2 and then traveling for a work event 9/2-9/5. I do trust you to figure this out if you're still up for it, but maybe we should talk on Mon or Tue before I leave?
No prob! I'm going to write out my current thoughts on a plan here for pondering and/or additional input:
firstTimeSetup
in activity-pub-db.js to run every time the server restarts (i.e. merge it with setup
). All the schema details in there look like they're idempotent already; the key generation stuff isn't, but that's solved by part 2 of this plan. Part of this would probably be moving the existing CREATE TABLE IF NOT EXISTS
statements to a new sql file, like schema.sql
, which is read and executed by this function. Any additional changes can be appended to this file to be executed the first time the server is restarted after they're added, and ignored on subsequent runs. I can't reasonably think of a future migration that couldn't be handled by this process; a risk is that this file will get unmanageably long if there's a lot of churn on the schema, but there's always the opportunity for a clean break in the future, especially after point 2 is implemented.settings
, and I imagine the columns of this table being called something like name
and value
, where name
is a unique string and value
is JSON. Each column of the accounts
table can be ported to a row in that new table. This step is not strictly necessary for this issue's requirements, but we can take advantage of SQLite's upsert syntax to always try to generate a new key pair unless one exists, for example. (This table will also be useful for things like theme options, which are the real reason I wanted to claim this issue.) This table could also be used for things like feature flags and even schema migration generations, for rare instances where idempotent create
or alter
statements are insufficient.This is a lot of detail and makes some assumptions about the future of this project, so as always please steer me elsewhere if you have a different vision. I can also break this down a little if it's too much to do all at once; each step could probably be its own self-contained chunk of work, so long as previous steps are already completed. There's also probably a lot of opportunities to add tests while implementing this stuff.
edited to amend some more database thoughts to points 1 and 3
Thanks for writing all this up! It definitely feels like an inevitable transition of going from something that I tried to make "developer-friendly" to being...actually just friendly. It's also a lot of work, and I agree that we should find some way to break it into multiple steps. Quoting some specifics below to get into them:
Any additional changes can be appended to this file to be executed the first time the server is restarted after they're added, and ignored on subsequent runs.
Okay, so essentially a one-file version of the ActiveRecord migration directory pattern? How would we tell the server to ignore things on subsequent runs? Would we want to go full ActiveRecord style at that point and just have individual .sql files in a directory, and store the last-processed file in a db table? It feels like that would also be easy to write Github checks for-- don't allow branches to have edits to existing .sql files, only allow new ones?
Rework the activitypub.db schema—instead of an actor table that only ever has one record, add a new table of generic key-value pairs to use for this (and future!) uses. [...] Rename activitypub.db to something like application.db [...] I'm also not sold on bookmarks being in a separate database
I'm okay with moving away from separate databases-- it would simplify the code a lot, and the original strategy was mostly a stopgap to make it easy for me to copy bookmarks around between test instances. We've got CSV export as of #86 and will have CSV import once #23 is complete, which satisfies my "just bookmarks" data portability as a baseline requirement (porting activitypub-related stuff is kind of fraught since the canonical online version of the data is signed by the private key and tied to that particular actor name + instance version-- but we can discuss whether there are ways to improve that at some point in the future).
So yeah. I think all of these proposed DB changes would be a net good for the project. Order-wise, though, I think we might want to shuffle the order of work around a bit to smooth things out:
setup
process that establishes the new schema + migration process with a barebones schemasettings
table with schema defined in the migration .sql file (or files).settings
table key/value pairs are present, the owner is notified that they can remove account.json from their filesystem.messages
and permissions
tables from activitypub.db as schemas in application.db using its migration functionality. Make the setup process detect an existing activitypub.db file and copy all data over from it to the new tables in application.db. Update the code to read from application.db for this data. Notify the owner on startup that they can remove activitypub.db.My main hope of ordering things like this is to make it so that we only go through the set up for the whole schema/migration process for the new database, rather than doing it for databases that will be made obsolete soon, and (crucially imo) means that none of what we change actually writes to the two existing .db files. This means that if anyone runs into trouble they can always downgrade, report to us what happened, and try again when we've addressed their problems. Does that make sense?
I like this order, and I think making sure that none of the new stuff ever even touches the old database files is really smart! I'll make some new issues for the sets of steps between shipping statements if you're cool with that.
By "ignored on subsequent runs," I meant that a statement like CREATE TABLE IF NOT EXISTS
can be executed multiple times and never create the given table more than once. There are similar approaches for things like altering columns and adding indices. After a good number of these, the first run of a fresh instance with an empty database could have some weird churn—creating columns and immediately dropping them, that sort of thing—but I think the benefits of keeping all the schema statements in one long file outweigh it. If it ever gets way too cumbersome or there's a use case that simply isn't covered by this strategy, we could adopt a more sophisticate migration tool and this file could be migration 1.
gotcha, thanks! I was aware of IF NOT EXISTS
&c but I thought you were referring to some kind of higher-level strategy for skipping over existing changes. I'm fine with the simpler approach, though I guess I'm a bit concerned that it'll slow down the development process-- but I'll withhold my judgment until I see how the performance of that changes over time, and we can always add a start:nomigrate
script or something (similar) that lets people skip the entire process if they're just standing the app up to test a frontend change or whatever.
Running first migration to create 100 tables…
Finished in 88ms
Adding 100000 records to each table…
Finished in 17144ms
Running second migration, no change to schema…
Finished in 1ms
This will get more complicated with indices and so on but even so, I think it'll last a long time. I don't expect a ton of churn in this schema anyway, especially when we can dump a lot of stuff into the settings
table.
As evidenced by the mention spam in this thread, I've created new tickets for each of the groups of shippable work outlined above. This ticket remains valid for step 9 in ckolderup's list above—it now depends on #135, #136, and #137.
I don't actually think that "depends on #foo" has any special ability in the way that "fixes #foo" does but I wish it did! I don't know what it would do though.
Going back to the profile management piece originally mentioned, something GoToSocial has implemented is the ability for each user to have custom css on their public facing profile page. That would solve an issue I'm having in regards to a docker build of Postmarks. I currently have to run a build every time I change the CSS. If it was stored as part of the profile (a database field) this would no longer be an issue.
Firefish has a similar option as well, tho it only affects that specific user's browser, not what others see.
Just a thought. Thanks for the consideration!
https://docs.gotosocial.org/en/latest/user_guide/custom_css/
Yeah, this is interesting. There's some purist tendency in me that says that storing "code in the database" is scary/bad, which is why I was thinking "settings table key/value for repo fork's standalone CSS file path", but maybe that's being too precious. I can appreciate wanting to make this kind of change without a full build cycle.
You should be able to have the user look like something-- both an avatar and a customizable display name, and probably a bio? -- on Mastodon