This is a WIP but it should fundamentally work. I haven't tested anything tho, I need to get back to writing something due really soon... 😅
Remaining issues:
Ephemeral storage is going to be an issue given this uses sqlite3. I have a few different ideas:
Use some sort of object storage (e.g. S3) and send the Sqlite3 database back and forth over it (assuming Heroku even uses us let Sqlite as a dependency; it really doesn't like it) between process boots.
Replace Sqlite3 with just flatfile storage — given the database is mainly just used to build the corpora and it doesn't seem to do much with any column other than the content one, it might be easier to just ignore all the metadata entirely and dump all the content into a newline-delimited text file. Probably a bit more performant too. Could ostensibly use CSV if data like toot ID is necessary.
Use Postgres when deployed on Heroku. Syntax is pretty similar. We could use an ORM like peewee or pony to prevent having to do raw SQL calls in code and be able to use the same code to talk to either Python or Sqlite. Probably the easiest to provision. This increasingly feels like the way to go to add PaaS support while maintaining backwards compatibility for people who run things locally.
Getting the access token will be different than the existing use case; the whole two-legged OAuth dance won't really work unless we explicitly set up a webserver that gives users their access token, which they then paste into Heroku's config. Actually this is pretty straight-forward now that I think of it, I'll just host a webserver for this purpose somewhere.
Uhhhh really sorry I have pylint set to autoformat to PEP8 and so this diff is really hard to read 😓
by default, if you don't provide an oauth URL, mastodon will just give you the code instead c:
thanks so much for all this! i'll have a look at it in a bit!
This is a WIP but it should fundamentally work. I haven't tested anything tho, I need to get back to writing something due really soon... 😅
Remaining issues:
Use some sort of object storage (e.g. S3) and send the Sqlite3 database back and forth over it (assuming Heroku even uses us let Sqlite as a dependency; it really doesn't like it) between process boots.
Replace Sqlite3 with just flatfile storage — given the database is mainly just used to build the corpora and it doesn't seem to do much with any column other than the content one, it might be easier to just ignore all the metadata entirely and dump all the content into a newline-delimited text file. Probably a bit more performant too. Could ostensibly use CSV if data like toot ID is necessary.
Use Postgres when deployed on Heroku. Syntax is pretty similar. We could use an ORM like peewee or pony to prevent having to do raw SQL calls in code and be able to use the same code to talk to either Python or Sqlite. Probably the easiest to provision. This increasingly feels like the way to go to add PaaS support while maintaining backwards compatibility for people who run things locally.
Uhhhh really sorry I have pylint set to autoformat to PEP8 and so this diff is really hard to read 😓