Improve RSS Feed aggregator

knyghty commented 7 months ago

We should use a simple sqlite database to store stuff.

We should have a command for adding a feed to a channel. Something like:

/add_feed <channel> <feed_name> <feed_url>

Not sure if the converter is needed. Maybe optional list of choices? But only if we need it for the currently planned feeds.

When you run the command, it grabs the latest entry of that feed, and adds it to the database. We store the current datetime so we know to only grab posts later than this in the future.

Model something like:

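from dataclasses import dataclass
from datetime import datetime

@dataclass
class Feed:
    id: int
    name: str
    channel_id: int  # channel the feed posts into
    url: str
    time_of_latest_post: datetime  # only posts newer than this get sent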

I could be wrong but I don't think there's a reason to store any feed items anywhere.

Then we poll for posts later than time_of_latest_post, stick them in the channel, and update time_of_latest_post.
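For illustration, here is a minimal sketch of how the stored feed and its cutoff could look with Python's built-in sqlite3 module. The table layout mirrors the Feed model above, but the table name, the column types, and the helper names (add_feed, get_cutoff, update_latest) are assumptions made for this example, not something decided in this thread.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema mirroring the Feed model; names and types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feed (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    channel_id INTEGER NOT NULL,
    url TEXT NOT NULL,
    time_of_latest_post TEXT NOT NULL
)
"""


def add_feed(conn, name, channel_id, url, now=None):
    """Store a new feed, using the current time as the starting cutoff."""
    now = now or datetime.now(timezone.utc)
    cur = conn.execute(
        "INSERT INTO feed (name, channel_id, url, time_of_latest_post) "
        "VALUES (?, ?, ?, ?)",
        (name, channel_id, url, now.isoformat()),
    )
    conn.commit()
    return cur.lastrowid


def get_cutoff(conn, feed_id):
    """Return the stored time_of_latest_post as a datetime."""
    row = conn.execute(
        "SELECT time_of_latest_post FROM feed WHERE id = ?", (feed_id,)
    ).fetchone()
    return datetime.fromisoformat(row[0])


def update_latest(conn, feed_id, posted_at):
    """Move the cutoff forward after new posts were sent to the channel."""
    conn.execute(
        "UPDATE feed SET time_of_latest_post = ? WHERE id = ?",
        (posted_at.isoformat(), feed_id),
    )
    conn.commit()
```

Storing the timestamp as ISO 8601 text is one option among several; it keeps the value readable in the database and round-trips through datetime.fromisoformat.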

It's worth noting that the publication date is optional in RSS. Not sure what we can do about this. Maybe we can do something similar to now and check if the posts are in the channel but I don't see a good way of not splurging every ancient post into the channel. Maybe we should just ignore stuff without a date.

jakdevmail commented 7 months ago

I'd like to avoid an additional dependency for the database. I have an idea for a very minimalistic migration system with Python's built-in sqlite3 module. I don't think we'll be making any major schema changes or anything like that, but it makes your deployment a lot easier.

Did you have something else in mind, or can I start some work in a branch and we'll see where it goes?

I'm not married to the idea, so I'm open to suggestions :)

On the matter of possibly missing post dates: I'd just consider such RSS feeds to be badly behaved (even though they follow the RSS standard). We'll cross that bridge when we come to it; for now, let's ignore (but log!) those feed items.
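A rough sketch of the "ignore but log" approach, assuming feed items arrive as plain dicts with a "title" and an optional "pubDate" string in the RFC 822 style that RSS 2.0 uses. The function name select_new_items and the dict shape are hypothetical; a real feed parser may expose dates differently.

```python
import logging
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

logger = logging.getLogger(__name__)


def select_new_items(items, cutoff):
    """Return (new_items, new_cutoff) for items published after cutoff.

    cutoff must be a timezone-aware datetime. Items without a usable
    publication date are skipped and logged rather than posted.
    """
    new_items = []
    new_cutoff = cutoff
    for item in items:
        raw = item.get("pubDate")
        if not raw:
            logger.warning("Ignoring undated feed item: %r", item.get("title"))
            continue
        try:
            published = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            logger.warning("Ignoring feed item with bad date %r", raw)
            continue
        if published.tzinfo is None:
            # Assumption: treat dates without a usable offset as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if published > cutoff:
            new_items.append(item)
            new_cutoff = max(new_cutoff, published)
    return new_items, new_cutoff
```

The returned new_cutoff is what would be written back as time_of_latest_post once the new items have been sent to the channel.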

knyghty commented 7 months ago

Personally I'm happy to use what's built into Python, and I'm fine with raw SQL; my main concern is indeed the migrations. As we only have one deploy and that's unlikely to change, something minimal could work, but I'd be interested to know what it is.

jakdevmail commented 7 months ago

Practically, we keep one directory (let's call it "migrations") which stores Python or SQL files. Each of those files is an actual migration, building on the one before it.

The files have a prefix which indicates the order.

Note: Anything like this isn't going to have the near-magical Django migration experience; it's pretty bare-bones. Anything else would be overkill, I think.

Now we only have one problem: how does the instance know which migration to run next (or which migration it is currently at)?

SQLite comes to the rescue with the user_version pragma (https://www.sqlite.org/pragma.html#pragma_user_version). It's a single integer which any application can use as it wants. SQLite only stores it and uses it for nothing else. It's practically begging to be used for stuff like this. We can just store our current migration index in there.

I don't mind writing SQL, and we can just run the migrations on startup every time, making the deployment near effortless. Rollbacks can be implemented in practically the same way, although I would push them off to another time; I don't think we'll be breaking stuff that fast :)
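To make the idea concrete, a minimal sketch of such a startup migration runner might look like the following. It assumes SQL-only migration files named with a numeric prefix and an underscore (for example "0001_create_feed.sql"); that naming scheme and the function name migrate are assumptions for this example. Python-file migrations and rollbacks are left out.

```python
import sqlite3
from pathlib import Path


def migrate(conn, migrations_dir):
    """Apply pending .sql migrations and return the resulting version.

    The numeric file prefix is the migration index, and the index of the
    last applied migration is kept in PRAGMA user_version.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    # Sort by the integer prefix so 0010 runs after 0009.
    pending = sorted(
        (int(path.name.split("_", 1)[0]), path)
        for path in Path(migrations_dir).glob("*.sql")
    )
    for index, path in pending:
        if index <= current:
            continue  # already applied on an earlier startup
        conn.executescript(path.read_text())
        # PRAGMA values cannot be bound as parameters; index is an int
        # parsed from the file name, so formatting it in is safe here.
        conn.execute(f"PRAGMA user_version = {index:d}")
        conn.commit()
        current = index
    return current
```

Calling migrate(conn, "migrations") on every startup is a no-op once the database is up to date. Note that in this sketch a migration script and the version bump are not wrapped in a single transaction, so a script that fails halfway would need manual cleanup.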

knyghty commented 7 months ago

Seems reasonable. I would hold off for now until I've done some more refactoring though.

django-discord / bot

Improve RSS Feed aggregator #361