Bossett / bsky-feeds

MIT License

Feeds are only showing posts from 3 days ago #151

Closed: MikeHuntington closed this issue 3 hours ago

MikeHuntington commented 5 hours ago

When running the sample algos like "auspol" or "cats", the feeds only show posts from 3 days ago. Is there a configuration setting I could be getting wrong that causes this?

Bossett commented 4 hours ago

No, that's correct. On first launch the engine only looks back as far as the firehose backfill goes (3 days), but it will store history going forward (up to a limit for some feeds, e.g. the one specified in https://github.com/Bossett/bsky-feeds/blob/ae5dae7d38bf816b897c071e1c0af8702718bb6a/src/algos/cats.ts#L47).
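To make the two limits concrete, here is a minimal TypeScript sketch. It is not the bsky-feeds implementation. `StoredPost`, `BACKFILL_WINDOW_MS`, `isWithinBackfillWindow` and `applyRetentionLimit` are hypothetical names, and the 3-day constant simply mirrors the figure given in the reply. The first limit is what the relay can still replay on first launch. The second is a per-feed cap on how much history is kept afterwards.

```typescript
// Hypothetical record shape for a post stored by a feed (not the repo's schema).
interface StoredPost {
  uri: string;
  indexedAt: number; // milliseconds since the Unix epoch
}

// ~3 days: the firehose backfill window described in the thread (assumed constant).
const BACKFILL_WINDOW_MS = 3 * 24 * 60 * 60 * 1000;

// On first launch, only posts newer than (now - 3 days) can still be replayed.
function isWithinBackfillWindow(post: StoredPost, now: number): boolean {
  return now - post.indexedAt <= BACKFILL_WINDOW_MS;
}

// Per-feed retention cap: keep the newest `maxPosts` posts and drop the oldest.
// Works on a copy so the caller's array is not reordered.
function applyRetentionLimit(posts: StoredPost[], maxPosts: number): StoredPost[] {
  return posts
    .slice()
    .sort((a, b) => b.indexedAt - a.indexedAt)
    .slice(0, maxPosts);
}
```

The backfill window explains why a fresh install starts 3 days behind. The retention cap explains why some feeds never show more than a fixed amount of history, however long the engine has been running.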

MikeHuntington commented 4 hours ago

@Bossett thanks so much for the answer, that makes sense.

However, once the backfill has been read the first time, is there a reason the feed still only shows posts from 3 days ago, even a few hours after being published?

Here's an example of the feed I published: Cats Feed

MikeHuntington commented 4 hours ago

Oh wait, I think I understand: you mean it will just take some time for the feeds to work through the posts from 3 days ago and reach the present?

Bossett commented 4 hours ago

Yes, that's right. Just make sure it's able to write to the sub_state collection (https://github.com/Bossett/bsky-feeds/blob/ae5dae7d38bf816b897c071e1c0af8702718bb6a/src/db/dbClient.ts#L163), which lets it keep track of where it's up to. You can use MongoDB Compass to check that the collection exists and create it if necessary.
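The sketch below shows why the sub_state write matters. It is a minimal cursor-checkpointing pattern, not the code in `dbClient.ts`. An in-memory store stands in for the MongoDB collection, and the `service`/`cursor` field names, `CursorStore`, `processEvents` and `checkpointEvery` are all assumptions. The real firehose endpoint (`com.atproto.sync.subscribeRepos`) takes a `cursor` parameter and resumes from there. Here, that resume is simulated by skipping any event whose `seq` is at or below the saved cursor.

```typescript
// Assumed shape of a sub_state record: which relay, and the last sequence number handled.
interface SubState {
  service: string;
  cursor: number;
}

// Hypothetical storage interface; in bsky-feeds this role is played by MongoDB.
interface CursorStore {
  get(service: string): SubState | undefined;
  put(state: SubState): void;
}

class InMemoryCursorStore implements CursorStore {
  private readonly states = new Map<string, SubState>();

  get(service: string): SubState | undefined {
    return this.states.get(service);
  }

  put(state: SubState): void {
    this.states.set(state.service, { ...state });
  }
}

interface FirehoseEvent {
  seq: number;
}

// Process events in order, saving the cursor every `checkpointEvery` events and at the end.
// If the cursor can't be saved, every restart begins at the oldest backfill again.
function processEvents(
  store: CursorStore,
  service: string,
  events: FirehoseEvent[],
  handle: (evt: FirehoseEvent) => void,
  checkpointEvery = 20,
): void {
  const start = store.get(service)?.cursor ?? 0;
  let last = start;
  let sinceCheckpoint = 0;
  for (const evt of events) {
    if (evt.seq <= start) continue; // already handled before the restart
    handle(evt);
    last = evt.seq;
    sinceCheckpoint++;
    if (sinceCheckpoint >= checkpointEvery) {
      store.put({ service, cursor: last });
      sinceCheckpoint = 0;
    }
  }
  if (last > start) store.put({ service, cursor: last });
}
```

Checkpointing in batches rather than after every event is a common trade-off. A crash re-processes at most `checkpointEvery` events, but the database sees far fewer writes.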

MikeHuntington commented 4 hours ago

Awesome, I think that solves my issue. One last observation, though: given how long it takes to process posts for the feed, it seems like it would never catch up to current posts.

Is there a way for me to make that processing a bit faster? (As you're probably aware, traffic on BSky is heavy and the number of posts coming through is crazy high.)

Bossett commented 3 hours ago

It'll move as fast as your bandwidth/CPU allows, and because it skips all non-posts it will generally be 'fast enough'. It is inefficient, though, since it was built before tools like Jetstream (https://github.com/bluesky-social/jetstream) were available.
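Skipping non-posts is cheap because each firehose commit op carries a path of the form `<collection>/<rkey>`. Anything outside `app.bsky.feed.post` (likes, follows, reposts and so on) can be dropped before any further work. The sketch below illustrates that idea. `RepoOp` is a simplified, assumed shape rather than a real library type. `postOps` and `jetstreamUrl` are hypothetical helpers. Jetstream can instead filter server-side through its `wantedCollections` query parameter, so unwanted records never reach your machine. The host passed in is a placeholder.

```typescript
// Simplified, assumed shape of one operation inside a firehose commit.
interface RepoOp {
  action: "create" | "update" | "delete";
  path: string; // "<collection>/<rkey>", e.g. "app.bsky.feed.post/abc"
}

const POST_COLLECTION = "app.bsky.feed.post";

// The collection NSID is everything before the first "/".
function collectionOf(path: string): string {
  const slash = path.indexOf("/");
  return slash === -1 ? path : path.slice(0, slash);
}

// Keep only post records (creates and deletes); everything else is skipped.
function postOps(ops: RepoOp[]): RepoOp[] {
  return ops.filter((op) => collectionOf(op.path) === POST_COLLECTION);
}

// With Jetstream, the same filter can be requested server-side.
function jetstreamUrl(host: string): string {
  return `wss://${host}/subscribe?wantedCollections=${encodeURIComponent(POST_COLLECTION)}`;
}
```

Client-side filtering still has to download and decode every commit, which is where most of the bandwidth goes. Server-side filtering avoids that cost, which is why a Jetstream-based consumer would be lighter.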

I run it in production now, though, and it is keeping up at ~25 Mbit/s. Note that right now there's a big US East Coast outage, unrelated to Bsky, that's breaking things.
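The worry that it "would never catch up" can be checked with simple arithmetic. Suppose the backlog built up at the live firehose rate. Then the backlog only shrinks by the amount processing throughput exceeds that live rate. So catch-up time is backlog volume divided by that surplus, and it never finishes if throughput is at or below the live rate. The function below is an illustrative back-of-envelope estimate. `catchUpHours` and its inputs are hypothetical, and rates can be in any consistent unit, such as Mbit/s or events/s.

```typescript
// Back-of-envelope estimate of how long it takes to drain a firehose backlog.
// Assumes the backlog accumulated at the live rate and both rates stay constant.
function catchUpHours(backlogHours: number, processRate: number, liveRate: number): number {
  if (processRate <= liveRate) return Infinity; // the backlog never shrinks
  const backlogVolume = backlogHours * liveRate; // work accumulated while behind
  const surplus = processRate - liveRate; // how fast the backlog shrinks
  return backlogVolume / surplus;
}
```

For example, with a 72-hour backlog, processing at twice the live rate takes about 72 hours to catch up, and processing at four times the live rate takes about 24 hours. This is why "as fast as your bandwidth/CPU allows" matters: any sustained headroom above the live rate eventually clears the backlog.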

MikeHuntington commented 3 hours ago

Your help is much appreciated! Thank you!