Closed Nutomic closed 1 month ago
Would there be a downside to running the scheduled task every minute or so for better scheduling accuracy? If it stays hourly, it should probably not run every hour counted from service startup, but at x:00 every hour for consistency, plus once at startup.
The most frequent scheduled task we have now is every 15 min for hot rank updates, so I could move it in there. Making it more frequent may have too much performance impact. And I believe the task runs at every full hour, unrelated to startup time.
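For illustration, aligning a task to x:00 instead of "every hour since startup" comes down to computing the wait until the next hour boundary. This is only a sketch, not the scheduler used in this PR; `until_next_full_hour` is a hypothetical helper, and it assumes Unix timestamps in UTC, where hour boundaries are multiples of 3600 seconds.

```rust
use std::time::Duration;

const SECS_PER_HOUR: u64 = 3600;

/// Hypothetical helper: how long to sleep from `now_unix_secs` until the
/// next x:00 boundary (UTC). When called exactly on a boundary it returns
/// a full hour, so the task does not fire twice for the same boundary.
fn until_next_full_hour(now_unix_secs: u64) -> Duration {
    let into_hour = now_unix_secs % SECS_PER_HOUR;
    Duration::from_secs(SECS_PER_HOUR - into_hour)
}
```

A startup routine could run the task once immediately, sleep for `until_next_full_hour(now)`, and then repeat on a fixed one-hour interval.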
It would probably also be useful for admins (ideally also mods, though that is less useful because nothing is federated until the publishing date) to see a queue of scheduled posts, so they can moderate them before a wave of spam goes out at a later time.
Instead of this, how about we limit the number of scheduled posts by a single user? We could also set a time limit, e.g. at most one week in the future.
Instead of this, how about we limit the number of scheduled posts by a single user?
There's no reason to do that imo, I don't think it's going to be an expensive query. And we can add an index on that column if necessary.
We could also set a time limit, e.g. at most one week in the future.
The check_expire_time function has a 10 year in the future limit 😂, but we could do something shorter. I'd want it to be at minimum a year out though, because you might want to schedule a post for a movie release date or something, and those can be a long way in the future.
One other thing I didn't think of is that we might need to add this column to the post aggregates table, because it's used in post_view, which does most of its complicated joins against post aggregates for speed.
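A rough sketch of what a shorter limit could look like. This is not the existing check_expire_time function (its signature isn't shown in this thread); `check_schedule_time`, `ScheduleError` and the one-year constant are made up for illustration, and timestamps are assumed to be Unix seconds.

```rust
/// Hypothetical upper bound: one (non-leap) year, in seconds.
const MAX_SCHEDULE_AHEAD_SECS: i64 = 365 * 24 * 60 * 60;

#[derive(Debug, PartialEq)]
enum ScheduleError {
    InPast,
    TooFarInFuture,
}

/// Validates an optional scheduled publish time against `now`.
/// `None` means "publish immediately" and is always accepted.
fn check_schedule_time(now: i64, scheduled: Option<i64>) -> Result<(), ScheduleError> {
    match scheduled {
        None => Ok(()),
        // A time at or before `now` cannot be scheduled.
        Some(t) if t <= now => Err(ScheduleError::InPast),
        // saturating_sub avoids overflow on extreme inputs.
        Some(t) if t.saturating_sub(now) > MAX_SCHEDULE_AHEAD_SECS => {
            Err(ScheduleError::TooFarInFuture)
        }
        Some(_) => Ok(()),
    }
}
```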
cc @dullbananas @phiresky
There's no reason to do that imo, I don't think it's going to be an expensive query. And we can add an index on that column if necessary.
I was thinking of preventing abuse; otherwise a spammer can schedule 100 posts to be created at the same time, bypassing the rate limits.
I was thinking of preventing abuse; otherwise a spammer can schedule 100 posts to be created at the same time, bypassing the rate limits.
I wouldn't worry about that for now, 1) because they're still going to get hit with the rate limit on creating the publishable posts, so it's going to be tedious unless they automate it, and 2) because a ban and remove will still get rid of all of those.
We can always add something later there if it becomes a problem.
I'd probably prefer the None for scheduled_publish_time to mean published, since we already have published and updated timestamp columns on that table.
Perhaps a domain specific enum would be more appropriate? e.g. something like:
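enum PublishTime {
    // Published immediately, or a scheduled post that already went out.
    Published,
    // Still waiting; the value is the future publish time.
    Future(i64),
}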
Published would be for immediately published posts or scheduled posts that were already published, whereas Future would be for posts that are still waiting to be published. Not sure how this would work as a DB column though.
Seems fine to me. I'd probably prefer the None for scheduled_publish_time to mean published, since we already have published and updated timestamp columns on that table.
None means publish immediately, so it's already to your preference (unless I misunderstand you).
Perhaps a domain specific enum would be more appropriate?
That enum has the exact same info as an option, and makes the api unnecessarily complicated.
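To make the "same info as an option" point concrete, here is a small sketch showing that the proposed enum converts to and from `Option<i64>` without losing anything. The derives and `From` impls are added purely for illustration and are not part of the PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum PublishTime {
    Published,
    Future(i64),
}

// None <-> Published, Some(t) <-> Future(t): a one-to-one mapping.
impl From<Option<i64>> for PublishTime {
    fn from(value: Option<i64>) -> Self {
        match value {
            None => PublishTime::Published,
            Some(t) => PublishTime::Future(t),
        }
    }
}

impl From<PublishTime> for Option<i64> {
    fn from(value: PublishTime) -> Self {
        match value {
            PublishTime::Published => None,
            PublishTime::Future(t) => Some(t),
        }
    }
}
```

Since the round trip is lossless in both directions, the enum would only buy more descriptive variant names, at the cost of a custom type in the API and a custom mapping for the DB column.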
There needs to be abuse mitigation before this gets merged, otherwise it would be easy to create an annoying number of posts with the exact same scheduled publish time if it's far enough in the future. An acceptable solution would be to have a hash map in the scheduled task that keeps track of the publish timestamp of each creator's most recent post, and skip publishing a scheduled post if the age of the previous post is less than the duration of the post rate limit. The hash map will need to be updated in the loop so that posts that were scheduled by the same creator and handled in the same loop aren't ignored.
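The hash map idea could be sketched roughly like this. All names here (`ScheduledPost`, `publish_due_posts`, the field names) are hypothetical and do not come from the Lemmy codebase; timestamps are assumed to be Unix seconds, and skipped posts are assumed to stay scheduled so that a later run picks them up.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct ScheduledPost {
    id: u32,
    creator_id: u32,
    scheduled_publish_time: i64,
}

/// Returns the ids of the posts published in this run. `last_published`
/// maps creator id -> publish timestamp of that creator's most recent post.
fn publish_due_posts(
    due: &[ScheduledPost],
    last_published: &mut HashMap<u32, i64>,
    now: i64,
    rate_limit_secs: i64,
) -> Vec<u32> {
    let mut published = Vec::new();
    for post in due {
        if post.scheduled_publish_time > now {
            continue; // not due yet
        }
        // Skip if this creator's previous post is younger than the rate limit.
        let too_soon = last_published
            .get(&post.creator_id)
            .map_or(false, |&prev| now - prev < rate_limit_secs);
        if too_soon {
            continue;
        }
        // Update inside the loop so a second post by the same creator
        // in this same run is also checked against the limit.
        last_published.insert(post.creator_id, now);
        published.push(post.id);
    }
    published
}
```

With two due posts from one creator and one from another, a single run publishes one post per creator; the remaining post goes out in a later run once the rate-limit window has passed.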
@dullbananas That sounds too complicated, but I've added a check so normal users can only schedule up to 10 posts.
@dullbananas That sounds too complicated, but I've added a check so normal users can only schedule up to 10 posts.
That sounds good enough for this PR. Later it would be nice to have something that allows scheduling more posts.
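For reference, a cap like this is cheap to express. The sketch below is not the code from the PR; `can_schedule_another` is a hypothetical helper, and it assumes "normal users" means non-admins and that the count covers only posts that are still pending.

```rust
/// Cap mentioned in the thread: at most 10 scheduled posts per normal user.
const MAX_SCHEDULED_POSTS: usize = 10;

/// Hypothetical check: admins are assumed to be exempt, everyone else
/// may only schedule while they have fewer than the cap pending.
fn can_schedule_another(is_admin: bool, pending_scheduled: usize) -> bool {
    is_admin || pending_scheduled < MAX_SCHEDULED_POSTS
}
```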