alshedivat / al-folio

A beautiful, simple, clean, and responsive Jekyll theme for academics
https://alshedivat.github.io/al-folio/
MIT License

Posts Scheduler #2659

Closed by KingHowler 2 months ago

KingHowler commented 3 months ago

Posts Scheduler

GitHub Workflow

Purpose

The workflow automatically publishes posts that are scheduled for a later date. The ability to schedule posts is a very important tool and I feel like it should also be included in this repository.

Error ?

I don't know for certain if this is an error or if it is intentional, but al-folio doesn't publish posts which are dated in the future.

How I found this out: I was making a post, but no matter what I did, it just wouldn't show up on my website. After starting from scratch and changing everything one at a time to see where it broke, I found that the date was causing the post not to be published. The filename was 2024-08-25-lecture1.md, and the day I was uploading the post was also 25 August 2024. When I tried again half a day later, the post was published.

After comparing times, I found that at the time of the failed upload the date was 25 Aug in my region and 24 Aug in UTC. Hence I concluded that only posts dated for the present or the past are published on the website.
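The date mismatch described here can be checked from a shell. This is a minimal sketch, assuming the build machine runs in UTC (as hosted GitHub Actions runners typically do) and that the tz database is installed. The names `utc_today` and `zone_today` and the example zone are illustrative and not part of al-folio:

```shell
# Illustrative helpers (not part of al-folio).

# Print today's date as seen in UTC.
utc_today() { date -u +%Y-%m-%d; }

# Print today's date as seen in the time zone given as $1, e.g. Asia/Karachi.
zone_today() { TZ="$1" date +%Y-%m-%d; }

# Usage: compare the two before pushing a post dated "today".
#   utc_today
#   zone_today Asia/Karachi
```

When the two outputs differ, a post named for the local date is still in the future from the builder's point of view.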

Another thing to notice:

The website is only uploaded when the Deploy workflow is run. This means that the user must be online and manually deploy the website.

Quick Fix

You can solve this issue by having the Deploy workflow run every day. The downside is the commit history of the gh-pages branch: deploying every day will make it annoyingly big.

Better Solution

In the main branch, add a new folder _scheduled and put posts for a later date in there. Then, using the workflow I made, the posts should be automatically deployed on the date in their names. If there are no files scheduled for today, no changes are made, which keeps the commit history minimal.
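The idea in this comment can be sketched in a few lines of shell. This is not the workflow from the pull request: `publish_due_posts` and its arguments are hypothetical, and the sketch assumes filenames start with a YYYY-MM-DD- prefix and that dates are compared in UTC:

```shell
# Hypothetical helper, not the workflow from the thread.
# Moves every file in SRC whose name starts with a date (YYYY-MM-DD-)
# on or before TODAY into DEST, and prints the names of the moved files.
# Usage: publish_due_posts SRC DEST [TODAY]   (TODAY defaults to the UTC date)
publish_due_posts() {
  src=$1
  dest=$2
  today=${3:-$(date -u +%Y-%m-%d)}
  today_num=$(printf '%s' "$today" | tr -cd '0-9')
  mkdir -p "$dest"
  for path in "$src"/*; do
    [ -f "$path" ] || continue
    name=${path##*/}
    # Skip anything that is not shaped like 2024-08-25-title.md
    case $name in
      [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-*) ;;
      *) continue ;;
    esac
    # The first 10 characters are the date; compare it as a number (20240825).
    post_num=$(printf '%s' "$name" | cut -c1-10 | tr -cd '0-9')
    if [ "$post_num" -le "$today_num" ]; then
      mv "$path" "$dest/" && printf '%s\n' "$name"
    fi
  done
}

# Example: publish_due_posts _scheduled _posts
```

If the function prints nothing, no file was moved, so a workflow built around it could skip the commit and push entirely.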

george-gca commented 3 months ago

The ability to schedule posts is a very important tool and I feel like it should also be included in this repository.

I agree this would be useful.

I don't know for certain if this is an error or if it is intentional, but al-folio doesn't publish posts which are dated in the future.

Actually, this is a static site generator thing (Jekyll in this case). Someone even opened an issue in Jekyll a long time ago. To summarize: a static site generator builds the website once and then it is done; it can't change the site after it has been built. Since the post is dated in the future, Jekyll just ignores it during the build.

Now, how would your solution work if a post has a timestamp for later in the day? For example, the post name has 2024-08-27, but the post itself contains something like date: 2024-08-27 15:09:00. I believe that when your cron job runs it will move the posts correctly to the _posts/ dir, but these posts will be ignored during that build and will only be included in the next one.

KingHowler commented 3 months ago

That is certainly an interesting case, but we can't run deploy once every hour; it would clutter up the Actions tab too much.

This can be solved with a bit of sacrifice. Instead of running the scheduler at 00:00, we can run it at 23:59. This way all of the posts will be deployed on that specific day, but only right before midnight, which I assume is not ideal.

Another option is to use delays, but that would cause the workflow to run for potentially up to 7-8 hours, and that would cause problems for GitHub itself.

A third option, which I think may have a good chance of working, is to separate this workflow into 2

george-gca commented 3 months ago

Maybe we could also fetch the timestamp from the post itself and use it somehow? For example, by doing:

sed -ne '/---/,/---/{/---/N;p}' 2015-03-15-formatting-and-links.md | grep "date: " | cut -c7-

We can obtain the timestamp 2015-03-15 16:40:16 from inside the post.

KingHowler commented 3 months ago

Yeah, but that only helps with the comparison. The difficult part is triggering the event as rarely as possible.

I've got an idea but I am a bit busy for a few days. I'll send a diagram of a possible solution this Thursday.

george-gca commented 3 months ago

I believe the easiest solution would be to trigger the action twice a day (00:00 and 23:59), and check each time which posts should be moved to _posts/ based on their timestamps.
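One way to do that check, as a rough sketch, is to compare the timestamp from the post with the current time. `stamp_to_number` and `is_due` are hypothetical helpers. They assume timestamps of the form YYYY-MM-DD or YYYY-MM-DD HH:MM:SS with no time zone offset, all in the same zone (UTC here):

```shell
# Hypothetical helper: turn "YYYY-MM-DD" or "YYYY-MM-DD HH:MM:SS" into a
# sortable number such as 20240827150900 (date-only values get 00:00:00).
stamp_to_number() {
  digits=$(printf '%s' "$1" | tr -cd '0-9')
  case ${#digits} in
    8) digits=${digits}000000 ;;
  esac
  printf '%s\n' "$digits"
}

# Succeeds (exit status 0) when the post timestamp $1 is not later than $2.
# $2 defaults to the current UTC time.
is_due() {
  now=${2:-$(date -u '+%Y-%m-%d %H:%M:%S')}
  [ "$(stamp_to_number "$1")" -le "$(stamp_to_number "$now")" ]
}

# Example: is_due "2024-08-27 15:09:00" && echo "move to _posts/"
```

With the example from earlier in the thread, a post stamped 2024-08-27 15:09:00 is not due at a 00:00 run on that day but is due at a 23:59 run.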

KingHowler commented 3 months ago

I've got a more complex but feasible idea for that. Here is a flowchart of it:

[Image: Post-Scheduler-Proposal-Flowchart]

CheariX commented 3 months ago

Maybe we could also fetch the timestamp from the post itself and use it somehow? For example, by doing:

sed -ne '/---/,/---/{/---/N;p}' 2015-03-15-formatting-and-links.md | grep "date: " | cut -c7-

We can obtain the timestamp 2015-03-15 16:40:16 from inside the post.

Just my two cents: instead of grep + cut, one could also use awk '/^date:/ {print $2}' to print only the date (if I understood correctly, the time is not necessary; otherwise use print $2, $3). This may be less error-prone when there are simple formatting issues, like multiple whitespace characters after the colon.
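Building on the awk suggestion, here is a sketch that reads only the front matter (the lines between the first two --- lines) and keeps both the date and the time. `front_matter_date` is a hypothetical name, and the sketch assumes an unquoted date: value on a single line:

```shell
# Hypothetical helper: print the "date:" value from the YAML front matter
# of the post given as $1, ignoring any "date:" lines in the post body.
front_matter_date() {
  awk '
    # Count the --- fence lines; stop reading at the second one.
    /^---[[:space:]]*$/ { fences++; if (fences == 2) exit; next }
    fences == 1 && /^date:/ {
      sub(/^date:[[:space:]]*/, "")   # drop the key and any spaces after the colon
      sub(/[[:space:]]+$/, "")        # drop trailing whitespace
      print
      exit
    }
  ' "$1"
}

# Example: front_matter_date _scheduled/2015-03-15-formatting-and-links.md
```

Because the whole value after the key is printed, extra spaces after the colon do not matter, and the time is kept when it is present.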

george-gca commented 3 months ago

@CheariX the time is necessary.

@KingHowler so the action itself would change its own cron schedule? It is feasible, but I think it is kind of overkill. For example, if someone adds 5 scheduled posts for the same day but with different timestamps, the action would run 5 times, which seems a bit excessive. What do you think?

KingHowler commented 3 months ago

Well yeah, it does seem excessive, but that's only to ensure that each post is published at the exact time it was scheduled for.

Other than that we can just run it once every day at 23:59. I'm fine with either way as I don't use time in my website, only date.

Running it once will save us from the pain of modifying the workflow and extracting timestamps.

KingHowler commented 3 months ago

Another thing I would like to mention is that this highly depends on the user. Like I said I don't use time but there may be some people who think it's crucial for their post to be published at the exact time.

How about we make a config.txt file?

We can build both versions. An action that runs daily and an action that runs at specific times. A third action, the scanner, will check the config.txt and run the appropriate action according to it.
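A rough sketch of how such a scanner could pick a mode. Everything here is hypothetical: the thread does not define the format of config.txt, so the mode= key, the values daily and timed, and the `read_mode` helper are assumptions for illustration only:

```shell
# Hypothetical helper: print the scheduler mode stored in the file $1.
# Assumed format: a line "mode=daily" or "mode=timed".
# Falls back to "daily" when the file is missing or the value is unknown.
read_mode() {
  mode=daily
  if [ -r "$1" ]; then
    # Take the first "mode=" line; ignore everything else in the file.
    value=$(sed -n 's/^mode=//p' "$1" | head -n 1)
    case $value in
      daily|timed) mode=$value ;;
    esac
  fi
  printf '%s\n' "$mode"
}

# Example: [ "$(read_mode config.txt)" = "timed" ] && echo "use per-post times"
```

Defaulting to daily keeps the simpler behavior for users who never create the file.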

This makes it customizable for the user.

george-gca commented 3 months ago

I am more inclined to do a single one that checks every day (ignoring time) and to add an explanation of how one would take the timestamp into account in our FAQ or CUSTOMIZE. What do you think?

KingHowler commented 3 months ago

The current code can fulfill those requirements, but as the site doesn't publish posts dated for a later time, it's essential to have the scheduler work exactly at 23:59.

Earlier you proposed that we should run it at 00:00 and 23:59. If we run it at 00:00, all files dated for that day (ignoring time) will be moved to _posts/, but deploy won't publish them. When we run it again at 23:59, all the files will already have been moved to _posts/ and there will be nothing to push.

Without a push, the deploy action won't trigger, and we will have to wait for the next scheduled post (that could be the next day, next month, or maybe even next year).

So running it twice a day will cause the scheduler to not work properly.

george-gca commented 3 months ago

The current code can fulfill those requirements, but as the site doesn't publish posts dated for a later time, it's essential to have the scheduler work exactly at 23:59.

We can simply ignore the time, as you currently do, and give instructions for adding the time if someone wishes to.

Earlier you proposed that we should run it at 00:00 and 23:59. If we run it at 00:00, all files dated (ignoring time) will be sent to _posts/ but deploy won't upload them. When we run it at 23:59 later, all the files will already have been moved to _posts/ and there will be nothing to push.

My mistake, I meant 12:00 and 23:59. That way we would have at most 2 pushes per day. But I am OK with running it once a day and gathering all posts for that day.

george-gca commented 3 months ago

I believe I can merge this PR now. Would you mind sending another PR with the information about how one could add support for time in this action @KingHowler, maybe in CUSTOMIZE.md or FAQ.md? I believe it would be useful for some users.

KingHowler commented 3 months ago

Wait, let me set the trigger time to 23:59 first.

KingHowler commented 3 months ago

@george-gca I have set the trigger time to 23:59 and will send a PR for CUSTOMIZE.md.

Here are a few things I'd like you to know before you merge this

george-gca commented 3 months ago

I just added a few more things to fix.