ScholliYT / Broken-Links-Crawler-Action

GitHub Action to check a website for broken links
MIT License

[Feature request] Publish as a python package on PyPI #14

Open · aceberle opened this issue 3 years ago

aceberle commented 3 years ago

I'm still catching up on how the Python world works, but with the recent changes made to deadseeker, I think it would be relatively easy to publish it as a Python package on PyPI so that it can be installed as a standalone Python CLI utility.

I think most of the work would involve creating the recommended project files and possibly pulling in some sort of command-line argument-parsing utility so that the inputs can be provided on the command line instead of via environment variables (although, as a first pass, environment variables would work fine too).

But I'm thinking something like this:

>pip install deadseeker
>python -m deadseeker --includeprefix="https://a.com/" --always-get-onsite https://a.com/
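For the argument parsing, the standard library's argparse would probably be enough. A minimal sketch of what such a CLI front-end could look like (the flag names just mirror the example invocation above and are not an existing deadseeker interface):

    import argparse

    def parse_args():
        # Hypothetical CLI wrapper; flag names mirror the example invocation above
        parser = argparse.ArgumentParser(
            prog='deadseeker',
            description='Crawl a website and report broken links')
        parser.add_argument('url', help='root URL to start crawling from')
        parser.add_argument('--includeprefix', action='append', default=[],
                            help='only follow links starting with this prefix (repeatable)')
        parser.add_argument('--always-get-onsite', action='store_true',
                            help='use GET instead of HEAD for on-site pages')
        parser.add_argument('--max-tries', type=int, default=1,
                            help='number of attempts for failed requests')
        return parser.parse_args()

    if __name__ == '__main__':
        args = parse_args()
        # The parsed values would then be handed off to the existing crawler logic
        print(args.url, args.includeprefix, args.always_get_onsite, args.max_tries)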

Use case

I am using Cloudflare in front of my static site to help with edge caching and speed up the site's response time. When I publish changes to the site, I have a workflow set up that uses another action to send a cache purge request to Cloudflare, and then I use broken-links-crawler-action to crawl the site and repopulate the cache. As far as I can tell, this works perfectly now.

My client-side caching headers specify that pages should only be cached for an hour, so I was considering setting up a periodic process (cron job) on my webserver to re-crawl the site every hour, making the cache-refresh hit less likely to land on an actual user; the logic in broken-links-crawler already does this quite well. If that logic were also provided as a Python package, I could install it on the webserver and run it via cron.
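If this existed as an installable package, the cron side would then be a one-liner, e.g. an hourly crontab entry along these lines (the flags are the same hypothetical ones as in the example above, and the log path is just an example):

    0 * * * * python -m deadseeker --includeprefix="https://a.com/" --always-get-onsite https://a.com/ >> /var/log/deadseeker.log 2>&1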

I know this can easily be done by just cloning the project onto my webserver and running the code directly from there (and I might do exactly that today to cover my use case for now), but I thought a published package might make it easier for other people. I may also just be looking for an excuse to publish a package on PyPI to learn how it is done, so feel free to ignore this request if it is too far outside the purpose of the project.

ScholliYT commented 3 years ago

Good idea @aceberle. I have already thought about separating the core logic from the GitHub Action itself. This would also make it possible to publish the broken links crawler as a standalone Docker image with clean environment variable input (instead of the INPUT_ prefix required by GitHub Actions). That Docker image could also solve your use case (although it is a bit overkill compared to a cron job) and many others, like using the crawler in a GitLab pipeline.
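If the crawler were published as its own image, usage outside GitHub Actions could then look roughly like this (the image name and variable names below are placeholders, nothing that exists yet):

>docker run --rm -e WEBSITE_URL="https://a.com/" -e INCLUDE_URL_PREFIX="https://a.com/" -e ALWAYS_GET_ONSITE="true" scholliyt/broken-links-crawler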

I will have a look at how to build a Python package and publish it to PyPI, since I haven't done that before either.
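For reference, the packaging side should be fairly small. A minimal setuptools sketch (name, version and metadata below are placeholders, and a pyproject.toml/setup.cfg based setup would work just as well):

    # setup.py (sketch; assumes the code lives in a deadseeker package with a main() entry point)
    from setuptools import setup, find_packages

    setup(
        name='deadseeker',
        version='0.1.0',
        description='Crawl a website and report broken links',
        packages=find_packages(exclude=['tests']),
        python_requires='>=3.8',
        install_requires=['aiohttp'],  # assuming aiohttp is the only runtime dependency
        entry_points={
            'console_scripts': [
                'deadseeker=deadseeker.__main__:main',
            ],
        },
    )

Building and uploading would then be a couple of commands using the build and twine tools:

>python -m pip install build twine
>python -m build
>twine upload dist/*

There is also the pypa/gh-action-pypi-publish action, which can run the upload step directly from a GitHub workflow.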