Digital-Forensics-Discord-Server / TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts

The official repo for a project involving a crowdsourced DFIR book. The main purpose of this book is to give anyone interested an opportunity to write a chapter of a book to get their name out there, get a publication on their resume with an actual ISBN number, and ideally lower the bar for people to contribute something back to the DFIR Community. Want to write a chapter? Let me know and let's make it happen!
MIT License

Dead link checker #59

Closed by brootware 1 year ago

brootware commented 2 years ago

Hey folks, I just took a look at the book repo and I see the chapters tend to contain links. As we contribute more content, more links are going to come in. There's a technique I employed in one of my own repos to identify dead links: a Markdown link checker run with GitHub Actions. Please let me know if you think this is a good idea.
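
A link checker of this kind first has to pull the URLs out of each Markdown file before it can request them. The Python sketch below shows that step with a regular expression. It is an illustration under simple assumptions (inline `[text](url)` and `![alt](url)` links only, no reference-style links, no URLs containing parentheses), not how any particular GitHub Action implements it.

```python
import re

# Inline Markdown links and images: [text](url) or ![alt](url), with an
# optional "title". Assumptions: reference-style links ([text][ref]) and
# bare URLs are ignored, and URLs containing ")" are cut short.
INLINE_LINK = re.compile(r'!?\[[^\]]*\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def extract_links(markdown: str) -> list[str]:
    """Return every inline link or image target found in a Markdown string."""
    return INLINE_LINK.findall(markdown)


sample = "See [FTK Imager](https://www.exterro.com/ftk-imager) and ![fig](images/fig1.png)."
print(extract_links(sample))
```

Each extracted target would then be requested and its HTTP status compared against the checker's rules.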

AndrewRathbun commented 2 years ago

Yes this is a great idea. Can you tell me how to implement this or point me to a repo that has it implemented as an example? GitHub actions are something I'm wanting to learn and this looks like a good one to learn on.

brootware commented 2 years ago

Hey @AndrewRathbun, sure, I have implemented this in my fork of this repo. You can take a look at the results here; a couple of dead links were found.

https://github.com/brootware/CrowdsourcedDFIRBook/runs/6344168741?check_suite_focus=true

Some return 503, like the one below, where the endpoint is not available.

ERROR: 1 dead links found!
[✖] https://www.exterro.com/ftk-imager#:~:text=FTK%C2%AE%20Imager%20is%20a,(FTK%C2%AE)%20is%20warranted. → Status: 503

And some return status 0, even though we're still able to access the link.

ERROR: 1 dead links found!
[✖] https://developer.android.com/studio/build/configure-app-module → Status: 0

We can add links that return 0, 429, or 403 but are still accessible as exceptions in a config file like this with GitHub Actions.
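
An exception list like that boils down to a set of regular expressions matched against each URL before it is requested. Assuming a checker whose config accepts regex ignore patterns (markdown-link-check, for instance, has an `ignorePatterns` option), the matching logic can be sketched in Python as follows. The single pattern shown is a hypothetical entry for the Android link above, not the repo's actual config.

```python
import re

# Hypothetical exception list. Each entry is a regular expression; a URL that
# matches any of them is skipped instead of being requested.
IGNORE_PATTERNS = [
    r"^https://developer\.android\.com/",  # returned status 0 but opens fine in a browser
]


def is_ignored(url: str, patterns: list[str] = IGNORE_PATTERNS) -> bool:
    """Return True if the URL matches any exception pattern."""
    return any(re.search(pattern, url) for pattern in patterns)


print(is_ignored("https://developer.android.com/studio/build/configure-app-module"))
print(is_ignored("https://www.exterro.com/ftk-imager"))
```

Anchoring the pattern with `^` and escaping the dots keeps an exception from accidentally matching unrelated URLs that merely contain the same text.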

stark4n6 commented 2 years ago

Just fixed the dead link, not sure what happened there.

AndrewRathbun commented 2 years ago

@brootware this is really fantastic. Can you simply PR this into the repo, and we'll have it working for us once we figure out the config?

AndrewRathbun commented 2 years ago

https://github.com/brootware/CrowdsourcedDFIRBook/runs/6344168741?check_suite_focus=true#step:4:55 appears to be a false positive, but https://github.com/brootware/CrowdsourcedDFIRBook/runs/6344168741?check_suite_focus=true#step:4:46 appears to be a true positive.

brootware commented 2 years ago

@AndrewRathbun added exceptions for false positives with this PR. https://github.com/Digital-Forensics-Discord-Server/CrowdsourcedDFIRBook/pull/60

AndrewRathbun commented 2 years ago

You rule, thank you!

AndrewRathbun commented 2 years ago

https://github.com/Digital-Forensics-Discord-Server/CrowdsourcedDFIRBook/runs/6350892948?check_suite_focus=true#step:4:123

@brootware this one appears to be a FP, too. That link is working for me. Any ideas?

brootware commented 2 years ago

@AndrewRathbun Hi Andrew, I have added that particular link as an exception in the ignore pattern with this PR. https://github.com/Digital-Forensics-Discord-Server/CrowdsourcedDFIRBook/pull/61

A 503 status usually means some form of web service is running, so I did not include it as an exception in the rule.

AndrewRathbun commented 2 years ago

Thank you very much for your leadership on this. Really appreciate it!

brootware commented 2 years ago

My pleasure, @AndrewRathbun. I am looking forward to reading this book once it's published, too! I'll leave this issue open as more content is added, to check for further false positives, and will open PRs as we identify them.

AndrewRathbun commented 2 years ago

Awesome, really appreciate that support 👍

AndrewRathbun commented 2 years ago

Another false positive, it seems: https://github.com/Digital-Forensics-Discord-Server/CrowdsourcedDFIRBook/runs/6633816171?check_suite_focus=true#step:4:75

brootware commented 2 years ago

Added 502 as an exception, since it usually indicates a temporary server error. https://github.com/Digital-Forensics-Discord-Server/CrowdsourcedDFIRBook/pull/82
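
The status-code side of the rule can be sketched as an allow-list: anything outside it is reported as dead. This assumes the checker works that way (markdown-link-check exposes it as an `aliveStatusCodes` option); the repo's exact list is not shown in this thread, so the set below is hypothetical apart from 502 being allowed and 503 not.

```python
# Hypothetical allow-list of "alive" status codes. 502 is included because it
# usually signals a temporary server error; 503 is deliberately left out.
ALIVE_STATUS_CODES = {200, 502}


def is_dead(status: int, alive: set[int] = ALIVE_STATUS_CODES) -> bool:
    """Report a link as dead unless its HTTP status is in the alive set.

    Assumption: status 0 means the checker got no HTTP response at all
    (timeout, DNS failure, refused connection), so it counts as dead here.
    """
    return status not in alive


for code in (200, 0, 502, 503):
    print(code, "dead" if is_dead(code) else "alive")
```

Allowing a whole status code is broader than ignoring a single URL: every link that returns 502 is accepted, which is why it suits transient errors better than 503.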

AndrewRathbun commented 2 years ago

@brootware FYI I've added you as a contributor to the book for helping us out with this feature. Thank you for your work on this 👍

brootware commented 2 years ago

Thank you so much @AndrewRathbun . It's a great honour!

AndrewRathbun commented 2 years ago

https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/runs/7610278348?check_suite_focus=true

I moved the .md files to their own folder, so now obviously we have tons of errors. Do you think we should just ignore .md files with this and only focus on .txt since that's what Leanpub actually parses?

brootware commented 2 years ago

Hi @AndrewRathbun, let me look into this. We can't really check the links in .txt files, as this particular action only supports .md files.

I'm also thinking of implementing a cron schedule to check the links on a regular cadence, say every 2 days at 11pm. Please let me know what you think.

https://crontab.guru/#0_23_*/2_*_*

0 23 */2 * *

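
How `0 23 */2 * *` expands can be checked with a small sketch. In standard cron syntax the third field is the day of the month, and `*/2` means every second value starting from 1, so the job fires at 23:00 on days 1, 3, 5, ..., 31. That is not strictly every 48 hours: after a 31-day month it runs on the 31st and again on the 1st. The `expand_step` helper below is hypothetical and only handles the `*/N` form.

```python
def expand_step(field: str, low: int, high: int) -> list[int]:
    """Expand a cron field of the form '*/N' over the inclusive range low..high.

    Hypothetical helper: other forms (lists, ranges, 'a-b/N') are not handled.
    """
    base, step = field.split("/")
    if base != "*":
        raise ValueError("this sketch only handles '*/N' fields")
    return list(range(low, high + 1, int(step)))


# Fields: minute, hour, day of month, month, day of week.
minute, hour, day_of_month, month, day_of_week = "0 23 */2 * *".split()
days = expand_step(day_of_month, 1, 31)
print(f"{hour}:{minute.zfill(2)} on days {days}")
```

GitHub Actions evaluates `schedule` cron expressions in UTC, so 11pm here means 23:00 UTC.
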
AndrewRathbun commented 2 years ago

Yeah that seems to be a great idea! Thank you!

brootware commented 2 years ago

It seems most of the errors come from image links being reported as dead. I have added an additional ignore pattern with the latest PR https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/pull/133.
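
An ignore pattern for image links could look like the hypothetical Python sketch below; the actual pattern added in PR #133 is not reproduced here. It assumes image targets can be recognised by their file extension.

```python
import re

# Hypothetical ignore pattern: any target ending in a common image extension,
# optionally followed by a query string or fragment (e.g. "?raw=true").
IMAGE_LINK = re.compile(r"\.(?:png|jpe?g|gif|svg|webp)(?:[?#].*)?$", re.IGNORECASE)


def is_image_link(target: str) -> bool:
    """Return True for image targets that the checker should skip."""
    return IMAGE_LINK.search(target) is not None


for target in ("images/chapter1/figure1.PNG", "resources/diagram.jpg?raw=true", "https://crontab.guru/"):
    print(target, is_image_link(target))
```

Matching case-insensitively catches files saved as `.PNG` or `.JPG`, which are common in screenshots exported on Windows.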

Interesting how the subsequent runs have stopped recognising these images.

brootware commented 2 years ago

Hey @AndrewRathbun, just got a notification for a dead link from the latest run on my fork https://github.com/brootware/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/runs/8197161545?check_suite_focus=true

The source link is here: https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/blob/1ae3274f340a4affac53560fe586c9da7cd2618c/manuscript/chapterJ.txt#L94

AndrewRathbun commented 2 years ago

Thank you for the heads up! Thankfully that chapter isn't live yet but I appreciate you flagging that for me! 🙏

AndrewRathbun commented 2 years ago

I'm reaching out to Josh to see if that link can be fixed.

AndrewRathbun commented 2 years ago

Link is working now!

AndrewRathbun commented 2 years ago

We're good now! Dead Link Checker passed with flying colors.

AndrewRathbun commented 1 year ago

@brootware I just added a Spell Checker action. Any chance you can modify the workflow file to ignore anything in .github?

brootware commented 1 year ago

Sure, @AndrewRathbun, let me take a look!

brootware commented 1 year ago

OK, I've modified the workflow file to check for dead links only inside the manuscript/ directory, and all the links in the content look good and alive. https://github.com/brootware/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/actions/runs/5207913950/jobs/9395934942 Are there any other files or directories you would like me to check for dead links, @AndrewRathbun? Otherwise, if all's good, I'd like to go ahead and open a PR.

On the spell checker, it seems we need to add more words to the allow.txt wordlist.
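
Restricting the check to one folder amounts to a path filter. The workflow itself would use whatever folder or path setting the action provides; the Python function below is only a hypothetical sketch of the same rule, and the `.github/workflows/spellcheck.yml` path in the example is made up.

```python
from pathlib import PurePosixPath


def in_scope(path: str, root: str = "manuscript") -> bool:
    """True for files under root/; anything else, e.g. .github/, is skipped."""
    parts = PurePosixPath(path).parts
    return len(parts) > 1 and parts[0] == root


paths = ["manuscript/chapterJ.txt", ".github/workflows/spellcheck.yml", "README.md"]
print([p for p in paths if in_scope(p)])
```

Filtering on the first path component, rather than on a substring, keeps files such as `docs/manuscript-notes.md` out of scope.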

AndrewRathbun commented 1 year ago

Thank you for making that change! And yes it's on my to-do list to get a lot of those 800ish words on the allow list.