drahnr / cargo-spellcheck

Checks all your documentation for spelling and grammar mistakes with hunspell and a nlprule based checker for grammar
Apache License 2.0
315 stars 32 forks source link

cmark links should be handled #44

Open drahnr opened 4 years ago

drahnr commented 4 years ago

Summary

There are multiple variants here

[https://ahoi.io](https://ahoi.io)
[fun](../fun.html)
[struct Foo](super::Foo)
drahnr commented 4 years ago

There is already something similar for mdbook https://github.com/Michael-F-Bryan/mdbook-linkcheck

laysauchoa commented 3 years ago

cc @drahnr I guess this still TODO, right? Would be fine if I work on this? Thanks

drahnr commented 3 years ago

Sure @laysauchoa feel free!

drahnr commented 3 years ago

Is there anything that's blocking you and that I could help with?

laysauchoa commented 3 years ago

@drahnr sorry for the delay, it was just really full those last days. I aim to start this week+weekend.

drahnr commented 3 years ago

No rush, it's a pet project and should be a fun and learning experience, just making sure there are no road blocks :)

laysauchoa commented 3 years ago

@drahnr thanks, you are great!

laysauchoa commented 3 years ago

CC @drahnr I have some questions so I will do it as follow the description:

There are multiple variants here https://ahoi.io fun struct Foo detect valid urls in [] so spell checking can be ignored in that span https://ahoi.io

What does valid means here?

  1. to check if the string inside the [] is a URL string such as https://ahoi.io
  2. to check if this website is reachable

    links to struct items should be ignored in spell checks

I have not yet found any reference for scanning struct and validating them. Only for local files and web links, should this be checked if it a struct present in the file or just any struct should be ignored?

add a flag to verify the linked to destinations exist can you write an example of what those flags would look like or what can I check about it?

By the way, seems like doing it with regex is not that simple as I thought. Is it using external crates something to be thought here or should regex be used? Or maybe you meant to use mdbook for that. or linkify.

Please, share your thoughts about it. Thank you!

drahnr commented 3 years ago

CC @drahnr

I am automatically notified if I am subscribed (which I am always for this repo :) ) - feel free to CC me explicitly on third party repos which are relevant!

There are multiple variants here https://ahoi.io fun struct Foo detect valid urls in [] so spell checking can be ignored in that span https://ahoi.io

What does valid means here?

1. to check if the string inside the `[]` is a URL string such as https://ahoi.io

2. to check if this website is reachable

I think we should carve out two definitions:

  1. valid as Url::parse(..) returns an Ok(_).
  2. reachable see the next points :)

Now we should try to check if the link text is an url too, so 1. applies. Example:

[http://ahoi.io](http://ahoi.io/sub/topic) would succeed for the first check, it's a valid url and should thus not be spell checked.

Number two should be only done explicitly on user request, and would fail in this case - it does not exists, further details below.

links to struct items should be ignored in spell checks

I have not yet found any reference for scanning struct and validating them. Only for local files and web links, should this be checked if it a struct present in the file or just any struct should be ignored?

In rust doc comments, you can specify a link like this [yada yada](crate::module::x) which can not be checked easily, it requires resolving all modules and their paths which is a bit out of scope for cargo-spellcheck at this point.

add a flag to verify the linked to destinations exist can you write an example of what those flags would look like or what can I check about it?

By the way, seems like doing it with regex is not that simple as I thought. Is it using external crates something to be thought here or should regex be used? Or maybe you meant to use mdbook for that. or linkify.

My usual brevity.. apologies, verify the link destination exists means the querying it with GET request must return a 200 or 30x (permanently moved or such, which is a common thing to forward i.e. http -> https). But that would require network access and hence is not really possible in continuous integration setups. So I would like this to be only enabled explicitly via commandline flag.

If there is still something fuzzy, feel free to ask me to elaborate so you get a concise picture :)

drahnr commented 3 years ago

I guess it makes sense for you to land this as part of hacktober?

drahnr commented 3 years ago

For md-book there is a plugin that works rather well, which could be taken as an inspiration - https://github.com/Michael-F-Bryan/mdbook-linkcheck

At this point I think it's mostly doing and deciding which way to go for: Either with or without lychee, rebasing that branch today exposed some significant compilation time increment even on a strong~ish machine.