drahnr / cargo-spellcheck

Checks all your documentation for spelling and grammar mistakes with hunspell and a nlprule based checker for grammar
Apache License 2.0
326 stars 33 forks source link

Detect padding in doc comments #204

Open pjsier opened 3 years ago

pjsier commented 3 years ago

What does this PR accomplish?

Closes #133.

Changes proposed by this PR:

Detect padding in doc comments with CommentPaddingStyle enum

Notes to reviewer:

Adds enum and detect_padding method to parse and store padding style

📜 Checklist

pjsier commented 3 years ago

Thanks for the feedback on this! I updated the branch with your comments, should this be incorporated into the trim_span method or should it only detect the padding style for now and leave the padding inside the TrimmedLiteral?

drahnr commented 3 years ago

Thanks for the feedback on this! I updated the branch with your comments, should this be incorporated into the trim_span method or should it only detect the padding style for now and leave the padding inside the TrimmedLiteral?

I think, removing it at the span/syn/ra_ap_syntax stage will simplify later stages significantly (or: avoid additional code changes), so I'd recommend having a clean TrimmedLiteral that has a representation without the leading \s*\*\s.

Note: There is a caveat that a few consecutive \s\*\s could also be a markdown lists if they are not continuous for each line :)

pjsier commented 3 years ago

Great, thanks! Should this be in or before the trim_span method then? I know you had mentioned moving the padding detection after, but it seems like we would need ti there.

Also, I think we'll need to bump the version of fancy-regex to take advantage of the replace_all method for parsing this if that's alright

drahnr commented 3 years ago

Also, I think we'll need to bump the version of fancy-regex to take advantage of the replace_all method for parsing this if that's alright

Whereever you see fit :)

Bump deps as needed.

pjsier commented 3 years ago

I made some updates to reflect the recent feedback but it looks like I'm a bit stuck again. rendered currently includes some content that is ignored like the prefix and suffix, so I initially tried to maintain this and only store metadata about the padding without modifying the content of rendered directly. I ran into issues because the replacement logic for as_str became more complicated than the offsets from prefix and postfix currently used, and it seemed like it would require returning a String instead of &str

Should rendered include the padding strings, and if so should a separate field be added to TrimmedLiteral that stores the content with padding strings removed? Thanks again!

drahnr commented 3 years ago

I made some updates to reflect the recent feedback but it looks like I'm a bit stuck again. rendered currently includes some content that is ignored like the prefix and suffix, so I initially tried to maintain this and only store metadata about the padding without modifying the content of rendered directly. I ran into issues because the replacement logic for as_str became more complicated than the offsets from prefix and postfix currently used, and it seemed like it would require returning a String instead of &str

I think you have two options: Either replace as_str() -> &'_ str into a as_str_set() -> &[&'_ str] as well as impl ToString and deal with the fallout over the code base. Note that this will not avoid the allocation, since the set creation will itself require an allocation. While the above is possible, I think it's much easier to split a TrimmedLiteral into multiple and adjust the Cluster (if needed, I did not dig into the details). This would essentially avoid the whole ordeal of handling paddings in the fn render(), but re-use existing logic of TrimmedLiteral clustering, where padding could be just a another delimiter as initially discussed. It's super important though to get the offsets right (with emojis and multi-width characters) and have sufficient test cases, any issue here will cause wrong spans being passed throughout the pipeline and that's not-funâ„¢ to debug.

Should rendered include the padding strings, and if so should a separate field be added to TrimmedLiteral that stores the content with padding strings removed? Thanks again!

rendered should definitely not contain those. It was always meant to be the content representation that is presented to the next stage in the documentation generation pipeline (here: parsing with markdown/cmark) and hence must be cleared from any control/style chars that are only relevant to the rust documentation annotation.


Just explaining this to you makes it clear that there is a lot of documentation missing and quite a few design joices are not obvious, and naming of things could be improved by quite a bit. Very much appreciate your effort!

@pjsier let me know if there is anything else I can do to aid you driving this to completion :)

drahnr commented 3 years ago

@pjsier gentle ping :)

pjsier commented 3 years ago

@drahnr thanks for following up on this! I haven't had as much time as I would have liked, so if I'm holding anything up feel free to take over and thanks for all the input

drahnr commented 3 years ago

@drahnr thanks for following up on this! I haven't had as much time as I would have liked, so if I'm holding anything up feel free to take over and thanks for all the input

I think it's better if you push it over the finish line :) there is no particular rush here - take your time!