becheran / mlc

Check for broken links in markup files
MIT License

fix: Include untracked files with --gitignore option #96

Open willcl-ark opened 1 month ago

willcl-ark commented 1 month ago

The current --gitignore option correctly ignores files which are "ignored" by git, but the --ignored option to git ls-files does not include untracked files in its output.

These can be detected using git ls-files --others --exclude-standard.

Combine the two calls into a single deduplicated gitignore list to ignore all files properly.
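
A rough sketch of that combination, for illustration only and not the actual diff in this PR. The function names (`merge_listings`, `git_ls_files`, `files_to_skip`) are hypothetical, and the exact arguments mlc passes for the ignored-files call are an assumption; only `git ls-files --others --exclude-standard` is taken from the description above.

```rust
use std::collections::BTreeSet;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Merge two newline-separated `git ls-files` listings into one sorted,
/// deduplicated set of paths. Blank lines are skipped.
/// (Hypothetical helper, not part of mlc.)
fn merge_listings(ignored: &str, untracked: &str) -> BTreeSet<PathBuf> {
    ignored
        .lines()
        .chain(untracked.lines())
        .filter(|line| !line.is_empty())
        .map(PathBuf::from)
        .collect()
}

/// Run `git ls-files <args>` inside `repo_root` and return its stdout.
/// Note: without `-z`, git may quote paths containing unusual characters;
/// this sketch does not handle that case.
fn git_ls_files(repo_root: &Path, args: &[&str]) -> io::Result<String> {
    let output = Command::new("git")
        .arg("ls-files")
        .args(args)
        .current_dir(repo_root)
        .output()?;
    if !output.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "git ls-files failed"));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

/// Collect every path that is either ignored by git or untracked.
/// Assumption: the ignored-files call uses `--ignored --exclude-standard --others`.
fn files_to_skip(repo_root: &Path) -> io::Result<BTreeSet<PathBuf>> {
    let ignored = git_ls_files(repo_root, &["--ignored", "--exclude-standard", "--others"])?;
    let untracked = git_ls_files(repo_root, &["--others", "--exclude-standard"])?;
    Ok(merge_listings(&ignored, &untracked))
}
```

Using a `BTreeSet` gives deduplication for free and keeps the list in a stable order, so a path reported by both calls is only skipped once.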

becheran commented 1 month ago

Thanks for the PR. I am a bit hesitant to merge it directly. Technically it all looks fine, but I wonder whether it would overcomplicate things. Is there a real-world use case or need for this? Untracked files in a git repo are sooner or later committed, deleted, or added to gitignore anyway, aren't they?

willcl-ark commented 1 month ago

Fair question! :)

We had an issue in https://github.com/bitcoin/bitcoin/issues/30496 where a dirty cache-hit was sometimes restoring a .pyenv/README.md file, which included a broken markdown link, causing CI failure.

Of course, this should not really happen, and we already fixed it by using out-of-tree Python builds, but IMO it makes sense for a --gitignore option to ignore all files that are either untracked or ignored by git, I think?

becheran commented 1 month ago

Hm... got it. Still not 100% convinced. I can already think of others having the exact opposite use case. For example, they generate md or html files in the CI pipeline and want to check the output with mlc. I guess they would then be confused if mlc only checked files which are staged.

willcl-ark commented 1 month ago

Might it work better for you if I implemented a new, separate --git-untracked flag (or similar), so that the two were independent?

becheran commented 2 weeks ago

Maybe even '--gitignoreuntracked', or the opposite, such as --gitracked, which would then only check links in tracked files, would (maybe) make more sense as a name for the flag?

If you really see a need for this, I would be OK with merging it if you put the logic behind a separate flag without changing the original --gitignore flag.