Describe the bug
A link with a space in the filename, such as <a href="test file.html">, is parsed as pointing to test. A link with an escaped space (%20) is parsed as the full filename, but the escape is not decoded back to a space, so mlc expects test%20file.html to exist on disk.
To Reproduce
mlc/jms_test % ls
test.html testing 2.html
mlc/jms_test % cat test.html
<html>
<body>
<a href="testing 2.html">hello</a>
<a href="testing%202.html">hello</a>
</body>
</html>
mlc/jms_test % mlc
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ +
+ markup link checker - mlc v0.16.1 +
+ +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[Err ] ./test.html (4, 3) => testing%202.html - Target filename not found.
[Err ] ./test.html (3, 3) => testing - Target not found.
Result (2 links):
OK 0
Skipped 0
Warnings 0
Errors 2
The following links could not be resolved:
./test.html (4, 3) => testing%202.html
./test.html (3, 3) => testing
Expected behavior
Spaces in filenames are parsed as part of links
and/or %20 in filenames is converted to a space before checking local filenames (and maybe other escapes too)
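To illustrate the second option, here is a minimal sketch of percent-decoding a link target before looking it up on disk. It uses only the standard library. The names `percent_decode` and `hex_value` are hypothetical and are not part of mlc; where such a step would hook into mlc's file checking is an assumption I have not verified.

```rust
/// Convert one ASCII hex digit to its numeric value.
/// (Hypothetical helper, not part of mlc.)
fn hex_value(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode percent-escapes such as `%20` in a link target.
/// Malformed or truncated escapes are copied through unchanged.
/// (Hypothetical helper, not part of mlc.)
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // An escape needs two more bytes after the '%'.
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_value(bytes[i + 1]), hex_value(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    // Decoded bytes may not be valid UTF-8; replace invalid sequences.
    String::from_utf8_lossy(&out).into_owned()
}
```

With this, `percent_decode("testing%202.html")` yields `testing 2.html`, which is the name that actually exists in the directory listing above. A real fix might prefer an existing crate for this rather than hand-rolled decoding.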
Desktop (please complete the following information):
OS: macOS
Browser: N/A
Version: N/A
Additional context
I added the following test to validate my theory, but my Rust is not good enough to fix the bug :(
diff --git a/src/link_extractors/html_link_extractor.rs b/src/link_extractors/html_link_extractor.rs
index f475514..1d83f8b 100644
--- a/src/link_extractors/html_link_extractor.rs
+++ b/src/link_extractors/html_link_extractor.rs
@@ -125,6 +125,20 @@ mod tests {
         assert!(result.is_empty());
     }
+    #[test]
+    fn space() {
+        let le = HtmlLinkExtractor();
+        let input = "blah <a href=\"some file.html\">foo</a>.";
+        let result = le.find_links(input);
+        let expected = MarkupLink {
+            target: "some file.html".to_string(),
+            line: 1,
+            column: 6,
+            source: "".to_string(),
+        };
+        assert_eq!(vec![expected], result);
+    }
+
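For the first option, the behavior I would expect is that a quoted href value runs to the matching closing quote rather than stopping at the first space. The sketch below shows that rule in isolation. It is not mlc's actual extractor, `href_value` is a hypothetical name, and it assumes the input is a single tag containing a literal `href=`.

```rust
/// Return the value of the first `href` attribute in `tag`, if any.
/// Quoted values extend to the matching quote, so spaces are kept.
/// Unquoted values end at whitespace or `>`.
/// (Hypothetical helper, not part of mlc.)
fn href_value(tag: &str) -> Option<&str> {
    let start = tag.find("href=")? + "href=".len();
    let rest = &tag[start..];
    match rest.chars().next()? {
        quote @ ('"' | '\'') => {
            // Skip the opening quote (one byte) and read up to the closing one.
            let body = &rest[1..];
            let end = body.find(quote)?;
            Some(&body[..end])
        }
        _ => {
            let end = rest
                .find(|c: char| c.is_whitespace() || c == '>')
                .unwrap_or(rest.len());
            Some(&rest[..end])
        }
    }
}
```

Applied to the input from the test above, `href_value("<a href=\"some file.html\">")` gives `Some("some file.html")` instead of a target truncated at the space.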