becheran / mlc

Check for broken links in markup files
MIT License
129 stars, 17 forks

Links with spaces are not parsed correctly #74

Closed: jamesoff closed this issue 1 year ago

jamesoff commented 1 year ago

Describe the bug

A link with a space in the filename, such as <a href="test file.html">, is parsed as pointing to test instead of test file.html. A link with an escaped space is parsed as the full filename, but the escape is not decoded back to a space, so mlc expects a file literally named test%20file.html to exist on disk.
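One possible way to handle the escaped-space case is to percent-decode the link target before looking it up on disk. The sketch below is only an illustration: percent_decode and hex_value are hypothetical helpers, not part of mlc, and it uses only the Rust standard library. A real fix would more likely reuse an existing crate such as percent-encoding.

```rust
/// Hypothetical helper (not mlc's API): value of one ASCII hex digit.
fn hex_value(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical helper (not mlc's API): decode %XX escapes so that
/// "testing%202.html" maps to the on-disk name "testing 2.html".
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // A valid escape needs '%' followed by two more bytes.
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_value(bytes[i + 1]), hex_value(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        // Not a valid escape: copy the byte through unchanged.
        out.push(bytes[i]);
        i += 1;
    }
    // Decoded bytes may not be valid UTF-8; replace bad sequences instead of panicking.
    String::from_utf8_lossy(&out).into_owned()
}

fn main() {
    // Prints "testing 2.html", the file that exists in the reproduction below.
    println!("{}", percent_decode("testing%202.html"));
}
```

Decoding like this presumably only makes sense for the local-file lookup; a remote URL can be requested exactly as written.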

To Reproduce

mlc/jms_test % ls
test.html  testing 2.html
mlc/jms_test % cat test.html
<html>
    <body>
        <a href="testing 2.html">hello</a>
        <a href="testing%202.html">hello</a>
    </body>
</html>
mlc/jms_test % mlc

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                          +
+            markup link checker - mlc v0.16.1             +
+                                                          +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

[Err ] ./test.html (4, 3) => testing%202.html - Target filename not found.
[Err ] ./test.html (3, 3) => testing - Target not found.

Result (2 links):

OK       0
Skipped  0
Warnings 0
Errors   2

The following links could not be resolved:

./test.html (4, 3) => testing%202.html
./test.html (3, 3) => testing

Expected behavior

Both links should resolve to testing 2.html, which exists on disk, so mlc should report no errors.


Additional context

I added the following test to validate my theory, but my Rust is not good enough to fix the bug :(

diff --git a/src/link_extractors/html_link_extractor.rs b/src/link_extractors/html_link_extractor.rs
index f475514..1d83f8b 100644
--- a/src/link_extractors/html_link_extractor.rs
+++ b/src/link_extractors/html_link_extractor.rs
@@ -125,6 +125,20 @@ mod tests {
         assert!(result.is_empty());
     }

+    #[test]
+    fn space() {
+        let le = HtmlLinkExtractor();
+        let input = "blah <a href=\"some file.html\">foo</a>.";
+        let result = le.find_links(input);
+        let expected = MarkupLink {
+            target: "some file.html".to_string(),
+            line: 1,
+            column: 6,
+            source: "".to_string(),
+        };
+        assert_eq!(vec![expected], result);
+    }
+
The new test fails:

failures:

---- link_extractors::html_link_extractor::tests::space stdout ----
thread 'link_extractors::html_link_extractor::tests::space' panicked at 'assertion failed: `(left == right)`
  left: `[ => some file.html (line 1, column 6)]`,
 right: `[ => some (line 1, column 6)]`', src/link_extractors/html_link_extractor.rs:139:9
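The failing assertion shows the target ending at the first space, which suggests the extractor treats whitespace as the end of the href value even when the value is quoted. A minimal sketch of quote-aware extraction follows, assuming a hypothetical extract_href function that handles only a single lowercase href= with no whitespace around the equals sign. It is not mlc's HtmlLinkExtractor and does not attempt full HTML attribute parsing.

```rust
/// Hypothetical helper (not mlc's extractor): return the value of the first
/// href attribute in `html`, honouring quotes so spaces inside a quoted
/// value are kept.
fn extract_href(html: &str) -> Option<&str> {
    let start = html.find("href=")? + "href=".len();
    let rest = &html[start..];
    let first = rest.chars().next()?;
    if first == '"' || first == '\'' {
        // Quoted value: read up to the matching closing quote, spaces included.
        // Both quote characters are one byte, so slicing at 1 is safe.
        let body = &rest[1..];
        let end = body.find(first)?;
        Some(&body[..end])
    } else {
        // Unquoted value: it ends at whitespace or at the closing '>'.
        let end = rest
            .find(|c: char| c.is_whitespace() || c == '>')
            .unwrap_or(rest.len());
        Some(&rest[..end])
    }
}

fn main() {
    // Same input as the `space` test above; prints "some file.html".
    let input = "blah <a href=\"some file.html\">foo</a>.";
    println!("{}", extract_href(input).unwrap());
}
```

Splitting on whitespace remains correct for unquoted values, which is why the sketch only switches to quote matching when the value starts with a quote character.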
jamesoff commented 1 year ago

Thanks!