clj-commons / hickory

HTML as data

Parsing throws on long markdown files #45

Open ghost opened 7 years ago

ghost commented 7 years ago

I'll be updating this issue as I tease it out, but I think something in the following chain of functions fails when the source markdown is too long:

  1. md-to-html-string
  2. parse
  3. as-hiccup

Trial code

(ns foo.data
  (:require [clojure.java.io :as io]
            [hickory.core :refer [as-hiccup parse]]
            [markdown.core :refer [md-to-html-string]]))

(defmacro raw-foo-html []
  (md-to-html-string (slurp (io/resource "foo.md"))))

(defmacro foo-data []
  (vec (as-hiccup (parse (raw-foo-html)))))

(defmacro foo-body []
  (rest (rest (nth (first (foo-data)) 3))))

Log

lein do clean, figwheel
Figwheel: Cutting some fruit, just a sec ...
Figwheel: Validating the configuration found in project.clj
Figwheel: Configuration Valid :)
Figwheel: Starting server at http://0.0.0.0:3449
Figwheel: Watching build - app
Figwheel: Cleaning build - app
Compiling "target/cljsbuild/public/js/app.js" from ["src/cljs" "src/cljc" "env/dev/cljs"]...
Failed to compile "target/cljsbuild/public/js/app.js" in 15.751 seconds.
----  Could not Analyze  src/cljs/foo/core.cljs  ----

  java.lang.ClassFormatError: Unknown constant tag 99 in class file foo/data$foo_data, compiling:(foo/data.clj:10:1)

----  Analysis Error : Please see src/cljs/foo/core.cljs  ----
---- Initial Figwheel ClojureScript Compilation Failed ----
We need a successful initial build for Figwheel to connect correctly.

I noticed it when I doubled the length of an existing markdown file that had been parsing successfully. If I remove an arbitrary section from the first half of the file, parsing works again.

I can't share the exact document I found this in, but I'll try to reproduce it with lorem ipsum.

ghost commented 7 years ago

If I generate 9461 words with lipsum, parsing succeeds.

On the other hand, if I add one more paragraph, increasing the count to 9589 words, parsing fails.

Here is a gist you can use to verify it. Remove the last paragraph to get parsing to succeed. https://gist.github.com/bright-star/5f9c04b8f6816552adb0d6d517e74036
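
One possible explanation, not confirmed anywhere in this thread: because raw-foo-html is a macro, the whole HTML string ends up as a literal inside the compiled foo/data$foo_data class, and the JVM class file format limits a single string constant to 65535 bytes. The sketch below only measures a string against that limit, so the HTML produced from the gist can be checked with and without the last paragraph. The namespace and the names utf8-byte-count and fits-in-class-constant? are hypothetical, chosen for illustration.

```clojure
(ns foo.size-check
  (:import [java.nio.charset StandardCharsets]))

;; A CONSTANT_Utf8 entry in a class file stores its length in an
;; unsigned 16-bit field, so one string constant holds at most 65535 bytes.
(def max-constant-bytes 65535)

(defn utf8-byte-count
  "Number of bytes s occupies in standard UTF-8.
  Class files use modified UTF-8, which differs only for NUL and
  supplementary characters; for ASCII text the counts are identical."
  [^String s]
  (count (.getBytes s StandardCharsets/UTF_8)))

(defn fits-in-class-constant?
  "Hypothetical helper: true when s is small enough to be stored
  as a single string constant in a class file."
  [s]
  (<= (utf8-byte-count s) max-constant-bytes))

;; Boundary check with synthetic ASCII input.
(fits-in-class-constant? (apply str (repeat 65535 "a"))) ; => true
(fits-in-class-constant? (apply str (repeat 65536 "a"))) ; => false
```

If the output of md-to-html-string crosses 65535 bytes between the 9461-word and 9589-word versions, that would point at the literal size rather than at parse or as-hiccup. If it does not, this guess can be ruled out.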