crispinb / dash-clojuredocs

Tool for generating ClojureDocs docset for Dash

"Invalid number" exception during crawl #6

Open crispinb opened 4 years ago

crispinb commented 4 years ago
20-08-24 04:45:11 bamboo INFO [pegasus.queue:59] - :obtained http://clojuredocs.org/clojure.test.junit/suite-attrs
20-08-24 04:45:11 bamboo INFO [pegasus.defaults:253] - :num-visited 1394
20-08-24 04:45:11 bamboo INFO [pegasus.defaults:237] - :number-enqueued 1459
20-08-24 04:45:12 bamboo INFO [pegasus.defaults:253] - :num-visited 1395
20-08-24 04:45:12 bamboo INFO [pegasus.defaults:262] - :stopping-crawl!
20-08-24 04:45:12 bamboo INFO [pegasus.defaults:267] - :stop-items 2
20-08-24 04:45:12 bamboo INFO [pegasus.queue:51] - :default-delay 100
20-08-24 04:45:12 bamboo INFO [pegasus.queue:55] - :clojuredocs.org
20-08-24 04:45:12 bamboo INFO [pegasus.queue:59] - :obtained http://clojuredocs.org/clojure.test.junit/start-suite
20-08-24 04:45:12 bamboo INFO [pegasus.defaults:253] - :num-visited 1396
20-08-24 04:45:12 bamboo INFO [pegasus.defaults:262] - :stopping-crawl!
20-08-24 04:45:12 bamboo INFO [pegasus.defaults:267] - :stop-items 2
Exception in thread "main" java.lang.NumberFormatException: Invalid number: 5d221680e4b0ca44402ef77e
        at clojure.lang.LispReader.readNumber(LispReader.java:330)
        at clojure.lang.LispReader.read(LispReader.java:256)
        at clojure.lang.LispReader.readDelimitedList(LispReader.java:1200)
        at clojure.lang.LispReader$MapReader.invoke(LispReader.java:1158)
        at clojure.lang.LispReader.read(LispReader.java:263)
        at clojure.lang.LispReader.readDelimitedList(LispReader.java:1200)
        at clojure.lang.LispReader$VectorReader.invoke(LispReader.java:1150)
        at clojure.lang.LispReader.read(LispReader.java:263)
        at clojure.lang.LispReader.readDelimitedList(LispReader.java:1200)
        at clojure.lang.LispReader$MapReader.invoke(LispReader.java:1158)
        at clojure.lang.LispReader.read(LispReader.java:263)
        at clojure.lang.LispReader.read(LispReader.java:196)
        at clojure.lang.LispReader.read(LispReader.java:185)
        at clojure.lang.RT.readString(RT.java:1835)
        at clojure.lang.RT.readString(RT.java:1830)
        at clojure.core$read_string.invokeStatic(core.clj:3687)
        at clojure.core$read_string.invoke(core.clj:3677)
        at dash_clojuredocs.core$map_source.invokeStatic(core.clj:103)
        at dash_clojuredocs.core$map_source.invoke(core.clj:101)
        at dash_clojuredocs.core$handle_source.invokeStatic(core.clj:140)
        at dash_clojuredocs.core$handle_source.invoke(core.clj:129)
        at dash_clojuredocs.core$crawl_clojuredocs.invokeStatic(core.clj:211)
        at dash_clojuredocs.core$crawl_clojuredocs.invoke(core.clj:196)
        at dash_clojuredocs.core$_main.invokeStatic(core.clj:215)
        at dash_clojuredocs.core$_main.invoke(core.clj:213)
        at clojure.lang.Var.invoke(Var.java:375)
        at clojure.lang.AFn.applyToHelper(AFn.java:152)
        at clojure.lang.Var.applyTo(Var.java:700)
        at clojure.core$apply.invokeStatic(core.clj:646)
        at clojure.main$main_opt.invokeStatic(main.clj:314)
        at clojure.main$main_opt.invoke(main.clj:310)
        at clojure.main$main.invokeStatic(main.clj:421)
        at clojure.main$main.doInvoke(main.clj:384)
        at clojure.lang.RestFn.invoke(RestFn.java:421)
        at clojure.lang.Var.invoke(Var.java:383)
        at clojure.lang.AFn.applyToHelper(AFn.java:156)
        at clojure.lang.Var.applyTo(Var.java:700)
        at clojure.main.main(main.java:37)
crispinb commented 4 years ago

(map-source) uses regexes to unravel some of the escaping that pegasus does when cramming the scraped HTML into an EDN corpus. This is fragile, and any change can result in read-string barfing on unbalanced forms.

I can't make head nor tail of it
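
For what it's worth, the trace above passes through MapReader, VectorReader and MapReader again before failing in readNumber, which would fit a bare token sitting where a quoted string should be, somewhere inside a map inside a vector inside a map. Any token that starts with a digit is treated by the Clojure reader as a number. The sketch below reproduces that message under the assumption that the id lost its quotes during unescaping. The map shape, the `:examples` and `:_id` keys, and the `try-read` helper are all hypothetical, not taken from map-source itself.

```clojure
;; Sketch only: the map shape and the :examples / :_id keys are hypothetical
;; stand-ins for whatever map-source actually hands to read-string.

(defn try-read
  "Reads s with read-string; returns the value, or a map describing the failure."
  [s]
  (try
    (read-string s)
    (catch Exception e
      {:error (.getMessage e)})))

;; Quoted, the id is an ordinary string and the form reads fine.
(def quoted "{:examples [{:_id \"5d221680e4b0ca44402ef77e\"}]}")

;; Unquoted, the token starts with a digit, so the reader tries to parse it
;; as a number and gives up.
(def unquoted "{:examples [{:_id 5d221680e4b0ca44402ef77e}]}")

(try-read quoted)
;; => {:examples [{:_id "5d221680e4b0ca44402ef77e"}]}

(try-read unquoted)
;; => {:error "Invalid number: 5d221680e4b0ca44402ef77e"}
```

If this is what is happening, wrapping the read-string call in map-source in something like try-read, and logging the offending string, should at least show which page produced the bad form.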