Aivean / royalroad-downloader

https://royalroad.com book downloader
MIT License

Downloader fails to download books with chapters containing special characters #7

Closed: Nostras closed this issue 6 years ago

Nostras commented 7 years ago

Try downloading the book that contains this chapter:
https://royalroadl.com/fiction/1866/demon-hero-reaper-saviour/chapter/16869/chapter-12-d%C3%A9j%C3%A0-vu
I assume it's an encoding oddity; it shouldn't be hard to fix.

Nostras commented 7 years ago

To add: "Déjà vu" is correctly echoed as "d%c3%a9j%c3%a0-vu", but it becomes "d%25c3%25a9j%25c3%25a0-vu" when downloading. Since %25 is the percent-encoding of "%" itself, the already-encoded URL appears to be encoded a second time.
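The %c3 → %25c3 pattern is what you get when a string that is already percent-encoded is encoded again: every "%" is itself escaped as "%25". The minimal Java sketch below reproduces this using the multi-argument `java.net.URI` constructor, which always quotes "%". The class and method names are hypothetical. The assumption that the downloader's HTTP client (jsoup, per the stack trace further down) rebuilds URLs this way has not been verified.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {

    // Feeds an already percent-encoded path to the multi-argument URI
    // constructor. That constructor always quotes '%', so "%c3" becomes "%25c3".
    static String encodeEncodedPath(String encodedPath) {
        return build(encodedPath);
    }

    // Decodes the path once first, so the constructor sees the raw characters
    // (é, à) and toASCIIString() escapes them exactly once as UTF-8.
    // Note: URLDecoder also turns '+' into a space; harmless for these slugs.
    static String decodeThenEncode(String encodedPath) {
        String raw = URLDecoder.decode(encodedPath, StandardCharsets.UTF_8); // Java 10+
        return build(raw);
    }

    private static String build(String path) {
        try {
            return new URI("https", "royalroadl.com", path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String path = "/fiction/1866/demon-hero-reaper-saviour/chapter/16869/chapter-12-d%c3%a9j%c3%a0-vu";
        System.out.println(encodeEncodedPath(path)); // ...chapter-12-d%25c3%25a9j%25c3%25a0-vu
        System.out.println(decodeThenEncode(path));  // ...chapter-12-d%C3%A9j%C3%A0-vu
    }
}
```

The first line of output matches the broken URL from the report. The second line is encoded only once and differs from the original only in hex-digit case, which servers are expected to treat as equivalent.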

Aivean commented 7 years ago

Thanks for reporting, I'll take a look.

saviola777 commented 6 years ago

I believe this is still an issue:

 $ java -jar /media/shared/source/git/royalroadl-downloader/target/scala-2.11/royalroadl-downloader-assembly-1.2.4.jar
https://royalroadl.com/fiction/629
[…]
downloading: http://royalroadl.com/fiction/629/until-death/chapter/11413/99-this-is-my-fianc%c3%a9e
[…]
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=404, URL=https://royalroadl.com/fiction/629/until-death/chapter/11413/99-this-is-my-fianc%25c3%25a9e
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:679)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:676)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:628)
        at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:260)
        at net.ruippeixotog.scalascraper.browser.JsoupBrowser.executeRequest(JsoupBrowser.scala:65)
        at net.ruippeixotog.scalascraper.browser.JsoupBrowser$$anonfun$3.apply(JsoupBrowser.scala:76)
        at net.ruippeixotog.scalascraper.browser.JsoupBrowser$$anonfun$3.apply(JsoupBrowser.scala:76)
        at scala.Function1$$anonfun$andThen$1.apply(Function1.scala:52)
        at scala.Function1$$anonfun$andThen$1.apply(Function1.scala:52)
        at net.ruippeixotog.scalascraper.browser.JsoupBrowser.get(JsoupBrowser.scala:33)
        at com.aivean.royalroad.Main$$anonfun$4$$anonfun$apply$1.apply(Main.scala:71)
        at com.aivean.royalroad.Main$$anonfun$4$$anonfun$apply$1.apply(Main.scala:71)
        at scala.util.Try$.apply(Try.scala:192)
        at com.aivean.royalroad.Main$.retry(Main.scala:60)
        at com.aivean.royalroad.Main$$anonfun$4.apply(Main.scala:71)
        at com.aivean.royalroad.Main$$anonfun$4.apply(Main.scala:66)
        at scala.collection.parallel.AugmentedIterableIterator$class.map2combiner(RemainsIterator.scala:115)
        at scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:62)
        at scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1054)
        at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
        at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
        at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
        at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
        at scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1051)
        at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:169)
        at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:443)
        at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:149)
        at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
        at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

The correct URL

https://www.royalroadl.com/fiction/629/until-death/chapter/11413/99-this-is-my-fianc%c3%a9e

is shown in the "downloading" line, but the URL that is actually requested is

https://royalroadl.com/fiction/629/until-death/chapter/11413/99-this-is-my-fianc%25c3%25a9e

I tried commenting out the URLDecoder calls, but the result was the same.

Since the actual chapter title doesn't matter (there's a chapter ID in the URL), maybe the title could be removed to avoid problems like this?

The URLs

https://www.royalroadl.com/fiction/629/until-death/chapter/11413/99-this-is-my-fianc%c3%a9e

and

https://www.royalroadl.com/fiction/629/until-death/chapter/11413/null

return the same page. Oddly enough,

https://www.royalroadl.com/fiction/629/until-death/chapter/11413/

does not work, so you'd have to add an arbitrary string after the chapter ID.
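The suggested workaround is to keep the chapter ID and replace the title slug with any non-empty ASCII string. A rough sketch follows. `ChapterUrl`, `withPlaceholderSlug` and the placeholder value are hypothetical. The URL pattern is inferred from the examples in this thread, and the site may no longer accept arbitrary slugs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChapterUrl {

    // Matches ".../fiction/<id>/<name>/chapter/<id>/<slug>" and captures
    // everything up to and including the numeric chapter ID.
    private static final Pattern CHAPTER =
        Pattern.compile("^(https?://[^/]+/fiction/\\d+/[^/]+/chapter/\\d+)/[^/?#]*$");

    // Hypothetical helper: replaces the title slug with a fixed placeholder.
    // Per the report above, a non-empty slug (e.g. "null") worked but an
    // empty one did not, so a placeholder is always appended.
    static String withPlaceholderSlug(String url, String placeholder) {
        Matcher m = CHAPTER.matcher(url);
        if (!m.matches()) {
            return url; // not a chapter URL: leave it untouched
        }
        return m.group(1) + "/" + placeholder;
    }

    public static void main(String[] args) {
        System.out.println(withPlaceholderSlug(
            "https://www.royalroadl.com/fiction/629/until-death/chapter/11413/99-this-is-my-fianc%c3%a9e",
            "chapter"));
        // prints https://www.royalroadl.com/fiction/629/until-death/chapter/11413/chapter
    }
}
```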

PS: This is a great tool!

Aivean commented 6 years ago

@saviola777, thanks for the detailed report. I'll take a look.

Aivean commented 6 years ago

@saviola777, there was some weirdness going on. A URL like https://royalroadl.com/fiction/629/until-death/chapter/11589/100-this-is-my-fiancé was returning 404, while https://www.royalroadl.com/fiction/629/until-death/chapter/11589/100-this-is-my-fiancé worked fine.

I changed the URL prefix to https://www.royalroadl.com/ in the latest release. It seems to work.
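The fix described above amounts to putting every URL on the https://www.royalroadl.com/ prefix before fetching it. The sketch below shows that idea only; it is not the code from the actual release. `RoyalRoadHost` and `toWwwPrefix` are hypothetical names, and the host is hard-coded as in the comment.

```java
public class RoyalRoadHost {

    // Hypothetical normalisation step: rewrites http/https URLs on the bare
    // royalroadl.com domain (or its www form) to the https://www. prefix,
    // which was observed to serve the chapter instead of returning 404.
    // URLs on any other host are returned unchanged.
    static String toWwwPrefix(String url) {
        return url.replaceFirst("^https?://(www\\.)?royalroadl\\.com/", "https://www.royalroadl.com/");
    }

    public static void main(String[] args) {
        System.out.println(toWwwPrefix(
            "http://royalroadl.com/fiction/629/until-death/chapter/11413/99-this-is-my-fianc%c3%a9e"));
        // prints https://www.royalroadl.com/fiction/629/until-death/chapter/11413/99-this-is-my-fianc%c3%a9e
    }
}
```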