Laughing-Man-Studios / Ski-Resort-Dashboard

A small dashboard to display some statistics about the local ski resorts I visit
MIT License

A-Basin Scraper is Broken #65

Closed Rogibb111 closed 1 year ago

Rogibb111 commented 2 years ago

When I hit the scraping endpoint to run the scraper in production (Fly.io), I am seeing an error in the logs when the A-Basin scraper is run. The error is as follows:

2022-11-08T05:18:41Z app[7bfeab2d] den [info]java.net.SocketTimeoutException: Read timeout
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at org.jsoup.internal.ConstrainableInputStream.read(ConstrainableInputStream.java:58)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at java.io.InputStreamReader.read(InputStreamReader.java:184)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at java.io.BufferedReader.fill(BufferedReader.java:161)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at java.io.BufferedReader.read1(BufferedReader.java:212)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at java.io.BufferedReader.read(BufferedReader.java:286)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at org.jsoup.parser.CharacterReader.bufferUp(CharacterReader.java:81)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at org.jsoup.parser.CharacterReader.current(CharacterReader.java:129)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at org.jsoup.parser.TokeniserState$1.read(TokeniserState.java:12)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at org.jsoup.parser.Tokeniser.read(Tokeniser.java:57)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at org.jsoup.parser.TreeBuilder.runParser(TreeBuilder.java:65)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at org.jsoup.parser.TreeBuilder.parse(TreeBuilder.java:47)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at org.jsoup.parser.Parser.parseInput(Parser.java:35)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at org.jsoup.helper.DataUtil.parseInputStream(DataUtil.java:175)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at org.jsoup.helper.HttpConnection$Response.parse(HttpConnection.java:835)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at net.ruippeixotog.scalascraper.browser.JsoupBrowser.doc$lzycompute$1(JsoupBrowser.scala:76)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at net.ruippeixotog.scalascraper.browser.JsoupBrowser.doc$1(JsoupBrowser.scala:76)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at net.ruippeixotog.scalascraper.browser.JsoupBrowser.processResponse(JsoupBrowser.scala:78)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at net.ruippeixotog.scalascraper.browser.JsoupBrowser.$anonfun$executePipeline$4(JsoupBrowser.scala:85)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at scala.Function1.$anonfun$andThen$1(Function1.scala:85)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at net.ruippeixotog.scalascraper.browser.JsoupBrowser.get(JsoupBrowser.scala:35)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at net.ruippeixotog.scalascraper.browser.JsoupBrowser.get(JsoupBrowser.scala:29)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at scrapers.BaseScraper.<init>(BaseScraper.scala:42)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at scrapers.EpicScraper.<init>(EpicScraper.scala:14)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at scrapers.ScraperFactory$.initializeScraper(BaseScraper.scala:24)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at controllers.HomeController.$anonfun$generateFuture$1(HomeController.scala:67)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:678)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at scala.concurrent.impl.Promise$Transformation.run(Promise.scala:467)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
2022-11-08T05:18:41Z app[7bfeab2d] den [info]   at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

Rogibb111 commented 1 year ago

Update on this: it now seems that whenever I try to scrape, I hit my memory limit and the machine kills the Java process running the app. I'm guessing I will have to solve this issue first before addressing the one above (if it is still occurring).

Rogibb111 commented 1 year ago

The first step in debugging this will probably be to look at the memory profile on my local machine and see if there is anything I can do in my app to clamp down on memory. I'm not sure whether there is a way in Scala/Java to tune the JVM to use less memory. It looks like you can monitor a JVM instance using jvisualvm; I need to find out where that lives on my machine.
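
As a rough complement to jvisualvm, the JVM can report its own memory figures through the standard `java.lang.management` API, which makes it easy to log the same numbers locally and on Fly.io and compare them. The sketch below is only an illustration: the class `JvmMemoryReport` and its methods are hypothetical names, not part of this project. On the tuning question, the standard `-Xmx` option (for example `-Xmx256m`) caps the maximum heap size; whether that is enough to stay under the machine's memory limit is something to verify, not a given.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not from the Ski-Resort-Dashboard code base):
// prints a one-line summary of the current JVM memory usage.
final class JvmMemoryReport {
    private static final long MIB = 1024L * 1024L;

    // Converts a byte count to whole mebibytes (rounded down).
    static long toMiB(long bytes) {
        return bytes / MIB;
    }

    // Builds the summary from the platform memory bean and the Runtime.
    static String describe() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        // maxMemory() reflects the -Xmx setting (or the JVM default).
        long maxHeap = Runtime.getRuntime().maxMemory();
        return String.format(
            "heap used=%d MiB, heap committed=%d MiB, heap max=%d MiB, non-heap used=%d MiB",
            toMiB(heap.getUsed()),
            toMiB(heap.getCommitted()),
            toMiB(maxHeap),
            toMiB(nonHeap.getUsed()));
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Calling `JvmMemoryReport.describe()` before and after a scrape run, in both environments, would show whether the heap itself or memory outside the heap accounts for the difference.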

Rogibb111 commented 1 year ago

To debug the above issue, I used both VisualVM and Eclipse MAT to analyze the heap that my application is using. I couldn't find anything out of the ordinary, except that when running my app locally, it has a much smaller memory footprint than when it runs on Fly.io. I am creating a new issue for this here.
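
For reference, a heap dump that Eclipse MAT or VisualVM can open can also be produced from inside the running process, which may help when no GUI tool can attach to the deployed instance. This is a minimal sketch that assumes a HotSpot-based JVM, since it relies on `com.sun.management.HotSpotDiagnosticMXBean`. The class name `HeapDumper` and the file naming are hypothetical, and nothing in this thread confirms that the dumps analyzed above were taken this way.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not from the Ski-Resort-Dashboard code base):
// writes an .hprof heap dump of the current JVM. Assumes a HotSpot JVM.
final class HeapDumper {

    // Dumps the heap into a new, uniquely named file inside `directory`
    // and returns the path of that file.
    static Path dump(Path directory) throws IOException {
        // dumpHeap refuses to overwrite an existing file, so the name
        // includes a timestamp; the .hprof suffix is what analyzers expect.
        Path target = directory
            .resolve("heap-" + System.currentTimeMillis() + ".hprof")
            .toAbsolutePath();
        HotSpotDiagnosticMXBean diagnostics =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // `true` restricts the dump to live (reachable) objects.
        diagnostics.dumpHeap(target.toString(), true);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heap-dumps");
        System.out.println("Heap dump written to " + dump(dir));
    }
}
```

Taking one dump locally and one in the deployed environment, then comparing the two in the same tool, is one way to narrow down where the extra memory goes.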