KadekM / scrawler

Scala web crawling and scraping using fs2 streams
MIT License

Readme basic usage #16

Open KadekM opened 8 years ago

KadekM commented 8 years ago
yelled1 commented 7 years ago

I cloned the source:

    git clone git@github.com:KadekM/scrawler.git

I used sbt package to create jar files (this was the wrong move), then created a project:

    .
    ├── build.sbt (see below)
    ├── lib (the compiled jar files went here; not necessary, and the cause of the error)
    ├── project
    │   └── build.properties (specifies the sbt version, 0.13.15 in my case)
    └── src
        └── main
            └── scala (myCrawler.scala goes here)

I imported the project into IntelliJ via sbt, created myCrawler.scala, copied the class, and added these imports:

    import com.marekkadek.scraper.Document
    import com.marekkadek.scraper.jsoup.JsoupBrowser
    import com.marekkadek.scrawler.crawlers.{Crawler, Visit, Yield, YieldData}
    import fs2.{Strategy, Stream, Task}

(this solves the two errors below)

However, I am stuck on 2 errors

  1. override protected def onDocument(document: Document): Stream[Task, Yield[String]] = {
     Error: Task takes type parameters
  2. Stream.emit(title) ++ Stream.emits(followableLinks)
     Error: cannot resolve ++, emit, emits

Complete newbie myself. So, I am stuck here.

yelled1 commented 7 years ago

I was able to compile it after changing import fs2.Task to import fs2._, but running it fails:

object WikiGo {
  // Explicit ": Unit =" replaces the deprecated procedure syntax
  def main(args: Array[String]): Unit = {
    val crawler = new myCrawler
    // crawl wikipedia sequentially and take 10 elements (titles of visited websites)
    val titles: Vector[String] = crawler.sequentialCrawl("https://wikipedia.org").take(10).runLog.unsafeRun
    println(titles)
  }
}

[IJ]> compile
[success] Total time: 0 s, completed Sep 8, 2017 11:12:51 PM
[IJ]> run
[info] Running WikiGo
[error] (run-main-5) java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
    at com.marekkadek.scrawler.crawlers.Visit.<init>(crawlers.scala:12)
    at com.marekkadek.scrawler.crawlers.Crawler.sequentialCrawl(crawlers.scala:35)
    at WikiGo$.main(WikiGo.scala:5)
    at WikiGo.main(WikiGo.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
[trace] Stack trace suppressed: run last compile:run for the full output.
java.lang.RuntimeException: Nonzero exit code: 1
    at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
[error] Total time: 0 s, completed Sep 8, 2017 11:12:57 PM
[IJ]>

visox commented 7 years ago

Hi,

You probably have a dependency problem (the code is fine).

Can you share your build.sbt with us?

Also, you created a jar file for this project? Why not just add it as a dependency in build.sbt?

And as to your first comment: this project/crawler basically emits an fs2 Stream, and for that you typically need to import fs2.{Strategy, Stream, Task}
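
A NoSuchMethodError on scala.Product.$init$ typically points to classes compiled for Scala 2.12 running against a Scala 2.11 standard library. One way to check for that kind of dependency problem is to print the Scala library version seen at runtime and the jar or directory a given class was loaded from. The following is a minimal sketch using only the standard library; ClasspathCheck and locationOf are illustrative names and are not part of scrawler, and the class names to inspect are passed as arguments.

```scala
object ClasspathCheck {
  // Returns the jar or directory a class was loaded from, if the JVM reports one.
  // Classes from the bootstrap class loader usually have no code source, hence Option.
  def locationOf(cls: Class[_]): Option[String] =
    for {
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource)
      url    <- Option(source.getLocation)
    } yield url.toString

  def main(args: Array[String]): Unit = {
    // Version of the scala-library actually present on the runtime classpath
    println(s"Scala library at runtime: ${scala.util.Properties.versionNumberString}")
    println(s"scala.Product loaded from: ${locationOf(classOf[Product])}")

    // Each argument is a fully qualified class name to look up (supplied by the caller)
    args.foreach { name =>
      val location = locationOf(Class.forName(name))
      println(s"$name loaded from: ${location.getOrElse("(unknown)")}")
    }
  }
}
```

If a library class turns out to be loaded from a jar in the project's lib directory rather than from the dependency cache, the unmanaged jar is a likely suspect.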

yelled1 commented 7 years ago

Hi:

Here's my build.sbt

name := "ScraperProject"

version := "1.1"

scalaVersion := "2.11.8"

libraryDependencies += "com.marekkadek" %% "scrawler" % "0.0.3"

Ah, the jar files were somehow causing the problem! I removed the lib directory (with the jars) from the root dir and it ran fine. I guess one cannot use libraryDependencies and jar files at the same time. My first Scala program with an external library dependency compiled and ran! Thanks very much.

visox commented 7 years ago

Hi, no problem, happy crawling

KadekM commented 7 years ago

@yelled1 yes, just use sbt for dependency management :) Feel free to open an issue if you encounter any problems.