dalab / web2text

Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18
MIT License

Cannot run Recipe #6

Closed by jrojas28 5 years ago

jrojas28 commented 5 years ago

Hello, I've been trying to get the base recipe running for a while without success. The following happens when I enter the command run:

[info] Compiling 37 Scala sources to /home/jrojas/Projects/Extractors/web2text/target/scala-2.10/classes ...
[error] /home/jrojas/Projects/Extractors/web2text/src/main/scala/ch/ethz/dalab/web2text/cdom/Node.scala:39:33: missing parameter type
[error] val c = for (c <- children; l <- c.toString.lines) yield {" " + l}
[error] ^
[error] /home/jrojas/Projects/Extractors/web2text/src/main/scala/ch/ethz/dalab/web2text/cleaneval/CleanEval.scala:140:54: value drop is not a member of java.util.stream.Stream[String]
[error] val contents = if (f.startsWith("URL:")) f.lines.drop(1).mkString("\n")
[error] ^
[error] /home/jrojas/Projects/Extractors/web2text/src/main/scala/ch/ethz/dalab/web2text/features/PageFeatures.scala:29:63: type mismatch;
[error] found : java.util.stream.Stream[String]
[error] required: Iterator[?]
[error] (blockFeatureLabels.toIterator zip blockFeatures.toString.lines)
[error] ^
[error] three errors found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 5 s, completed Nov 12, 2019, 12:42:05 AM

System information:
OS: Ubuntu 18.04
SBT Version: 1.3.3
Scala Version: 2.10.4

Also tested with: SBT Version: 0.13.7

jrojas28 commented 5 years ago

After some more research I managed to run it. The issue seems to have been related to the Java SDK version. I used the following setup to run it successfully:

Java SDK: 8
SBT Version: 0.13.7
Scala Version: 2.10.4

tvogels commented 5 years ago

Thanks a lot for reporting this. If you know what changed in the other version, feel free to submit a PR.
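One likely explanation for the three compile errors reported above, offered as an assumption rather than a confirmed diagnosis: JDK 11 added a String.lines() method that returns java.util.stream.Stream[String], and the Scala compiler picks that member over the implicit StringOps.lines, which returns an Iterator[String] on JDK 8. A minimal sketch of a workaround that does not depend on the JDK version is to call linesIterator instead. The object LinesCompat and its method names are hypothetical and not part of web2text; only the expressions mirror the lines quoted in the error log, and the else branch in dropUrlHeader is a guess because the log does not show it.

```scala
// Hypothetical demo object; not part of the web2text code base.
object LinesCompat {

  // Mirrors the shape of Node.scala:39: prefix every line of a string.
  // linesIterator exists only on Scala's StringOps, so it cannot be
  // shadowed by java.lang.String.lines() on newer JDKs.
  def indent(text: String): String =
    text.linesIterator.map(l => " " + l).mkString("\n")

  // Mirrors CleanEval.scala:140: drop a leading "URL:" header line.
  // Assumption: the original falls back to the unchanged string.
  def dropUrlHeader(f: String): String =
    if (f.startsWith("URL:")) f.linesIterator.drop(1).mkString("\n") else f

  // Mirrors PageFeatures.scala:29: pair each label with one line of text.
  def zipLabels(labels: Seq[String], block: String): List[(String, String)] =
    (labels.iterator zip block.linesIterator).toList

  def main(args: Array[String]): Unit = {
    println(indent("first\nsecond"))
    println(dropUrlHeader("URL: http://example.com\nbody text"))
    println(zipLabels(Seq("f1", "f2"), "1\n2"))
  }
}
```

On Scala 2.11 linesIterator compiles with a deprecation warning; splitting on "\n" would be another option if that warning is unwanted.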

whatsdis commented 4 years ago

I am running into this same issue with sbt 1.3.7.

tvogels commented 4 years ago

Hi, thanks for reporting this. The code was tested on Docker image hseeberger/scala-sbt:8u222_1.3.3_2.13.1 so if you can replicate that environment, things should work. If you figure out what changed in the different versions of scala/sbt, then we can update the code.

whatsdis commented 4 years ago

@tvogels Following the answer above, it now compiles with JDK 8!

However this is what I see now:

sbt:Boilerplate> run;
[info] Compiling 37 Scala sources to /home/asdf/web2text/target/scala-2.10/classes ...
[info] Non-compiled module 'compiler-bridge_2.10' for Scala 2.10.4. Compiling...
[info] Compilation completed in 26.674s.
[warn] there were 2 deprecation warning(s); re-run with -deprecation for details
[warn] there were 7 feature warning(s); re-run with -feature for details
[warn] two warnings found
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list

Multiple main classes detected, select one to run:

[1] ch.ethz.dalab.web2text.ApplyLabelsToPage
[2] ch.ethz.dalab.web2text.ExtractPageFeatures
[3] ch.ethz.dalab.web2text.Main

How do I actually run ch.ethz.dalab.web2text.ExtractPageFeatures on index.html? Sorry, I am new to sbt and Scala.

Thank you.

edit: I am unable to cancel....

[warn] Canceling execution...
[warn] Canceling execution...
[warn] Canceling execution...
[warn] Canceling execution...
[warn] Canceling execution...
[warn] Canceling execution...
[warn] Canceling execution...

whatsdis commented 4 years ago

Now, when I pass one of the classes to run:

sbt:Boilerplate> run ch.ethz.dalab.web2text.ExtractPageFeatures;
[info] Compiling 37 Scala sources to /home/asdf/web2text/target/scala-2.10/classes ...
[warn] there were 2 deprecation warning(s); re-run with -deprecation for details
[warn] there were 7 feature warning(s); re-run with -feature for details
[warn] two warnings found
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list

Multiple main classes detected, select one to run:

[1] ch.ethz.dalab.web2text.ApplyLabelsToPage
[2] ch.ethz.dalab.web2text.ExtractPageFeatures
[3] ch.ethz.dalab.web2text.Main

Enter number: 2

[info] running (fork) ch.ethz.dalab.web2text.ExtractPageFeatures ch.ethz.dalab.web2text.ExtractPageFeatures
[error] Exception in thread "main" java.lang.IllegalArgumentException: Expecting arguments: (1) input html file, (2) output file base name
[error] at ch.ethz.dalab.web2text.ExtractPageFeatures$.main(ExtractPageFeatures.scala:23)
[error] at ch.ethz.dalab.web2text.ExtractPageFeatures.main(ExtractPageFeatures.scala)
[error] Nonzero exit code returned from runner: 1
[error] (Compile / run) Nonzero exit code returned from runner: 1
[error] Total time: 53 s, completed Jan 30, 2020 9:43:39 PM
sbt:Boilerplate> run ch.ethz.dalab.web2text.ExtractPageFeatures index.html output.csv;
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list

Multiple main classes detected, select one to run:

[1] ch.ethz.dalab.web2text.ApplyLabelsToPage
[2] ch.ethz.dalab.web2text.ExtractPageFeatures
[3] ch.ethz.dalab.web2text.Main

Enter number: 2

[info] running (fork) ch.ethz.dalab.web2text.ExtractPageFeatures ch.ethz.dalab.web2text.ExtractPageFeatures index.html output.csv
[error] Exception in thread "main" java.io.FileNotFoundException: ch.ethz.dalab.web2text.ExtractPageFeatures (No such file or directory)
[error] at java.io.FileInputStream.open0(Native Method)
[error] at java.io.FileInputStream.open(FileInputStream.java:195)
[error] at java.io.FileInputStream.<init>(FileInputStream.java:138)
[error] at scala.io.Source$.fromFile(Source.scala:90)
[error] at scala.io.Source$.fromFile(Source.scala:75)
[error] at scala.io.Source$.fromFile(Source.scala:53)
[error] at ch.ethz.dalab.web2text.utilities.Util$.loadFile(Util.scala:103)
[error] at ch.ethz.dalab.web2text.ExtractPageFeatures$.extractPageFeatures(ExtractPageFeatures.scala:39)
[error] at ch.ethz.dalab.web2text.ExtractPageFeatures$.main(ExtractPageFeatures.scala:25)
[error] at ch.ethz.dalab.web2text.ExtractPageFeatures.main(ExtractPageFeatures.scala)
[error] Nonzero exit code returned from runner: 1
[error] (Compile / run) Nonzero exit code returned from runner: 1
[error] Total time: 10 s, completed Jan 30, 2020 9:44:06 PM
sbt:Boilerplate>

whatsdis commented 4 years ago

Oh, never mind, I got it running with:

sbt:Boilerplate> runMain ch.ethz.dalab.web2text.ExtractPageFeatures index.html output
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] running (fork) ch.ethz.dalab.web2text.ExtractPageFeatures index.html output
[success] Total time: 7 s, completed Jan 30, 2020 9:47:35 PM
sbt:Boilerplate>
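The earlier attempts failed because sbt's run task treats everything after it as program arguments, so the class name ended up as the input file path. runMain instead takes the class name as its first token. For anyone who prefers plain run without the selection prompt, a minimal sketch of a build.sbt fragment that pins the default main class follows. It assumes the project's build.sbt does not already set mainClass, and it is not taken from the repository.

```scala
// build.sbt fragment (assumption: added next to the project's existing settings).
// Pins the class that `run` launches, so sbt no longer asks which main class to use.
// This spelling works on sbt 0.13.x and 1.3.x; newer sbt releases prefer
// the slash form: Compile / run / mainClass := Some("...")
mainClass in (Compile, run) := Some("ch.ethz.dalab.web2text.ExtractPageFeatures")
```

With this setting in place, run index.html output should pass both tokens straight to ExtractPageFeatures, while runMain remains available for the other main classes.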