smadha closed 8 years ago
@smadha Thanks a lot!! Appreciate your hard work :) There was a dependency conflict when running the fetcher on top of an Apache Spark cluster. It is now fixed.
@thammegowda Everything looks good from my end. Tested it locally & on Spark cluster. Do you want to give it a roll before it's merged?
Build failed
[INFO] ------------------------------------------------------------------------
[INFO] Building fetcher-jbrowser 0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
...... ((Message trimmed))...
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ fetcher-jbrowser ---
[INFO]
[INFO] --- maven-bundle-plugin:2.5.0:manifest (default-cli) @ fetcher-jbrowser ---
[WARNING] Manifest edu.usc.irds.sparkler.plugin:fetcher-jbrowser:bundle:0.1-SNAPSHOT : Unused Private-Package instructions, no such package(s) on the class path: [!*]
[ERROR] Manifest edu.usc.irds.sparkler.plugin:fetcher-jbrowser:bundle:0.1-SNAPSHOT : Bundle-Activator not found on the bundle class path nor in imports: edu.usc.irds.sparkler.plugin.FetcherJBrowserActivator
[ERROR] Error(s) found in manifest configuration
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] sparkler-parent .................................... SUCCESS [ 0.377 s]
[INFO] sparkler-api ....................................... SUCCESS [ 4.851 s]
[INFO] sparkler ........................................... SUCCESS [ 45.358 s]
[INFO] sparkler-plugins ................................... SUCCESS [ 0.069 s]
[INFO] urlfilter-regex .................................... SUCCESS [ 2.329 s]
[INFO] fetcher-jbrowser ................................... FAILURE [ 3.998 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:00 min
[INFO] Finished at: 2016-10-19T14:46:13-07:00
[INFO] Final Memory: 64M/1499M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.felix:maven-bundle-plugin:2.5.0:manifest (default-cli) on project fetcher-jbrowser: Error(s) found in manifest configuration -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :fetcher-jbrowser
@thammegowda Can you please try building Sparkler with the following:
mvn clean install
@karanjeets If I do mvn clean install and then try to run
14:59 $ bin/sparkler.sh inject -su http://www.isjavascriptenabled.com/
>>jobId = sjob-1476914418319
15:00 $ bin/sparkler.sh crawl -id sjob-1476914418319
I get
16/10/19 15:01:37 INFO FetchFunction$: FETCHING http://www.isjavascriptenabled.com/
16/10/19 15:01:37 INFO PluginService$: Felix Configuration loaded successfully
Bundle Found: org.apache.felix.framework
16/10/19 15:01:37 WARN FetchFunction$: FETCH-ERROR http://www.isjavascriptenabled.com/
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at edu.usc.irds.sparkler.pipeline.FetchFunction$.apply(FetchFunction.scala:46)
at edu.usc.irds.sparkler.pipeline.FetchFunction$.apply(FetchFunction.scala:34)
at edu.usc.irds.sparkler.pipeline.FairFetcher.next(FairFetcher.scala:52)
at edu.usc.irds.sparkler.pipeline.FairFetcher.next(FairFetcher.scala:29)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:285)
at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/10/19 15:01:37 INFO ParseFunction$: PARSING http://www.isjavascriptenabled.com/
Plugin not available/enabled?
I see the bundles are here:
15:05 $ ls -al sparkler-app/bundles/
total 66736
drwxr-xr-x 5 thammegr 703763885 170 Oct 19 14:54 .
drwxr-xr-x 6 thammegr 703763885 204 Oct 19 14:58 ..
-rw-r--r-- 1 thammegr 703763885 0 Oct 19 13:40 .donotdelete
-rw-r--r-- 1 thammegr 703763885 34153743 Oct 19 14:59 fetcher-jbrowser-0.1-SNAPSHOT.jar
-rw-r--r-- 1 thammegr 703763885 10543 Oct 19 14:59 urlfilter-regex-0.1-SNAPSHOT.jar
My environment:
15:06 $ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
✔ ~/work/irds/sparkler [js-plugin L|✔]
15:06 $ mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T08:41:47-08:00)
Maven home: /usr/local/Cellar/maven/3.3.9/libexec
Java version: 1.8.0_101, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.11.6", arch: "x86_64", family: "mac"
@thammegowda This is strange. It worked for me just fine. Let me try with bin/sparkler.sh and see where things are going wrong.
@thammegowda The issue is with bin/sparkler.sh. It is using the conf directory in the build path. Under the new design of the plugin system, the path to the bundles directory is generated at compile time by the Maven resources plugin. See here
So my question is: did you add the conf directory to the build path so that post-build changes are picked up? If yes, I can do something similar with the compiled conf directory.
@karanjeets I am prepending the 'conf' dir to the classpath to give it higher priority. Is there a way to restore that functionality?
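For context on why the order matters: when the same resource name exists under more than one classpath entry, the JVM class loader returns the first match, so whatever is prepended wins. The sketch below only simulates that lookup for directory entries in plain shell. It is not taken from bin/sparkler.sh, and the `first_match` helper, the directory names, and the `app.properties` file are all hypothetical.

```shell
#!/usr/bin/env bash
# Simulates "first classpath entry wins" for directory entries only.
# first_match, the temp layout, and app.properties are hypothetical.

first_match() {
  local classpath="$1" resource="$2" entry
  local IFS=':'                      # classpath entries are colon-separated
  for entry in $classpath; do
    if [ -f "$entry/$resource" ]; then
      printf '%s\n' "$entry/$resource"
      return 0                       # stop at the first hit, like a class loader
    fi
  done
  return 1                           # resource not found in any entry
}

tmp="$(mktemp -d)"
mkdir -p "$tmp/conf" "$tmp/compiled"
echo "edited after the build"    > "$tmp/conf/app.properties"
echo "generated at compile time" > "$tmp/compiled/app.properties"

# conf is prepended, so its copy shadows the compiled one.
CP="$tmp/conf:$tmp/compiled"
first_match "$CP" "app.properties"
```

With the order reversed (`$tmp/compiled:$tmp/conf`), the compiled copy would be found first, which matches the behaviour described above where post-build edits in conf were no longer picked up.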
@thammegowda Changes have been made. Please review and merge.
@karanjeets @smadha merged 💯. This is a fantastic PR 👍