data from JS pages is not returned

USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Apache License 2.0


@chaitra-rs thanks for reporting this. By default, JavaScript execution is not enabled, because executing JS is very slow.

If you do need it, however, you have to enable a plugin:

https://github.com/USCDataScience/sparkler/blob/5c2201310623b70e6bf024e51e521eb4bffc4723/conf/sparkler-default.yaml#L102-L104

How? Locate sparkler-default.yaml inside the Docker container used by Sparkler, and uncomment one of the fetcher plugins listed there that can execute JavaScript.

I don't know which one is best, since each has its pros and cons; I suggest trial and error for your use case.
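If you prefer to script the "uncomment a fetcher plugin" step rather than edit the file by hand, here is a minimal Python sketch. It assumes the plugins appear in sparkler-default.yaml as commented-out YAML list items such as "#  - name" or "  #- name". The function names uncomment_plugin and enable_plugin_in_file, the "plugins:" key and the plugin name "fetcher-example" are all hypothetical; check lines 102-104 of the linked sparkler-default.yaml for the real entries.

```python
import re
from pathlib import Path


def uncomment_plugin(yaml_text: str, plugin_name: str) -> str:
    """Enable a commented-out YAML list entry such as '#  - <plugin_name>'.

    Only the '#' is dropped; the spaces around it are kept so the entry
    stays aligned with the other list items. Raises ValueError if no
    matching commented-out entry exists.
    """
    pattern = re.compile(
        r"^([ \t]*)#([ \t]*-[ \t]*" + re.escape(plugin_name) + r")[ \t]*$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(r"\1\2", yaml_text)
    if count == 0:
        raise ValueError(f"no commented-out entry found for {plugin_name!r}")
    return new_text


def enable_plugin_in_file(path: Path, plugin_name: str) -> None:
    """Rewrite the config file in place with plugin_name uncommented."""
    text = path.read_text(encoding="utf-8")
    path.write_text(uncomment_plugin(text, plugin_name), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical excerpt: the real key and plugin names are in
    # conf/sparkler-default.yaml (lines 102-104 in the linked revision).
    sample = (
        "plugins:\n"
        "  - some-filter-plugin\n"
        "#  - fetcher-example\n"
    )
    print(uncomment_plugin(sample, "fetcher-example"))
```

The sketch edits the text with a regular expression instead of loading it with a YAML parser, so the other comments in the file survive the rewrite. It still has to be run against the copy of the file that Sparkler actually reads, i.e. the one inside the Docker container.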


data from JS pages is not returned #174

I'm trying to get data from a page that uses JavaScript. The page shows the data, but the output file doesn't contain it.

How to reproduce it

Environment and Version Information