-
Use another crawler to search for .onion pages on the public Internet. Search for new .onion domains in different online sources. Ask for help from organizations that are crawling. This is an excellent cas…
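As a rough illustration of the "search for new .onion domains in different online sources" idea, here is a minimal sketch that pulls .onion hostnames out of text that has already been downloaded. The function name `extract_onion_domains` and the sample text are hypothetical, not part of this project. The sketch assumes that only 16-character (v2) and 56-character (v3) base32 names are of interest.

```python
import re

# Assumption: onion names are 16 (v2) or 56 (v3) base32 characters (a-z, 2-7).
# The lookbehind rejects matches that start in the middle of a longer label.
ONION_RE = re.compile(
    r"(?<![a-z0-9])(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b",
    re.IGNORECASE,
)


def extract_onion_domains(text):
    """Return the unique .onion hostnames found in text, lowercased and sorted."""
    return sorted({match.group(0).lower() for match in ONION_RE.finditer(text)})


if __name__ == "__main__":
    # Hypothetical sample input; the addresses are made up for illustration.
    sample = "Mirror at http://abcdefghij234567.onion/ and ABCDEFGHIJ234567.ONION"
    print(extract_onion_domains(sample))
```

Lowercasing and de-duplicating the matches keeps the candidate list small before the domains are handed to a crawler.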
-
Given our discussion in #24, and previous discussions on Kafka spouts, HBase indexing, etc., we should think about reorganizing the project so that we have a core SDK and external SDK(s).
I was think…
-
I just tested your nutch-selenium and it works perfectly with Nutch 2.x. I am trying to follow your instructions to use nutch-selenium-grid-plugin, but I got stuck.
First, I failed to make your n…
-
We need a slick logo and banner for the website. Help wanted! :)
-
I downloaded HiBench to my PC, but I can't find any executable file to install it. In addition, my Hadoop cluster is on AWS. Do I have to install HiBench on AWS? Where can I find the HiBench Installation …
-
@zerolocker @fengwuxing
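Here is what I have done over the past few days:
## 1. 公文通 crawler
I wrote a dedicated crawler for 公文通, board_crawler.py (really only 30-40 lines of code), and put the code in the dedicated-crawlers folder. The crawled HTML documents are in the dedicated-crawlers/raw_html/ directory on the server: 1.1 GB, 69575 公文通 posts with valid content (from 200…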
-
After adding the plugin to my project and running Nutch, I get the following error:
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(Loc…
-
http://blog.cyeam.com/
mailto:lichao@cyeam.com
-
**[Radim Kolar](https://jira.spring.io/secure/ViewProfile.jspa?name=hsn)** opened **[SPR-9671](https://jira.spring.io/browse/SPR-9671?redirect=false)** and commented
It seems like this message contai…
-
Testing with AVRO.nl. See http://www.linkedtv.eu/wiki/index.php/Enrichment_types_in_TKK#Broadcasters_archive
The useful media enrichments on this site need to be extracted from the Player pages (video…