BayanGroup / nutch-custom-search

Nutch 2.x support #41

Open tomchiverton opened 7 years ago

kaidul commented 7 years ago

Hi, I've implemented a Nutch 2.x version of this plugin and tested it on my project. Can I send a pull request?

haidawyl commented 7 years ago

@kaidul Could you show me your code? I want the Nutch 2.x version.

kaidul commented 7 years ago

@haidawyl Which functionality do you need? I wrote some plugins recently but haven't polished them enough to send a PR here. Let me know which functionality you need and I will send you the code personally.

haidawyl commented 7 years ago

@kaidul I can't find org.apache.nutch.parse.HtmlParseFilter in the Nutch 2.x version.

haidawyl commented 7 years ago

@kaidul All of them, I hope. When I switch Nutch to 2.3.1, there are errors in ExtractorFetchSchedule.java, ExtractorIndexingFilter.java, ExtractorParseFilter.java, ExtractorParser.java, ExtractorScoringFilter.java, and OPICScoringFilter.java, and I cannot compile them.

kaidul commented 7 years ago

Yes, it isn't supposed to work that way for Nutch 2.x. I meant: what functionality do you require for your project, given that you're planning to use this plugin?

bl4ck1c3 commented 7 years ago

Is this plug-in still available for 1.x?

Thank you

haidawyl commented 7 years ago

@kaidul Could you send the 'zal.extractor.nutch' project to me (haidawyl@163.com)?

Thank you.

haidawyl commented 7 years ago

@kaidul I am trying to implement a Nutch 2.x version, but it is hard for me.

haidawyl commented 7 years ago

@kaidul I can extract fields from the pages, but how do I store the values in HBase? The fields are not appended to org.apache.nutch.storage.WebPage. Could you help me? Thanks.

hainguyenvan commented 7 years ago

Hi haidawyl, have a look at the code of the "parse-selector" plugin on GitHub: https://github.com/hainguyenvan/apache-nutch-2.3/tree/master/src/plugin/parse-selector In it I implemented parsing data selected by XPath and saving the data to HBase. To index the data into Elasticsearch, configure index.metadata in nutch-default.xml. Thanks!
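
The idea described above (put extracted values where Nutch 2.x will persist them, then index them by name) can be sketched roughly as follows. This is not code from the linked plugin. It assumes that org.apache.nutch.storage.WebPage exposes a metadata map of type Map<CharSequence, ByteBuffer> through getMetadata(), and it uses a plain HashMap as a stand-in so the example runs without Nutch on the classpath. The class name, the helper names, and the field name "title_extracted" are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MetadataStoreSketch {

    // Stand-in for the map assumed to be returned by WebPage.getMetadata().
    // Assumption: real Nutch 2.x code would use Avro Utf8 keys, not String.
    public static Map<CharSequence, ByteBuffer> newMetadata() {
        return new HashMap<>();
    }

    // Store one extracted value as UTF-8 bytes under the given field name.
    public static void putField(Map<CharSequence, ByteBuffer> metadata,
                                String name, String value) {
        metadata.put(name, ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
    }

    // Read a value back; returns null when the field was never stored.
    public static String getField(Map<CharSequence, ByteBuffer> metadata, String name) {
        ByteBuffer buf = metadata.get(name);
        if (buf == null) {
            return null;
        }
        // duplicate() so decoding does not move the stored buffer's position.
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        Map<CharSequence, ByteBuffer> metadata = newMetadata();
        // "title_extracted" is a hypothetical field name for illustration.
        putField(metadata, "title_extracted", "Example product");
        System.out.println(getField(metadata, "title_extracted"));
    }
}
```

In a real parse filter the map would come from the WebPage passed to the filter instead of newMetadata(), and the stored field names would then be the ones listed under index.metadata.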

haidawyl commented 7 years ago

@hainguyenvan Thanks! I will read the source.