VIDA-NYU / ache

ACHE is a web crawler for domain-specific search.
http://ache.readthedocs.io
Apache License 2.0
454 stars, 135 forks

Need to create customized model #168

Closed: sheeluee7 closed this issue 2 years ago

sheeluee7 commented 6 years ago

Can I create a customized model using my own classifier and training data, and plug it into the ACHE crawler? I want to use my own feature set and classification algorithm to build the model, which will give me more control in the future. After that, I would rely on ACHE's infrastructure for the crawling itself.

Also, I trained a model using 3000 positive and 3000 negative training examples. The results were decent, but to improve them I need to try different approaches by changing the model's ML algorithm.

aecio commented 6 years ago

If the algorithm you plan to implement is too specific to your application, you can fork the crawler and implement the TargetClassifier interface. Then you just need to instantiate it in the TargetClassifierFactory class. Implementing this interface should be fairly simple; for implementation examples, take a look at the classes in the focusedCrawler.target.classifier package (see SmileTargetClassifier for a machine-learning-based classifier).
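To make the "implement the interface, then wire it into the factory" pattern concrete, here is a minimal self-contained sketch. Everything in it is a hypothetical stand-in: `PageClassifier`, `Relevance`, `KeywordWeightClassifier`, and `ClassifierFactory` are illustration names, not ACHE classes, and the real TargetClassifier method signature and the real TargetClassifierFactory logic may differ, so check the source in your fork before adapting it. The scoring rule (a weighted keyword sum passed through a logistic function) is only a placeholder for your own feature set and algorithm.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the classifier's result type (not an ACHE class).
final class Relevance {
    final boolean relevant;
    final double score;

    Relevance(boolean relevant, double score) {
        this.relevant = relevant;
        this.score = score;
    }
}

// Hypothetical stand-in for ACHE's TargetClassifier interface.
// Assumption: the real interface takes a page object rather than a String.
interface PageClassifier {
    Relevance classify(String pageText);
}

// Example custom classifier: sums per-keyword weights plus a bias and
// squashes the sum with a logistic function to get a score in (0, 1).
final class KeywordWeightClassifier implements PageClassifier {
    private final Map<String, Double> weights;
    private final double bias;
    private final double threshold;

    KeywordWeightClassifier(Map<String, Double> weights, double bias, double threshold) {
        this.weights = weights;
        this.bias = bias;
        this.threshold = threshold;
    }

    @Override
    public Relevance classify(String pageText) {
        double sum = bias;
        // Lowercase the text and split on runs of non-alphanumeric characters.
        for (String token : pageText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            sum += weights.getOrDefault(token, 0.0);
        }
        double score = 1.0 / (1.0 + Math.exp(-sum));
        return new Relevance(score >= threshold, score);
    }
}

// Hypothetical stand-in for the factory step: map a configured type name
// to the classifier implementation that should be instantiated.
final class ClassifierFactory {
    static PageClassifier create(String type, Map<String, Double> weights) {
        if ("keyword_weight".equals(type)) {
            return new KeywordWeightClassifier(weights, -0.5, 0.5);
        }
        throw new IllegalArgumentException("Unknown classifier type: " + type);
    }
}
```

In a fork, the class implementing the interface would hold your own trained model, and the factory branch is the only place the rest of the crawler needs to know about it.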

If you just need more flexible parameters than the ones available in the current implementation, you can take a look at the class where the classifier is trained: https://github.com/ViDA-NYU/ache/blob/master/src/main/java/focusedCrawler/target/classifier/SmileTargetClassifierBuilder.java

Finally, we accept pull requests for new features that may be useful to others.