A distributed search engine
git clone https://github.com/forumulator/hooli
$ cd hooli/Crawler
$ python4 setup_tables.py
$ python3 indexmgr.py
Now copy the URI that is echoed onto the screen
$ python3 testCrawler.py
Enter the number of Pages and the URI above
$ cd hooli/Crawler
$ python3 indexer.py
$ cd hooli
$ python3 hello.py
That's it, the server is running. Now you can go to the browser and search on localhost:5000.
src/spider_controller
: The controller for the spiders
src/crawler.py
: The actual crawler
src/indexer.py
: The indexer class
src/hbase_util.py
: Functions to implement the HBase Schema
src/indexmgr.py
: The index Manager, handles the distributed indexing
src/querying.py
: Handler the srach and ranking
hello.py
: Flask code for the server