Building a search engine from scratch. We plan to implement the three major components of a search engine: Crawler, Parser, and Indexing. We will begin by developing command-line tools for these components and then wrapping them with an API service to be used by a frontend. This project is being done under IEEE-NITK.
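To make the split between components concrete, here is a toy sketch of how the Parser and Indexing stages might fit together. It assumes a standard inverted index (each term maps to the set of URLs containing it) and replaces the Crawler with an in-memory dict of pages. All names (`TextAndLinkParser`, `parse`, `build_index`, `search`) are hypothetical; this is not the project's implementation.

```python
"""Toy sketch of the Parser and Indexing stages (hypothetical, not andromeda's code)."""
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextAndLinkParser(HTMLParser):
    """Collects lowercase word tokens and href targets from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.words: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Record outgoing links so a crawler could follow them later.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Naive tokenizer; note it also picks up text inside <script>/<style>.
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def parse(html: str) -> tuple[list[str], list[str]]:
    """Return (words, links) extracted from one HTML page."""
    parser = TextAndLinkParser()
    parser.feed(html)
    parser.close()
    return parser.words, parser.links


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: term -> set of URLs containing that term."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, html in pages.items():
        words, _links = parse(html)
        for word in words:
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every query term (boolean AND)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

For example, indexing `{"a": "<p>Search engines crawl</p>", "b": "<p>Engines index pages</p>"}` and searching for `"crawl engines"` yields only `{"a"}`, because page `b` lacks the term "crawl".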
To establish a VPN connection to NITK-NET, run
sudo openvpn <path-to-config-file>
to initiate the connection sequence. Keep this terminal open.
In another terminal, connect to the container with
ssh <user>@<container-ip>
and enter the necessary details when prompted.
Install Docker Engine by following this link.
# Install Chrome
RUN curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list \
&& apt-get -y update \
&& apt-get -y install google-chrome-stable
# Install chromedriver
RUN wget -N https://chromedriver.storage.googleapis.com/111.0.5563.64/chromedriver_linux64.zip -P ~/ \
&& unzip ~/chromedriver_linux64.zip -d ~/ \
&& rm ~/chromedriver_linux64.zip \
&& mv -f ~/chromedriver /usr/local/bin/chromedriver
Warning: Take care to use compatible versions of google-chrome and chromedriver. Refer to this answer on StackOverflow.
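One way to catch a mismatch early is to compare the versions both binaries report. This sketch assumes the common rule of thumb that Chrome and chromedriver are compatible when their major versions match, and that both print a dotted four-part version from `--version` (for example `Google Chrome 111.0.5563.64`). `major_version`, `versions_compatible`, and `installed_version` are hypothetical helpers.

```python
"""Sketch: check that google-chrome and chromedriver share a major version."""
import re
import shutil
import subprocess


def major_version(version_output: str) -> int:
    """Extract the major version from output such as 'Google Chrome 111.0.5563.64'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_compatible(chrome_output: str, driver_output: str) -> bool:
    """Treat the pair as compatible when their major versions are equal (assumed rule)."""
    return major_version(chrome_output) == major_version(driver_output)


def installed_version(binary: str) -> str:
    """Return the `--version` output of a binary found on PATH."""
    path = shutil.which(binary)
    if path is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Inside the container, `versions_compatible(installed_version("google-chrome"), installed_version("chromedriver"))` should return `True` before you start crawling.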
Set the following variables in .env:
MONGO_USER=admin
MONGO_PASSWORD=adminpw
MONGO_DATABASE=test
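A minimal sketch of how these variables might be consumed, assuming the code reads `.env` directly rather than through a library such as python-dotenv (the project's actual loader is not shown here). `load_env` and `mongo_uri` are hypothetical helpers; the `localhost` default and `authSource=admin` are assumptions about where MongoDB is reachable and how its user was created.

```python
"""Sketch: read .env and build a MongoDB connection URI (hypothetical helpers)."""
from pathlib import Path
from urllib.parse import quote_plus


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("'\"")
    return env


def mongo_uri(env: dict[str, str], host: str = "localhost", port: int = 27017) -> str:
    """Build a mongodb:// URI; credentials are percent-encoded for the URI format."""
    user = quote_plus(env["MONGO_USER"])
    password = quote_plus(env["MONGO_PASSWORD"])
    database = env["MONGO_DATABASE"]
    # authSource=admin assumes the user lives in the admin database,
    # as is typical for a root user created by the official mongo image.
    return f"mongodb://{user}:{password}@{host}:{port}/{database}?authSource=admin"
```

With the values above, `mongo_uri(load_env())` gives `mongodb://admin:adminpw@localhost:27017/test?authSource=admin`. Encoding with `quote_plus` matters once a password contains characters such as `@` or `:`.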
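Install the dependencies listed in andromeda/requirements.txt after activating the environment (for example, with pip install -r andromeda/requirements.txt).
Run
docker-compose up -d
to bring up the MongoDB server.
Run
python3 andromeda/crawler.py start
to start the process of crawling.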
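The crawling process can be pictured as a breadth-first walk over links. The sketch below is not the contents of `andromeda/crawler.py`; it assumes a same-host restriction and a page limit, and takes a hypothetical `fetch_links(url)` callback so that page fetching (presumably backed by the Chrome and chromedriver setup) stays separate from the traversal logic. `crawl` and `fetch_links` are illustrative names.

```python
"""Sketch of a breadth-first crawl loop (hypothetical; not andromeda/crawler.py)."""
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(
    seed: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> list[str]:
    """Visit pages breadth-first from `seed`, staying on the seed's host.

    `fetch_links(url)` must return the raw hrefs found on that page.
    Returns the URLs in the order they were visited.
    """
    allowed_host = urlparse(seed).netloc
    frontier = deque([seed])
    seen = {seed}
    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != allowed_host or absolute in seen:
                continue
            seen.add(absolute)
            frontier.append(absolute)
    return visited
```

A FIFO queue gives breadth-first order, so pages close to the seed are crawled first; the `seen` set prevents the same URL from being queued twice.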
In the Docker network, the MongoDB server runs on port 27017, and a service called Mongo-Express, which provides a GUI for accessing the database, runs on port 8081.
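A quick way to confirm both services are up, sketched with the standard `socket` module. It assumes docker-compose publishes ports 27017 and 8081 to the host so that `localhost` works; from inside the Docker network you would pass the service's hostname instead. `port_open`, `SERVICES`, and `report` are hypothetical names.

```python
"""Sketch: check that MongoDB and Mongo-Express accept TCP connections."""
import socket

# Ports taken from the docker-compose setup described above.
SERVICES = {"MongoDB": 27017, "Mongo-Express": 8081}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def report(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port is reachable on `host`."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```

`report()` returns a dict mapping each service name to `True` or `False`; a `False` for MongoDB usually means `docker-compose up -d` has not finished or the port is not published.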
Run
pylint andromeda/
before making a PR and fix any lint errors.
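pylint reports problems through a bit-encoded exit status, which is handy when scripting the pre-PR check. This sketch assumes pylint is installed in the active environment; `PYLINT_STATUS_BITS`, `decode_status`, and `lint` are hypothetical names, and the bit meanings follow pylint's documented exit codes.

```python
"""Sketch: run pylint on a package and decode its bit-encoded exit status."""
import subprocess
import sys

# Bit flags from pylint's documented exit status; 0 means no messages at all.
PYLINT_STATUS_BITS = {
    1: "fatal",
    2: "error",
    4: "warning",
    8: "refactor",
    16: "convention",
    32: "usage error",
}


def decode_status(code: int) -> list[str]:
    """Return the message categories encoded in a pylint exit code."""
    return [name for bit, name in PYLINT_STATUS_BITS.items() if code & bit]


def lint(target: str = "andromeda/") -> list[str]:
    """Run pylint with the current interpreter; an empty list means a clean run."""
    completed = subprocess.run([sys.executable, "-m", "pylint", target], check=False)
    return decode_status(completed.returncode)
```

For instance, an exit code of 20 decodes to `["warning", "convention"]`, so the PR still needs cleanup even though no errors were reported.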