The-CodingSloth / sloth-search

MIT License

Sloth Search - A Google-like Search Engine Clone

Sloth Search is a project that aims to recreate Google, from crawling and indexing to serving results through a user-friendly front-end interface. The project consists of three main components: the Client, Search, and Server. For a full explanation, check out the video.

Project Structure

The project is divided into three folders, one for each component: Client, Search, and Server.

Installation and Setup

  1. Clone the Repository

    git clone <repository-url>
    cd sloth-search
  2. Install Dependencies

    Install the necessary Python dependencies by running:

    pip install -r requirements.txt
  3. Client Setup

    • The client contains the HTML, CSS, and JavaScript code to run the front-end.
    • Open the index.html file in your browser, or use a static file server to serve the client code locally.
    • You can also use the Live Server extension.
  4. Search Setup

    • Run the search script you need with:
      python search/<path to file you want to run>
  5. Server Setup

    • The server uses Flask to provide an API for search queries.
    • Start the Flask server by navigating to the Server directory and running:
      python google_search_api.py

How It Works

  1. Crawling

    • The crawler starts with a set of seed URLs and collects links and content from the web.
    • It respects robots.txt to avoid being blocked and to ensure ethical crawling.
    • Parsed data is stored in a format ready for indexing.
  2. Indexing

    • The indexing module processes the crawled pages.
    • The content is tokenized, cleaned, and stemmed, and stop words are removed, all using the NLTK library.
    • The resulting indexed data is saved to be used by the search API.
  3. Serving and PageRank

    • The PageRank algorithm is used to rank pages based on their importance.
    • When a user searches for a query through the client, the server uses the indexed data and PageRank scores to return the most relevant pages.
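
To make the robots.txt check from the crawling step concrete, here is a minimal sketch using Python's standard urllib.robotparser module. It is not taken from the project's crawler: the robots.txt body, the "SlothBot" user agent, and the example.com URLs are all made up for illustration, and the rules are parsed from a string so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. A real crawler would download this
# from the target site before fetching any of its pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_body):
    """Parse a robots.txt body that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that robots.txt lets this user agent fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(ROBOTS_TXT)

# Hypothetical crawl frontier built from seed URLs.
frontier = [
    "https://example.com/",
    "https://example.com/private/data.html",
    "https://example.com/docs/intro.html",
]

# "SlothBot" is an assumed user-agent name, not one the project defines.
crawlable = allowed_urls(parser, "SlothBot", frontier)
print(crawlable)
```

Filtering the frontier before any request is made keeps the disallowed paths out of the crawl entirely, rather than fetching them and discarding the result.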
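
The indexing step can be illustrated with a small inverted index. The sketch below deliberately uses standard-library stand-ins instead of NLTK so that it runs without extra downloads: a regular expression replaces NLTK's tokenizer, a five-line suffix stripper replaces a real stemmer, and the stop-word set is a tiny hand-picked sample. The function names and sample pages are hypothetical and do not come from the project.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list. NLTK's English list is much larger.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def crude_stem(token):
    """Strip a few common suffixes. A rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(pages):
    """Map each term to {url: term count} across all crawled pages."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in preprocess(text):
            index[term][url] = index[term].get(url, 0) + 1
    return dict(index)


# Hypothetical crawled pages.
pages = {
    "https://example.com/a": "Sloths are sleeping in the trees",
    "https://example.com/b": "The crawler indexed the sleeping sloth",
}

index = build_index(pages)
print(index["sloth"])
```

Because queries would pass through the same preprocess function, a search for "sloths" and a search for "sloth" land on the same index entry.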
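
For the serving step, the following sketch computes PageRank by power iteration on a toy link graph and then orders matching pages by their score. The damping factor of 0.85, the iteration count, the handling of pages without outgoing links, and the rule "return pages that contain every query term, highest PageRank first" are assumptions made for this example. The README does not say how the project combines index data with PageRank scores.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a {page: [linked pages]} graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {
            page: (1 - damping) / n + damping * dangling / n for page in pages
        }
        # Every page passes an equal share of its rank to each target.
        for page, targets in links.items():
            for target in targets:
                new_ranks[target] += damping * ranks[page] / len(targets)
        ranks = new_ranks
    return ranks


def search(query_terms, index, ranks):
    """Return pages containing every query term, highest PageRank first."""
    postings = [set(index.get(term, {})) for term in query_terms]
    if not postings:
        return []
    matches = set.intersection(*postings)
    return sorted(matches, key=lambda page: ranks.get(page, 0.0), reverse=True)


# Hypothetical link graph: page "a" links to "b" and "c", and so on.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)

# Hypothetical inverted index mapping terms to {page: term count}.
index = {"sloth": {"b": 1, "c": 2}, "tree": {"c": 1}}

results = search(["sloth"], index, ranks)
print(results)
```

In this graph, page "c" receives links from both "a" and "b", so it ends up with a higher score than "b" and is listed first for the query "sloth".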

Important Notes

Contributing

Contributions are welcome! If you'd like to contribute to the development of Sloth Search, feel free to fork the repository, make changes, and submit a pull request.

License

This project is open-source and available under the MIT License.

If you have any questions or suggestions, feel free to contact me.

Happy Searching with Sloth Search! 🦥🔍