feat: Scraping Documentation from Websites for Building Knowledge Graphs

This issue invites contributors to develop a Python script that scrapes documentation from websites so that the content can be used to build knowledge graphs.

Create a Python script to scrape documentation from a given website, including all sublinks.
Store the content from each link in a separate Llama-Index document object.
Each document object should have metadata that includes the source URL of the scraped data.
Compile all document objects into a single list, which serves as the input for building the Llama-Index index.

Scraping the Documentation:
- Use a Python library such as Beautiful Soup, Scrapy, or Requests-HTML to scrape content from the main documentation page and every sublink reachable from it.
- Ensure accurate extraction of relevant content, including text, code snippets, and descriptions.
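The crawl described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: to stay dependency-free it swaps in the standard library's `html.parser` and `urllib` in place of Beautiful Soup or Scrapy, and the names `crawl_docs`, `fetch_html`, and `max_pages` are hypothetical. It assumes that "sublinks" means URLs that start with the documentation's base URL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class _PageParser(HTMLParser):
    """Collects href targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def fetch_html(url):
    """Default fetcher (assumption: pages are UTF-8 encoded HTML)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_docs(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl limited to URLs that start with start_url.

    Returns a dict mapping each visited URL to its extracted text.
    """
    pages = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        parser = _PageParser()
        parser.feed(fetch(url))
        pages[url] = "\n".join(parser.text_parts)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" anchors.
            target, _ = urldefrag(urljoin(url, href))
            if target.startswith(start_url) and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages
```

Passing the fetcher as a parameter keeps the crawl logic testable offline; a contribution built on Beautiful Soup or Scrapy would replace `_PageParser` with that library's own link and text extraction.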
Llama-Index Document Object Creation:
- Store the data scraped from each individual link in a separate Llama-Index document object.
- Attach metadata to each document object that records the URL of the link from which the content was scraped.
- Compile all individual document objects into a single list, which serves as the input for building the Llama-Index index.
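As a rough sketch of the document-creation step, the snippet below turns a mapping of URL to scraped text into one document object per link. It assumes a recent llama-index release where `Document` is importable from `llama_index.core`; older releases expose it from a different path. The helper name `build_documents` and the metadata key `source_url` are hypothetical choices, and the fallback dataclass is only a stand-in so the example runs without the package installed.

```python
try:
    # Real class, available when the llama-index package is installed.
    from llama_index.core import Document
except ImportError:
    # Hypothetical stand-in so the sketch runs without llama-index.
    from dataclasses import dataclass, field

    @dataclass
    class Document:
        text: str
        metadata: dict = field(default_factory=dict)


def build_documents(pages):
    """Turn a {url: text} mapping into a list with one Document per URL."""
    documents = []
    for url, text in pages.items():
        # Record where the content came from in the document metadata.
        documents.append(Document(text=text, metadata={"source_url": url}))
    return documents
```

The returned list is the "complete" collection the issue asks for; it can be inspected directly, for example with `[doc.metadata["source_url"] for doc in documents]`.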
Documentation:
- Document the script clearly, providing instructions on how to use it.
- Provide a notebook implementation that compares various strategies and their responses; this stage is mostly research.
- The notebook can later be turned into a module.
Error Handling:
- Implement robust error handling to manage issues such as broken links, failed requests, or unexpected data formats.
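One possible shape for the error handling is a fetch wrapper that skips broken links and retries transient failures. The sketch below uses only the standard library; the function name `safe_fetch`, the retry count, and the linear backoff are assumptions made for illustration, not requirements of this issue.

```python
import logging
import time
from urllib.error import HTTPError
from urllib.request import urlopen

logger = logging.getLogger(__name__)


def _read_url(url):
    """Default fetcher (assumption: pages are UTF-8 encoded HTML)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def safe_fetch(url, fetch=_read_url, retries=3, backoff=1.0):
    """Return the page body, or None if the URL cannot be retrieved."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except HTTPError as exc:
            if exc.code < 500:
                # 4xx (for example a broken link): retrying will not help.
                logger.warning("Skipping %s: HTTP %s", url, exc.code)
                return None
            logger.warning("HTTP %s for %s (attempt %d)", exc.code, url, attempt)
        except OSError as exc:
            # URLError, timeouts, and connection resets are OSError subclasses.
            logger.warning("Request failed for %s (attempt %d): %s", url, attempt, exc)
        if attempt < retries:
            time.sleep(backoff * attempt)  # wait a little longer each time
    return None
```

Returning `None` instead of raising lets a crawler log the failure and move on to the next link, so one bad page does not abort the whole run.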

Implementation: Develop the Python script ensuring it meets the outlined requirements.
Documentation: Include comprehensive comments and docstrings that explain the functionality and usage of the script.
Submit a Pull Request (PR):
- Reference this issue in your PR.
- Provide a description of your implementation, any challenges faced, and considerations made during development.

Explore Python libraries such as Beautiful Soup, Scrapy, and Requests-HTML for web scraping.
Refer to Llama-Index documentation for guidance on creating document objects and managing metadata.

We look forward to your valuable contributions that will enhance our capability to integrate website documentation into our knowledge systems!

c2siorg / Project-Explainer