DE-2410-A / web-scraping-sam

de-2410-a-challenges-web-scraping-web-scraping-activity created by GitHub Classroom

As a BOOK STORE OWNER, I want to understand how many books are available in each category on http://books.toscrape.com/, so that I can decide which categories to focus on when launching a rival business #1

Open ashselva opened 6 days ago

serena351 commented 6 days ago

Tasks

serena351 commented 6 days ago

Function 1 - Making a Request

Functional Requirements

  1. [ ] The function should accept a URL as an input parameter.
  2. [ ] The function should send an HTTP GET request to the provided URL using the requests library.
  3. The function should handle the response:
    • [ ] If the response status code is 200, the function should return the HTML content of the page.
    • [ ] If the response status code is not 200, the function should return an error message indicating the status code and that the request was unsuccessful.
  4. [ ] The function should catch any exceptions that occur during the request and return an appropriate error message.
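The requirements above could be met along the following lines. This is only a minimal sketch: the function name request_to_scrape is taken from the Function 2 requirements, while the 10 second timeout and the exact wording of the error messages are assumptions rather than part of the task.

```python
import requests


def request_to_scrape(url):
    """Return the HTML of `url`, or an error message string on failure."""
    try:
        # The timeout value is an assumption; the task does not specify one.
        response = requests.get(url, timeout=10)
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, malformed URLs and similar.
        return f"An exception occurred during the request: {exc}"

    if response.status_code == 200:
        return response.text
    return f"Request unsuccessful: status code {response.status_code}"
```

Calling request_to_scrape("http://books.toscrape.com/") would then return either the page HTML or one of the two error strings. Because success and failure are both plain strings, callers may want to agree on a way to tell them apart.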

Testing Requirements


  1. Successful Request:

    • The function should receive a URL and make a GET request to the books.toscrape.com website.
    • The function should return the HTML content of the page if the response status code is 200.
  2. Unsuccessful Request:

    • The function should return an error message if the response status code is not 200.
    • The error message should include the status code and indicate that the request was unsuccessful.
  3. Exception Handling:

    • The function should catch any exceptions that occur during the request.
    • The function should return an error message indicating that an exception occurred.

Definition of Done

serena351 commented 6 days ago

Function 2 - Parsing the Response

Functional Requirements

  1. [ ] The function should query the supplied website using the result of the previously developed request_to_scrape function.
  2. [ ] The function should return a dictionary keyed by category name, where each value is a dictionary that has the link as its key and the category URL as its value.
  3. [ ] The function should catch any exceptions that occur during the processing and return an appropriate error message, including not finding any relevant data in the HTML.
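One way to read these requirements is as a thin wrapper that fetches the page, hands the HTML to an extraction step, and turns any failure into an error message. The sketch below assumes that reading. The names parse_categories, fetch and extract are hypothetical, and the two callables are passed in so that the example runs without network access; in the real task fetch would be request_to_scrape.

```python
def parse_categories(url, fetch, extract):
    """Return the category dictionary for `url`, or an error message string.

    Expected shape of a successful result (the "link" key is an assumption):
        {"Travel": {"link": "<category URL>"}}
    """
    try:
        html = fetch(url)           # e.g. request_to_scrape
        categories = extract(html)  # e.g. the Function 3 extraction step
    except Exception as exc:
        return f"An exception occurred while parsing: {exc}"

    if isinstance(categories, str):
        # The extraction step already reported an error; pass it through.
        return categories
    if not categories:
        return "No categories were found in the HTML."
    return categories
```

With stub callables, parse_categories("u", lambda u: "<html>", lambda h: {}) returns the "No categories" message, which matches the No Data Parse case in the testing requirements.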

Testing Requirements

  1. Successful Parse:
    • [ ] The function should return a dictionary keyed by category name, where each value is a dictionary that has the link as its key and the category URL as its value.
  2. No Data Parse:
    • [ ] The function should return an error message if no categories are found in the HTML for the website.
  3. Exception Handling:
    • [ ] The function should return an error message if an exception occurs during the processing.

Definition of Done

serena351 commented 6 days ago

Function 3 - Extracting the Data

Functional Requirements

  1. [ ] The function should extract the data from the HTML.
  2. [ ] The function should return the correct categories and links.
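The extraction step might look like the sketch below. The issue does not name a parsing library, so this uses the standard library html.parser as a stand-in. It also assumes that the category links sit inside a div with the class side_categories and that the inner key is the literal string "link"; both should be checked against the live page and the agreed data shape. The names extract_categories and _CategoryParser are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _CategoryParser(HTMLParser):
    """Collect the <a> links found inside <div class="side_categories">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.categories = {}
        self._div_depth = 0  # greater than 0 while inside the sidebar div
        self._href = None    # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self._div_depth or "side_categories" in classes:
                self._div_depth += 1
        elif tag == "a" and self._div_depth and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
        elif tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            if name:
                # Relative hrefs are resolved against the base URL.
                self.categories[name] = {"link": urljoin(self.base_url, self._href)}
            self._href = None


def extract_categories(html, base_url="http://books.toscrape.com/"):
    """Return {category name: {"link": URL}}, or an error message string."""
    try:
        parser = _CategoryParser(base_url)
        parser.feed(html)
        parser.close()
    except Exception as exc:
        return f"An exception occurred while extracting categories: {exc}"
    if not parser.categories:
        return "No categories were found in the HTML."
    return parser.categories
```

Note that every link in the sidebar is collected, so a top-level "Books" entry would appear alongside the individual categories if the page contains one.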

Testing Requirements

  1. Successful Extraction:
    • [ ] The function should return a dictionary keyed by category name, where each value is a dictionary that has the link as its key and the category URL as its value.
  2. Exception Handling:
    • [ ] The function should return an error message if an exception occurs during the processing.

Definition of Done

serena351 commented 6 days ago

Function 4 - Saving the Data for Processing

Functional Requirements

  1. [ ] The function should save the extracted data to a JSON file.
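Saving the data could be as small as the sketch below. The function name save_categories, the default file name categories.json, and the choice to return the path on success and an error message on failure are all assumptions; the issue only requires that the extracted data ends up in a JSON file.

```python
import json


def save_categories(categories, path="categories.json"):
    """Write `categories` to `path` as JSON; return the path or an error message."""
    try:
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(categories, handle, indent=2, ensure_ascii=False)
    except (OSError, TypeError, ValueError) as exc:
        # OSError: the file could not be opened or written.
        # TypeError / ValueError: the data is not JSON serialisable.
        return f"An exception occurred while saving: {exc}"
    return path
```

For example, save_categories({"Travel": {"link": "..."}}, "categories.json") would write the dictionary to disk and return "categories.json".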

Testing Requirements

Exception Handling:

Definition of Done