dgPadBootcamps / Java-Bootcamp-2024

1 stars 0 forks source link

Task 2 : Adding Dependencies and Web Scraping #66

Closed mohammad-fahs closed 2 months ago

mohammad-fahs commented 2 months ago

Task 2 :

Objective:

In this task, you will initialize a new Spring Boot project, add the JSoup dependency, and write a simple Java program to scrape data from a website of your choice. The goal is to get hands-on experience with setting up a Spring Boot project, using an external library (JSoup), and applying web scraping techniques.

Instructions:

  1. Initialize a New Spring Boot Project:
    • Visit [Spring Initializr](https://start.spring.io/) and generate a new Spring Boot project with the following settings:
      • Project: Maven
      • Language: Java
      • Spring Boot Version: 3.x (the latest stable version)
      • Project Metadata:
        • Group: com.yourname
        • Artifact: web-scraper
        • Name: Web Scraper
        • Package Name: com.yourname.webscraper
      • Dependencies: Add the Spring Web dependency (to allow adding more features later).
    • Click on "Generate" to download the project as a ZIP file.
    • Unzip the downloaded file and open the project in IntelliJ IDEA.
  2. Add the JSoup Dependency:
    • Open the pom.xml file in the root directory of your project.
    • Add the following JSoup dependency within the <dependencies> tag:
    • Save the pom.xml file and allow IntelliJ to update the Maven project to download the JSoup library.
  3. Choose a Website for Scraping:
    • Select a website that you find interesting or relevant. It could be an e-commerce site, a news website, a blog, or any other public web page with data you'd like to extract.
    • Identify the specific data you want to scrape from the website (e.g., product names, prices, article titles, etc.).
  4. Implement a CommandLineRunner Class:
    • Create a new Java class in the com.yourname.webscraper package that implements the CommandLineRunner interface.
    • In the run method, use JSoup to connect to the website you chose and scrape the data.
    • Print the scraped data to the console.
  5. Run Your Application:
    • Run the Spring Boot application and observe the output in the console.
    • Ensure that the scraped data is displayed correctly.

Submit Your Work:

Resources that can help:

Ranaleo commented 2 months ago

The website chosen for this example is "Books to Scrape" (http://books.toscrape.com). It’s a demo site designed specifically for practicing web scraping techniques. The site features a collection of books organized into categories, with each book listing its title, price, rating, and other details. The layout is simple and static, making it ideal for learning and practicing scraping.

The data scraped includes:

The scraped data was extracted from the main page of the site, where each book is displayed in a grid format. The book titles and prices were retrieved using CSS selectors that target the specific HTML elements containing this information. Task2.zip @mohammad-fahs