dgPadBootcamps / Java-Bootcamp-2024

Task 2 : Adding Dependencies and Web Scraping #74

Closed: mohammad-fahs closed this issue 2 months ago

mohammad-fahs commented 2 months ago

Task 2:

Objective:

In this task, you will initialize a new Spring Boot project, add the JSoup dependency, and write a simple Java program to scrape data from a website of your choice. The goal is to get hands-on experience with setting up a Spring Boot project, using an external library (JSoup), and applying web scraping techniques.

Instructions:

  1. Initialize a New Spring Boot Project:
    • Visit [Spring Initializr](https://start.spring.io/) and generate a new Spring Boot project with the following settings:
      • Project: Maven
      • Language: Java
      • Spring Boot Version: 3.x (the latest stable version)
      • Project Metadata:
        • Group: com.yourname
        • Artifact: web-scraper
        • Name: Web Scraper
        • Package Name: com.yourname.webscraper
      • Dependencies: Add the Spring Web dependency (to allow adding more features later).
    • Click on "Generate" to download the project as a ZIP file.
    • Unzip the downloaded file and open the project in IntelliJ IDEA.
  2. Add the JSoup Dependency:
    • Open the pom.xml file in the root directory of your project.
    • Add the JSoup dependency (groupId org.jsoup, artifactId jsoup) within the <dependencies> tag.
    • Save the pom.xml file and allow IntelliJ to update the Maven project to download the JSoup library.
  3. Choose a Website for Scraping:
    • Select a website that you find interesting or relevant. It could be an e-commerce site, a news website, a blog, or any other public web page with data you'd like to extract.
    • Identify the specific data you want to scrape from the website (e.g., product names, prices, article titles, etc.).
  4. Implement a CommandLineRunner Class:
    • Create a new Java class in the com.yourname.webscraper package that implements the CommandLineRunner interface.
    • In the run method, use JSoup to connect to the website you chose and scrape the data.
    • Print the scraped data to the console.
  5. Run Your Application:
    • Run the Spring Boot application and observe the output in the console.
    • Ensure that the scraped data is displayed correctly.
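
The dependency snippet for step 2 is not preserved in this issue text. As a rough sketch, a JSoup declaration in pom.xml usually looks like the fragment below. The groupId and artifactId are JSoup's published Maven coordinates; the version number is an assumption chosen for illustration, so check Maven Central for the current release before copying it.

```xml
<!-- Goes inside the existing <dependencies> element of pom.xml -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <!-- Assumed version for illustration; replace with the latest stable release -->
    <version>1.17.2</version>
</dependency>
```

After saving, reloading the Maven project in IntelliJ should download the library and make the org.jsoup packages available to import.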

Submit Your Work:

Resources that can help:

douaaobeid commented 2 months ago

@mohammad-fahs Website: OLX https://www.dubizzle.com.lb/

OLX is a popular online marketplace where users can buy and sell a wide range of products and services. It operates globally and allows individuals to list items such as electronics, vehicles, real estate, and more. Users can search for local listings, negotiate prices, and connect directly with sellers or buyers. OLX aims to simplify transactions by providing a platform for classified ads that is easy to use and accessible via both web and mobile applications.

Data Scraped: Mobile phone listings

Console Screenshot for Scraped Data: (two screenshots, not preserved in this text)

Java Code

package com.douaaObeid.web_Scraper2;

import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.Scanner;

@Component
public class OLXScrapperRunner implements CommandLineRunner {

    // Performs the actual scraping; OLXScraper is defined elsewhere in the project.
    private final OLXScraper scrapingService = new OLXScraper();

    public static void main(String[] args) {
        SpringApplication.run(WebScraper2Application.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        Scanner scanner = new Scanner(System.in);
        // Keep prompting for links until the user types "exit".
        while (true) {
            System.out.println("Enter link to scrape or 'exit' to stop:");
            String link = scanner.nextLine();

            if (link.equalsIgnoreCase("exit")) {
                break;
            }

            try {
                System.out.println("Scraping link: " + link);

                List<OLX> olxList = scrapingService.scrapeOLX(link);

                System.out.println("Scraping complete. Found " + olxList.size() + " items.");

                // Print each scraped listing using its toString() representation.
                for (OLX olx : olxList) {
                    System.out.println(olx);
                }

            } catch (Exception e) {
                System.err.println("An error occurred during scraping: " + e.getMessage());
                e.printStackTrace();
            }
        }
    }

}
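
One thing to watch in the runner above: `scanner.nextLine()` throws a `NoSuchElementException` if standard input is closed, for example when the application is started without an attached console. A minimal sketch of a more defensive prompt loop follows. `LinkPrompt` and `readLinks` are hypothetical names introduced here for illustration and are not part of the submitted code. The sketch only collects the links and leaves the scraping itself to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of the submitted code: collects links from
// an input source until the user types "exit" or the input ends.
class LinkPrompt {

    static List<String> readLinks(Scanner scanner) {
        List<String> links = new ArrayList<>();
        // hasNextLine() returns false once the input is exhausted or closed,
        // so the loop ends cleanly instead of throwing.
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine().trim();
            if (line.equalsIgnoreCase("exit")) {
                break;
            }
            // Skip blank lines so they are not treated as links.
            if (!line.isEmpty()) {
                links.add(line);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Simulated console input; a real runner would pass new Scanner(System.in).
        Scanner scanner = new Scanner(
                "https://example.com/a\n\nhttps://example.com/b\nEXIT\nignored\n");
        System.out.println(readLinks(scanner));
    }
}
```

Passing the `Scanner` in as a parameter also makes the loop easy to exercise with a fixed string, as the `main` method does, without needing to type into a live console.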