dgPadBootcamps / Java-Bootcamp-2024

Task 2: Adding Dependencies and Web Scraping #100

Closed by mohammad-fahs 2 months ago

mohammad-fahs commented 2 months ago

Task 2:

Objective:

In this task, you will initialize a new Spring Boot project, add the JSoup dependency, and write a simple Java program to scrape data from a website of your choice. The goal is to get hands-on experience with setting up a Spring Boot project, using an external library (JSoup), and applying web scraping techniques.

Instructions:

  1. Initialize a New Spring Boot Project:
    • Visit [Spring Initializr](https://start.spring.io/) and generate a new Spring Boot project with the following settings:
      • Project: Maven
      • Language: Java
      • Spring Boot Version: 3.x (the latest stable version)
      • Project Metadata:
        • Group: com.yourname
        • Artifact: web-scraper
        • Name: Web Scraper
        • Package Name: com.yourname.webscraper
      • Dependencies: Add the Spring Web dependency (to allow adding more features later).
    • Click on "Generate" to download the project as a ZIP file.
    • Unzip the downloaded file and open the project in IntelliJ IDEA.
  2. Add the JSoup Dependency:
    • Open the pom.xml file in the root directory of your project.
    • Add the JSoup dependency (groupId org.jsoup, artifactId jsoup, with a current release as the version) within the <dependencies> tag.
    • Save the pom.xml file and allow IntelliJ to update the Maven project to download the JSoup library.
  3. Choose a Website for Scraping:
    • Select a website that you find interesting or relevant. It could be an e-commerce site, a news website, a blog, or any other public web page with data you'd like to extract.
    • Identify the specific data you want to scrape from the website (e.g., product names, prices, article titles, etc.).
  4. Implement a CommandLineRunner Class:
    • Create a new Java class in the com.yourname.webscraper package that implements the CommandLineRunner interface.
    • In the run method, use JSoup to connect to the website you chose and scrape the data.
    • Print the scraped data to the console.
  5. Run Your Application:
    • Run the Spring Boot application and observe the output in the console.
    • Ensure that the scraped data is displayed correctly.

ZahraaSaleh13 commented 2 months ago

CosmalineSoftWave class:

package com.ZahraaSaleh.web_scraper;

public class CosmalineSoftWave {
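    // Fields are private; access goes through the getters and setters below
    private String title;
    private String link;
    private String imageUrl;
    private String summaryDetails;
    private String price;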

    public String getPrice() {
        return price;
    }

    public void setPrice(String price) {
        this.price = price;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getLink() {
        return link;
    }

    public void setLink(String link) {
        this.link = link;
    }

    public String getImageUrl() {
        return imageUrl;
    }

    public void setImageUrl(String imageUrl) {
        this.imageUrl = imageUrl;
    }

    public String getSummaryDetails() {
        return summaryDetails;
    }

    public void setSummaryDetails(String summaryDetails) {
        this.summaryDetails = summaryDetails;
    }

    public CosmalineSoftWave(String title, String link, String imageUrl, String summaryDetails, String price) {
        this.title = title;
        this.link = link;
        this.imageUrl = imageUrl;
        this.summaryDetails = summaryDetails;
        this.price = price;
    }

    @Override
    public String toString() {
        return "CosmalineSoftWave{" +
                "title='" + title + '\'' +
                ", link='" + link + '\'' +
                ", imageUrl='" + imageUrl + '\'' +
                ", summaryDetails='" + summaryDetails + '\'' +
                ", price='" + price + '\'' +
                '}';
    }
}
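
Because the task targets Spring Boot 3.x, which requires Java 17 or later, the same data holder could also be written as a record. The following is only a sketch of that alternative, not the submitted class: the names `Product` and `ProductDemo` and the sample values are hypothetical.

```java
// Hypothetical record equivalent of the CosmalineSoftWave class above.
// The compiler generates the constructor, accessors, equals, hashCode and toString.
record Product(String title, String link, String imageUrl, String summaryDetails, String price) {}

final class ProductDemo {
    public static void main(String[] args) {
        // Sample values are made up for illustration only
        Product product = new Product("Sample title", "/shop/sample", "/img/sample.png", "", "10.00");

        // Record accessors are named after the components (title(), not getTitle())
        System.out.println(product.title());
        System.out.println(product);
    }
}
```

A record is immutable, so this form only fits if the setters in the original class are not needed.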

Scraping Application:

package com.ZahraaSaleh.web_scraper;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

@Component
public class ScrapingApplication implements CommandLineRunner {

    @Override
    public void run(String... args) throws Exception {
        // Read the page URL from standard input
        Scanner sc = new Scanner(System.in);
        System.out.println("Please enter the URL to scrape:");
        String url = sc.next();

        // Fetch and parse the page, then select one element per product
        List<CosmalineSoftWave> cosmalineSoftWaves = new ArrayList<>();
        Document document = Jsoup.connect(url).get();
        Elements products = document.select("td.oe_product");

        for (Element productElement : products) {
            // The title link carries both the product name and the product URL
            Elements titleLink = productElement.select("h6.o_wsale_products_item_title a");
            String productName = titleLink.text();
            String productUrl = titleLink.attr("href");

            // Extract image URL
            String imageUrl = productElement.select("div.oe_product_image img").attr("src");

            // Extract price
            String price = productElement.select("div.product_price span.oe_currency_value").text();

            // Summary details are not scraped here, so an empty string is stored
            cosmalineSoftWaves.add(new CosmalineSoftWave(productName, productUrl, imageUrl, "", price));
        }

        // Print the extracted data
        for (CosmalineSoftWave product : cosmalineSoftWaves) {
            System.out.println("Product Name: " + product.getTitle());
            System.out.println("Product URL: " + product.getLink());
            System.out.println("Image URL: " + product.getImageUrl());
            System.out.println("Price: $" + product.getPrice());
            System.out.println("---------------------------");
        }
    }

}
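
Since the URL is typed in by the user, it may be worth checking it before handing it to JSoup. The sketch below uses only `java.net.URI` from the standard library; the class `UrlInput` and the method `parseHttpUrl` are hypothetical names, and the rule it applies (http or https scheme plus a host) is an assumption about what counts as acceptable input.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

final class UrlInput {
    // Returns the parsed URI if the input is a syntactically valid http(s) URL
    // with a host, otherwise an empty Optional.
    static Optional<URI> parseHttpUrl(String input) {
        if (input == null) {
            return Optional.empty();
        }
        try {
            URI uri = new URI(input.trim());
            String scheme = uri.getScheme();
            boolean isHttp = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return (isHttp && uri.getHost() != null) ? Optional.of(uri) : Optional.empty();
        } catch (URISyntaxException e) {
            // Malformed input, for example text containing spaces
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseHttpUrl("https://shop.cosmaline.com/shop?search=&order=&tags=304").isPresent());
        System.out.println(parseHttpUrl("not a url").isPresent());
    }
}
```

In the runner, the scrape could then be skipped with a short message whenever the returned Optional is empty.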

Link:

https://shop.cosmaline.com/shop?search=&order=&tags=304
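
The scraper prints the `href` and `src` attribute values exactly as they appear in the HTML. If the page uses relative paths (an assumption, not checked against the site), they can be resolved against the page URL. A minimal sketch with the standard library follows; `LinkResolver`, `toAbsolute` and the path `/shop/sample-product` are hypothetical. JSoup also documents an `abs:` attribute prefix, for example `attr("abs:href")`, for the same purpose.

```java
import java.net.URI;

final class LinkResolver {
    // Resolves href against the URL of the page it was found on.
    // An href that is already absolute is returned unchanged.
    // Throws IllegalArgumentException if either string is not a valid URI.
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "https://shop.cosmaline.com/shop?search=&order=&tags=304";
        // "/shop/sample-product" is a made-up path for illustration
        System.out.println(toAbsolute(page, "/shop/sample-product"));
        // prints "https://shop.cosmaline.com/shop/sample-product"
    }
}
```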
