Closed by mohammad-fahs 2 months ago
My application extracts movie data from IMDb and displays each movie's title, year, duration, age rating, star rating, image URL, and the link to the movie's page. Below are the files behind this application.
// Movie.java
package com.hodroj.webscraper;

public class Movie {
    private String title;
    private String year;
    private String duration;
    private String ageRating;
    private String starRating;
    private String imgUrl;
    private String movieUrl;

    public Movie() {
    }

    public Movie(String title, String year, String duration, String ageRating, String starRating, String imgUrl, String movieUrl) {
        this.title = title;
        this.year = year;
        this.duration = duration;
        this.ageRating = ageRating;
        this.starRating = starRating;
        this.imgUrl = imgUrl;
        this.movieUrl = movieUrl;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getYear() {
        return year;
    }

    public void setYear(String year) {
        this.year = year;
    }

    public String getDuration() {
        return duration;
    }

    public void setDuration(String duration) {
        this.duration = duration;
    }

    public String getAgeRating() {
        return ageRating;
    }

    public void setAgeRating(String ageRating) {
        this.ageRating = ageRating;
    }

    public String getStarRating() {
        return starRating;
    }

    public void setStarRating(String starRating) {
        this.starRating = starRating;
    }

    public String getImgUrl() {
        return imgUrl;
    }

    public void setImgUrl(String imgUrl) {
        this.imgUrl = imgUrl;
    }

    public String getMovieUrl() {
        return movieUrl;
    }

    public void setMovieUrl(String movieUrl) {
        this.movieUrl = movieUrl;
    }

    @Override
    public String toString() {
        return "Title: " + title + ", Year: " + year + ", Duration: " + duration + ", Age Rating: " + ageRating + ", Rating: " + starRating + ", Poster: " + imgUrl + ", Link: " + movieUrl;
    }
}

// ScrapingService.java
package com.hodroj.webscraper;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ScrapingService {

    // Fetches the page at the given URL and builds one Movie per list entry.
    public static List<Movie> scrapedMovies(String url) throws IOException {
        List<Movie> movieList = new ArrayList<>();
        Document doc = Jsoup.connect(url).get();
        Elements movies = doc.select("li.cli-parent");
        for (Element movieElement : movies) {
            String title = movieElement.select("h3.ipc-title__text").text();
            // Select the metadata spans once; an entry may have fewer than three of them.
            Elements metadata = movieElement.select("div.cli-title-metadata span");
            String year = metadata.size() > 0 ? metadata.get(0).text() : "";
            String duration = metadata.size() > 1 ? metadata.get(1).text() : "";
            String ageRating = metadata.size() > 2 ? metadata.get(2).text() : "";
            String starRating = movieElement.select("span.ipc-rating-star--rating").text();
            String imgUrl = movieElement.select("div.ipc-media img").attr("src");
            // The "abs:" prefix resolves the relative href against the page URL,
            // which avoids the double slash produced by "https://www.imdb.com/" + "/title/...".
            String movieUrl = movieElement.select("a.ipc-lockup-overlay").attr("abs:href");
            movieList.add(new Movie(title, year, duration, ageRating, starRating, imgUrl, movieUrl));
        }
        return movieList;
    }
}
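When scraping list entries like these, two things tend to need care: an entry may have fewer metadata spans than expected, and a relative href has to be joined to the site's base URL without doubling the slash. Below is a minimal JDK-only sketch of helpers for both. The class name ScrapeHelpers and its method names are made up for illustration and are not part of the submitted project, and the base URL is assumed from the host used above.

```java
import java.net.URI;
import java.util.List;

// Hypothetical helper class; not part of the submitted project.
final class ScrapeHelpers {

    // Assumed base URL, taken from the host used in the post.
    private static final URI BASE = URI.create("https://www.imdb.com/");

    private ScrapeHelpers() {
    }

    // Returns values.get(index), or fallback when the index is out of range.
    static String textAt(List<String> values, int index, String fallback) {
        return (index >= 0 && index < values.size()) ? values.get(index) : fallback;
    }

    // Resolves a relative href such as "/title/..." against BASE.
    // Absolute URLs are returned unchanged; null or blank input yields "".
    // Note: an href containing illegal URI characters makes resolve() throw
    // IllegalArgumentException.
    static String toAbsoluteUrl(String href) {
        if (href == null || href.trim().isEmpty()) {
            return "";
        }
        return BASE.resolve(href.trim()).toString();
    }
}
```

With jsoup, the list could come from movieElement.select("div.cli-title-metadata span").eachText(), so that textAt(spans, 2, "") yields an empty age rating instead of throwing when the third span is missing.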

// ScrapingApplication.java
package com.hodroj.webscraper;

import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.Scanner;

@Component
public class ScrapingApplication implements CommandLineRunner {

    @Override
    public void run(String... args) throws Exception {
        Scanner scanner = new Scanner(System.in);
        while (true) {
            System.out.println("Enter your link to scrape or type exit");
            String link = scanner.nextLine().trim();
            if (link.equals("exit")) {
                break;
            }
            try {
                // scrapedMovies is static, so call it on the class rather than on an instance.
                List<Movie> movieList = ScrapingService.scrapedMovies(link);
                for (Movie movie : movieList) {
                    System.out.println(movie);
                }
            } catch (Exception e) {
                // Report the failure (bad link, network error) and keep prompting.
                System.out.println("Could not scrape " + link + ": " + e.getMessage());
            }
        }
    }
}
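It can also help to check the typed link before handing it to the scraper. The sketch below uses only java.net.URI and accepts a string only if it parses as an http or https URL with a host. LinkValidator and isHttpUrl are hypothetical names, not part of the submitted project.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class; not part of the submitted project.
final class LinkValidator {

    private LinkValidator() {
    }

    // Returns true only for syntactically valid http/https URLs that have a host.
    static boolean isHttpUrl(String input) {
        if (input == null) {
            return false;
        }
        try {
            URI uri = new URI(input.trim());
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Input such as "not a url" contains characters a URI cannot hold.
            return false;
        }
    }
}
```

Inside the input loop, a call such as if (!LinkValidator.isHttpUrl(link)) { System.out.println("Please enter an http(s) link"); continue; } would skip bad input and prompt again.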
Any IMDb page that contains a list of movies should work. I tested with this link: https://www.imdb.com/chart/top/?sort=rank%2Casc
@MohammadHodroj good job!
Task 2:
Objective:
In this task, you will initialize a new Spring Boot project, add the JSoup dependency, and write a simple Java program to scrape data from a website of your choice. The goal is to get hands-on experience with setting up a Spring Boot project, using an external library (JSoup), and applying web scraping techniques.
Instructions:
Initialize a new Spring Boot project with the following settings:
Group: com.yourname
Artifact: web-scraper
Name: Web Scraper
Package name: com.yourname.webscraper
Add the Spring Web dependency (to allow adding more features later).
Open the pom.xml file in the root directory of your project.
Add the JSoup dependency inside the <dependencies> tag:
Save the pom.xml file and allow IntelliJ to update the Maven project to download the JSoup library.
Create a class in the com.yourname.webscraper package that implements the CommandLineRunner interface.
In the run method, use JSoup to connect to the website you chose and scrape the data.
Submit Your Work:
CommandLineRunner.
Resources that can help: