ReneR97 / domestika-downloader

Download full domestika courses
https://www.buymeacoffee.com/ReneR97

Create index2.js #6

Closed. Vodou4460 closed this 1 year ago.

Vodou4460 commented 1 year ago

The script reads a list of course URLs from a file and iterates through each URL. For each course, it retrieves initial data, units, and videos using API calls and HTML scraping. It then creates the necessary directories and uses a command-line tool (N_m3u8DL-RE) to download the corresponding videos and subtitles.

Finally, the script updates tracking files such as log files and the downloaded course list file, and displays progress and confirmation messages as the videos are downloaded.
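A minimal sketch of this bookkeeping, using only Node's `fs` and `path` modules. The file names `download_log.txt` and `downloaded_courses.txt` and the helpers `logLine` and `markCourseDownloaded` are hypothetical; the comment does not name the actual tracking files.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical file names; the overview does not name the real ones.
const LOG_FILE = "download_log.txt";
const DOWNLOADED_LIST = "downloaded_courses.txt";

// Append one timestamped line to the log file, creating it if needed.
function logLine(dir, message) {
  fs.appendFileSync(path.join(dir, LOG_FILE), `${new Date().toISOString()} ${message}\n`);
}

// Record a finished course URL once, so repeated runs do not duplicate it.
function markCourseDownloaded(dir, courseUrl) {
  const file = path.join(dir, DOWNLOADED_LIST);
  const done = fs.existsSync(file)
    ? fs.readFileSync(file, "utf8").split("\n").filter(Boolean)
    : [];
  if (!done.includes(courseUrl)) {
    fs.appendFileSync(file, `${courseUrl}\n`);
  }
}
```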


The provided code is a script for scraping and downloading course videos from the Domestika platform. Here's an overview of what the code does:

Import necessary modules: The script starts by importing required modules such as puppeteer, cheerio, util, child_process, m3u8-to-mp4, and fs.
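The comment does not say how `util` and `child_process` are used together. A common Node pattern, assumed here rather than confirmed, is to wrap a `child_process` function with `util.promisify` so that an external tool such as N_m3u8DL-RE can be awaited. This sketch uses `execFile` (no shell, so arguments need no quoting); `runTool` is a hypothetical helper.

```javascript
const util = require("util");
const { execFile } = require("child_process");

// Promise-returning version of execFile; it resolves to { stdout, stderr }
// and rejects if the process exits with a non-zero code.
const execFileAsync = util.promisify(execFile);

// Run a command without a shell and return its trimmed standard output.
async function runTool(command, args) {
  const { stdout } = await execFileAsync(command, args);
  return stdout.trim();
}
```

For example, `await runTool(process.execPath, ["-e", "console.log('ok')"])` resolves to `"ok"`.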

Set up configuration variables: The script defines several configuration variables, including debug and debug_data for debugging purposes, course_url for the URL of the course to be scraped, and subtitle_lang and subtitle_lang2 for the desired subtitle languages.
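A sketch of such a configuration block. Only the variable names come from the comment; the values, the meaning given to `debug_data`, and the helpers `debugLog` and `subtitleLanguages` are illustrative assumptions.

```javascript
// Illustrative values: only the names come from the overview.
const debug = false; // print extra diagnostics
const debug_data = false; // assumed: dump scraped data for inspection
const course_url = ""; // set to the URL of a single course
const subtitle_lang = "en";
const subtitle_lang2 = "es";

// Log only when debugging is switched on.
function debugLog(...args) {
  if (debug) console.log("[debug]", ...args);
}

// Requested subtitle languages in priority order, without blanks or duplicates.
function subtitleLanguages(primary, secondary) {
  return [...new Set([primary, secondary].filter(Boolean))];
}

debugLog("subtitles:", subtitleLanguages(subtitle_lang, subtitle_lang2));
```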

Define the downloadCourseUrls function: This function reads a file named course_list.txt that contains a list of course URLs. It filters out any empty lines from the list and checks if there are any URLs remaining. If there are, it iterates through each URL, calls the scrapeSite function to download the corresponding course, updates the course_list.txt file, and logs the progress.
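The flow of `downloadCourseUrls` can be sketched as follows. The comment does not say how `course_list.txt` is updated; this version assumes each finished URL is removed so an interrupted run can resume. `scrapeSite` is passed in as a parameter (a change from the described script, which calls it directly) so the loop can be exercised without a browser; `parseCourseList` is a hypothetical helper.

```javascript
const fs = require("fs");

// Split file contents into URLs, trimming whitespace and dropping blank lines.
function parseCourseList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Download every listed course in order. After each one, rewrite the list
// with only the courses still to do (assumed behaviour), so a rerun resumes.
async function downloadCourseUrls(listFile, scrapeSite) {
  const urls = parseCourseList(fs.readFileSync(listFile, "utf8"));
  if (urls.length === 0) {
    console.log(`No course URLs found in ${listFile}`);
    return;
  }
  for (const [i, url] of urls.entries()) {
    console.log(`Course ${i + 1}/${urls.length}: ${url}`);
    await scrapeSite(url);
    const remaining = urls.slice(i + 1);
    fs.writeFileSync(listFile, remaining.map((u) => `${u}\n`).join(""));
    console.log(`Finished ${url}`);
  }
}
```

If `scrapeSite` throws, the loop stops before the list is rewritten, so the file still contains the failed course and everything after it.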

Define the scrapeSite function: This function takes a course URL as a parameter. It launches a headless browser using Puppeteer, creates a new page, sets cookies on the page, and navigates to the course URL. It then uses Cheerio to parse the HTML content of the page. The function extracts information such as the course title, unit titles, video data, access token, and final project details. It iterates through the videos, creates a directory for each unit, and downloads the videos using the N_m3u8DL-RE command-line tool. Download progress is logged, and a file-write log entry is stored for each downloaded video.
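The directory-and-download step of `scrapeSite` might look like the sketch below. The folder layout and the `sanitizeName`, `unitDirectory`, `n_m3u8dlArgs`, and `downloadVideo` helpers are hypothetical. Only N_m3u8DL-RE's positional input and its `--save-dir` and `--save-name` options are used; subtitle selection is left out, and any flags should be checked against `N_m3u8DL-RE --help`.

```javascript
const fs = require("fs");
const path = require("path");
const { spawnSync } = require("child_process");

// Remove characters that are invalid in Windows file names (and control
// characters), then collapse runs of whitespace.
function sanitizeName(name) {
  return name
    .replace(/[<>:"\/\\|?*\x00-\x1f]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical layout: <root>/<course title>/<NN - unit title>
function unitDirectory(root, courseTitle, unitIndex, unitTitle) {
  const prefix = String(unitIndex + 1).padStart(2, "0");
  return path.join(root, sanitizeName(courseTitle), `${prefix} - ${sanitizeName(unitTitle)}`);
}

// Command-line arguments for N_m3u8DL-RE: input URL, output folder, file name.
function n_m3u8dlArgs(playbackUrl, saveDir, saveName) {
  return [playbackUrl, "--save-dir", saveDir, "--save-name", saveName];
}

// Create the unit folder, then run the downloader and wait for it to finish.
function downloadVideo(playbackUrl, dir, name) {
  fs.mkdirSync(dir, { recursive: true });
  const result = spawnSync("N_m3u8DL-RE", n_m3u8dlArgs(playbackUrl, dir, sanitizeName(name)), {
    stdio: "inherit",
  });
  if (result.error) throw result.error; // e.g. ENOENT if the tool is not on PATH
  if (result.status !== 0) throw new Error(`N_m3u8DL-RE exited with status ${result.status}`);
}
```

Passing an argument array to `spawnSync` avoids shell quoting problems with titles that contain spaces or quotes.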

Define the getInitialProps function: This function is called by scrapeSite and takes a URL as a parameter. It launches a headless browser, creates a new page, sets cookies, and navigates to the specified URL. It evaluates JavaScript code on the page to retrieve initial video data, including the playback URL and title. The function returns the video data.
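A sketch of `getInitialProps` using Puppeteer's `newPage`, `setCookie`, `goto`, and `evaluate`. The global `window.__INITIAL_PROPS__` and the `videoEntry.playbackURL` / `videoEntry.title` field names are assumptions, not taken from the comment. Unlike the described function, this version receives an already-launched browser instead of launching its own, so a single browser can be reused across videos.

```javascript
// Open `url` in a new page of an already-launched Puppeteer browser, apply the
// session cookies, and read the page's initial props.
// ASSUMPTION: the data lives in window.__INITIAL_PROPS__.videoEntry; the field
// names below are illustrative, not confirmed by the overview.
async function getInitialProps(browser, url, cookies) {
  const page = await browser.newPage();
  try {
    await page.setCookie(...cookies);
    await page.goto(url, { waitUntil: "networkidle2" });
    const props = await page.evaluate(() => window.__INITIAL_PROPS__);
    const entry = props && props.videoEntry;
    if (!entry) return null; // page did not expose the expected data
    return { playbackURL: entry.playbackURL, title: entry.title };
  } finally {
    await page.close(); // always release the tab, even on errors
  }
}
```

In practice the caller would create the browser with `const browser = await puppeteer.launch();` and call `await browser.close()` once all videos are processed.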

Define the fetchFromApi function: This function is also called by scrapeSite and is used to make API requests. It takes an API URL, an accept version, and an access token as parameters. It uses the Fetch API to make a GET request with the specified headers, parses the response as JSON, logs the response data, and returns the parsed JSON data.
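A sketch of `fetchFromApi` with the `fetch` built into Node 18 and later. The bearer-token `Authorization` header is standard; the `Accept` media type and the `x-dmstk-accept-version` header name are assumptions about how the accept version is sent. The optional `fetchImpl` parameter is added only so the function can be tested without network access, and the response logging mentioned in the comment is left out.

```javascript
// ASSUMPTION: the Accept value and the x-dmstk-accept-version header name are
// illustrative; check the real requests in the browser's network tab.
async function fetchFromApi(apiURL, acceptVersion, accessToken, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(apiURL, {
    method: "GET",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      Accept: "application/vnd.api+json",
      "x-dmstk-accept-version": acceptVersion,
    },
  });
  // Fail loudly instead of trying to parse an error page as JSON.
  if (!response.ok) {
    throw new Error(`API request failed: ${response.status} ${apiURL}`);
  }
  return response.json();
}
```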

Call the downloadCourseUrls function: Finally, the script calls the downloadCourseUrls function to start the scraping and downloading process.