coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

Create subfolders for subsections too? #614

Open JohnVeness opened 4 years ago

JohnVeness commented 4 years ago

🚨 Please review the Troubleshooting section before reporting any issue. Also check the existing issues to avoid duplicates.

Subject of the issue

This is a feature request: please consider creating subfolders for subsections, just as edx-dl currently does for sections.

Your environment

Steps to reproduce

  1. edx-dl -u <censored> https://courses.edx.org/courses/course-v1:MITx+6.002.1x+2T2019/course/ --filter-section 5
  2. Wait for it to download all the videos and PDFs

Expected behaviour

This section ("Week 1") has several subsections (I'm not sure if this is the correct edx terminology), such as:

each of which contains many videos. It would be great if these were separated into subfolders within the 05-Week_1 folder.

Actual behaviour

All the videos are together in the 05-Week_1 folder, making it hard to navigate.

I see this feature request as similar to #480. If both features were added, navigating through downloaded videos would be much easier :)

JohnVeness commented 4 years ago

A possible fix: replace

def download(args, selections, all_units, headers):
    """
    Downloads all the resources based on the selections
    """
    logging.info("Output directory: " + args.output_dir)

    # Download Videos
    # notice that we could iterate over all_units, but we prefer to do it over
    # sections/subsections to add correct prefixes and show nicer information.

    for selected_course, selected_sections in selections.items():
        coursename = directory_name(selected_course.name)
        for selected_section in selected_sections:
            section_dirname = "%02d-%s" % (selected_section.position,
                                           selected_section.name)
            target_dir = os.path.join(args.output_dir, coursename,
                                      clean_filename(section_dirname))
            mkdir_p(target_dir)
            counter = 0
            for subsection in selected_section.subsections:
                units = all_units.get(subsection.url, [])
                for unit in units:
                    counter += 1
                    filename_prefix = "%02d" % counter
                    download_unit(unit, args, target_dir, filename_prefix,
                                  headers)

with

def download(args, selections, all_units, headers):
    """
    Downloads all the resources based on the selections
    """
    logging.info("Output directory: " + args.output_dir)

    # Download Videos
    # notice that we could iterate over all_units, but we prefer to do it over
    # sections/subsections to add correct prefixes and show nicer information.

    for selected_course, selected_sections in selections.items():
        coursename = directory_name(selected_course.name)
        for selected_section in selected_sections:
            section_dirname = "%02d-%s" % (selected_section.position,
                                           selected_section.name)
            section_dir = os.path.join(args.output_dir, coursename,
                                       clean_filename(section_dirname))
            mkdir_p(section_dir)
            for subsection in selected_section.subsections:
                # One subfolder per subsection, e.g. "01-<subsection name>"
                subsection_dirname = "%02d-%s" % (subsection.position,
                                                  subsection.name)
                target_dir = os.path.join(section_dir,
                                          clean_filename(subsection_dirname))
                mkdir_p(target_dir)
                # File numbering restarts inside each subsection folder
                counter = 0
                units = all_units.get(subsection.url, [])
                for unit in units:
                    counter += 1
                    filename_prefix = "%02d" % counter
                    download_unit(unit, args, target_dir, filename_prefix,
                                  headers)

This is probably not optimal, and it should perhaps be controlled by a command-line option for people who prefer the current behaviour. You might also want to special-case sections that contain only one subsection and skip the subsection folder there, for neatness.
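A minimal sketch of the two refinements suggested above, assuming a hypothetical `--subsection-folders` flag (not an existing edx-dl option) and simplified stand-ins for edx-dl's section/subsection objects and `clean_filename()`. `plan_downloads()` is a hypothetical dry-run helper: it mirrors the loop in `download()` but returns `(target_dir, filename_prefix, unit)` tuples instead of downloading anything. It nests only when the flag is set and the section has more than one subsection, and it restarts file numbering in each subsection folder.

```python
import argparse
import os
import re
from collections import namedtuple

# Hypothetical stand-ins for edx-dl's section/subsection objects.
Section = namedtuple("Section", ["position", "name", "subsections"])
SubSection = namedtuple("SubSection", ["position", "url", "name"])


def clean_filename(name):
    """Simplified stand-in for edx-dl's clean_filename(): spaces and other
    unsafe characters become underscores."""
    return re.sub(r"[^\w.-]+", "_", name).strip("_")


def build_parser():
    """Parser with only the hypothetical --subsection-folders flag."""
    parser = argparse.ArgumentParser(prog="edx-dl-sketch")
    parser.add_argument(
        "--subsection-folders",
        action="store_true",
        help="create one subfolder per subsection (hypothetical option)",
    )
    return parser


def plan_downloads(output_dir, coursename, section, all_units,
                   subsection_folders=False):
    """Return (target_dir, filename_prefix, unit) tuples for one section."""
    section_dirname = "%02d-%s" % (section.position, section.name)
    section_dir = os.path.join(output_dir, coursename,
                               clean_filename(section_dirname))
    # Nest only when asked to and when there is more than one subsection.
    nested = subsection_folders and len(section.subsections) > 1
    plan = []
    counter = 0
    for subsection in section.subsections:
        if nested:
            subsection_dirname = "%02d-%s" % (subsection.position,
                                              subsection.name)
            target_dir = os.path.join(section_dir,
                                      clean_filename(subsection_dirname))
            counter = 0  # numbering restarts in each subsection folder
        else:
            target_dir = section_dir  # current flat layout, numbering continues
        for unit in all_units.get(subsection.url, []):
            counter += 1
            plan.append((target_dir, "%02d" % counter, unit))
    return plan


if __name__ == "__main__":
    # Placeholder data, not taken from a real course.
    week = Section(5, "Week 1", [SubSection(1, "url-1", "First subsection"),
                                 SubSection(2, "url-2", "Second subsection")])
    units = {"url-1": ["unit-a", "unit-b"], "url-2": ["unit-c"]}
    args = build_parser().parse_args(["--subsection-folders"])
    for target_dir, prefix, unit in plan_downloads(
            "out", "Course", week, units, args.subsection_folders):
        print(target_dir, prefix, unit)
```

Keeping the flat layout as the default preserves the current behaviour for existing users. In `download()` itself, each planned tuple would be passed to `mkdir_p()` and `download_unit()`.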