FallingLights / Teachable-dl

Course downloader for the Teachable platform, written in Python 3 using Selenium and yt-dlp
GNU Lesser General Public License v3.0
106 stars, 27 forks

[FEATURE] Support for non-latin languages #37

Status: Open. Antonio112009 opened this issue 8 months ago

Antonio112009 commented 8 months ago

Is your feature request related to a problem? Please describe.

No

Describe the solution you'd like

I would like support for non-Latin scripts (Cyrillic, Chinese, Japanese) in page titles.

Describe alternatives you've considered

None

Additional context

This is what I currently see when I download a course:

[image]

seolbeen commented 7 months ago

This seems to be caused by the emoji-removal step in clean_string(), which drops every non-ASCII character, not just emoji.

def clean_string(data):
    logging.debug("Cleaning string: " + data)
    # Remove all non-ASCII characters (including emojis)
    data = data.encode('ascii', 'ignore').decode('ascii')
    # Replace specific characters with "-"
    return data.replace("\n", "-").replace(" ", "-").replace(":", "-") \
        .replace("/", "-").replace("|", "-").replace("*", "").replace("?", "-").replace("<", "-") \
        .replace(">", "-").replace("\"", "-").replace("\\", "-")
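
To see the effect of that ASCII step in isolation, here is a minimal sketch. The helper name strip_non_ascii and the sample titles are made up for illustration and are not part of the project.

```python
def strip_non_ascii(text):
    # Same step as in the clean_string() quoted above: any character that
    # cannot be encoded as ASCII is silently dropped.
    return text.encode("ascii", "ignore").decode("ascii")


# Hypothetical course titles, for illustration only.
print(repr(strip_non_ascii("Курс Python")))    # the Cyrillic word is lost
print(repr(strip_non_ascii("日本語コース")))      # nothing is left
print(repr(strip_non_ascii("Lesson 1 🎉")))     # only the emoji is lost
```

So a title written entirely in a non-Latin script ends up as an empty string (or only dashes once the replacements run), which would match the broken names in the report.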

I changed clean_string() as follows and got the expected course name (tested on Windows):

import logging

def clean_string(data):
    logging.debug("Cleaning string: " + data)
    # The ASCII-only encode/decode step is removed so non-Latin characters survive.
    # Spaces are kept; "*" and "?" are dropped; other reserved characters become "-".
    return data.replace("\n", "-").replace(":", "-") \
        .replace("/", "-").replace("|", "-").replace("*", "").replace("?", "").replace("<", "-") \
        .replace(">", "-").replace("\"", "-").replace("\\", "-")
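
If dropping emoji is still wanted, one possible middle ground is to filter by Unicode category instead of by ASCII. This is only a sketch, not the project's code: the name clean_string_keep_unicode and the choice of categories ("So" and "Sk") are assumptions, and it does not handle variation selectors or zero-width joiners that can accompany some emoji.

```python
import unicodedata

# Characters replaced with "-" or removed, as in the modified clean_string().
REPLACE_WITH_DASH = '\n:/|<>"\\'
REMOVE = "*?"
# Assumption: treating non-ASCII "Symbol, other" / "Symbol, modifier"
# characters as emoji-like. This also drops symbols such as the copyright sign.
DROPPED_CATEGORIES = {"So", "Sk"}


def clean_string_keep_unicode(data):
    """Hypothetical variant: keep letters from any script, drop emoji-like symbols."""
    out = []
    for ch in data:
        if ch in REPLACE_WITH_DASH:
            out.append("-")
        elif ch in REMOVE:
            continue
        elif ord(ch) > 127 and unicodedata.category(ch) in DROPPED_CATEGORIES:
            continue
        else:
            out.append(ch)
    # Strip leading/trailing whitespace left behind after removing symbols.
    return "".join(out).strip()


print(clean_string_keep_unicode("Курс: Python 🎉"))
print(clean_string_keep_unicode("日本語/コース?"))
```

Unlike the version above, this one still removes emoji, so it may be closer to what the original function was meant to do. I have not tested it against real course names.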