Kethsar / ytarchive

Garbage Youtube livestream downloader
MIT License

Replaces forbidden file characters to the unicode similar ones to vid… #134

Closed: saintliao closed this pull request 1 year ago

saintliao commented 1 year ago

…eo info author, title and description. Also changes to a regexp to guarantee that the whole file name string is replaced correctly.

Kethsar commented 1 year ago
  1. I fail to see the difference between using the regex and a string replacer for SterilizeFilename()
  2. I fail to see the reason for replacing the characters with unicode look-alikes in all instances outside of the filename.

Can you clarify your reasoning for both of these changes? I can guess why you would want the unicode look-alikes in the filename at least, though I personally don't care to bother, especially this late in.

EDIT: I assume the regex change is because the unicode look-alikes would still get hit by the string replacer?

saintliao commented 1 year ago

The reason for using similar Unicode characters like ⧸ (U+29F8) to replace / (U+002F) is that YouTube channel author names and livestream titles often contain forbidden characters like /. If every forbidden character were simply replaced with _ (U+005F), organizing livestream archives later would be harder. Many livestreams are made private or removed immediately after they end and can no longer be found by video ID, so it is best to keep the original channel author and video title as close to the original as possible.
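A minimal sketch of the look-alike idea described above, not the code from this pull request. The `lookAlikes` table and the `toLookAlikes` helper are hypothetical; only the `/` to ⧸ (U+29F8) pair comes from this comment, and the other pairs are assumed choices for characters that Windows forbids in file names.

```go
package main

import (
	"fmt"
	"strings"
)

// lookAlikes is a hypothetical mapping from forbidden file name
// characters to visually similar Unicode characters.
var lookAlikes = strings.NewReplacer(
	"/", "\u29F8", // BIG SOLIDUS (the pair mentioned in the comment)
	"\\", "\u29F9", // BIG REVERSE SOLIDUS (assumed choice)
	":", "\uFF1A", // FULLWIDTH COLON (assumed choice)
	"*", "\uFF0A", // FULLWIDTH ASTERISK (assumed choice)
	"?", "\uFF1F", // FULLWIDTH QUESTION MARK (assumed choice)
	"\"", "\uFF02", // FULLWIDTH QUOTATION MARK (assumed choice)
	"<", "\uFF1C", // FULLWIDTH LESS-THAN SIGN (assumed choice)
	">", "\uFF1E", // FULLWIDTH GREATER-THAN SIGN (assumed choice)
	"|", "\uFF5C", // FULLWIDTH VERTICAL LINE (assumed choice)
)

// toLookAlikes returns s with every forbidden character swapped for
// its look-alike, leaving all other characters untouched.
func toLookAlikes(s string) string {
	return lookAlikes.Replace(s)
}

func main() {
	// The title stays readable instead of collapsing into underscores.
	fmt.Println(toLookAlikes("A/B: live?"))
}
```

With a table like this the author and title remain recognizable on disk, at the cost of file names that no longer match a plain ASCII search for the original characters.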

As for using a regex instead of strings.NewReplacer(): from past experience I believed that strings.NewReplacer() could not differentiate between code points such as ⧸ (U+29F8) and / (U+002F). However, after further testing and your reminder, this is clearly not the case in the current Go version.
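The conclusion above can be checked with a small sketch. The forbidden character set and the helper names `withRegexp` and `withReplacer` are assumptions for illustration and are not taken from ytarchive's `SterilizeFilename()`. Both approaches match only the single byte 0x2F for `/`, while ⧸ (U+29F8) is encoded in UTF-8 as the three bytes E2 A7 B8, so neither one touches it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical forbidden set; the real list in ytarchive may differ.
var forbiddenRe = regexp.MustCompile(`[\\/:*?"<>|]`)

var forbiddenReplacer = strings.NewReplacer(
	"\\", "_", "/", "_", ":", "_", "*", "_", "?", "_",
	"\"", "_", "<", "_", ">", "_", "|", "_",
)

// withRegexp replaces each forbidden character using the regexp.
func withRegexp(s string) string {
	return forbiddenRe.ReplaceAllString(s, "_")
}

// withReplacer replaces each forbidden character using strings.Replacer.
func withReplacer(s string) string {
	return forbiddenReplacer.Replace(s)
}

func main() {
	// Contains both the look-alike U+29F8 and a real ASCII slash.
	name := "a\u29F8b/c"
	fmt.Println(withRegexp(name))
	fmt.Println(withReplacer(name))
	fmt.Println(withRegexp(name) == withReplacer(name))
}
```

Both helpers replace only the ASCII slash and keep the look-alike, which is why the regex change brings no behavioral difference here.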

I apologize for the inconvenience caused by proposing such disruptive changes at such a late stage. I will keep them for personal use in my own version and close this pull request.