dipzza / ultrastar-song2txt

Tools that automate parts of making a song in the ultrastar txt format
GNU Affero General Public License v3.0

Read and write ultrastar TXT files. #37

Closed: dipzza closed this 2 years ago

dipzza commented 2 years ago

In order to build the logic for #7, #8 or #11, we need to be able to extract the information contained in UltraStar TXT project files into data structures, modify that data, and write TXT files with the new data. Domain-driven design is worth looking at to find good abstractions.

The current implementation doesn't use proper data structures and works around this by parsing the files multiple times.
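A minimal sketch of single-pass parsing into data structures, assuming the common UltraStar line types: `#KEY:VALUE` headers, note lines starting with `:`, `*`, `F`, `R` or `G` (`<type> <start> <length> <pitch> <syllable>`), `- <beat>` line breaks and a final `E`. The names `Song`, `Note`, `LineBreak`, `parse_song` and `write_song` are hypothetical, not the project's code; duets (`P1`/`P2`) and the two-beat line-break variant are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Prefixes of note lines: ":" normal, "*" golden, "F" freestyle, "R" rap, "G" golden rap.
NOTE_KINDS = {":", "*", "F", "R", "G"}


@dataclass
class Note:
    kind: str    # one of NOTE_KINDS
    start: int   # start beat
    length: int  # duration in beats
    pitch: int
    text: str    # syllable; leading/trailing spaces are significant


@dataclass
class LineBreak:
    start: int   # beat of the line break ("- <beat>")


@dataclass
class Song:
    headers: dict[str, str] = field(default_factory=dict)  # "#TITLE:x" -> {"TITLE": "x"}
    events: list[Note | LineBreak] = field(default_factory=list)


def parse_song(text: str) -> Song:
    """Parse UltraStar TXT content in a single pass (sketch: no duet support)."""
    song = Song()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            song.headers[key.strip().upper()] = value.strip()
        elif line[0] in NOTE_KINDS:
            # Split at most 4 times so spaces inside the syllable are preserved.
            parts = line.split(" ", 4)
            syllable = parts[4] if len(parts) > 4 else ""
            song.events.append(
                Note(parts[0], int(parts[1]), int(parts[2]), int(parts[3]), syllable)
            )
        elif line[0] == "-":
            # Only the first beat is kept; a "- <start> <end>" variant loses its end.
            song.events.append(LineBreak(int(line.split()[1])))
        elif line[0] == "E":
            break  # end of song
        # Anything else (e.g. duet markers "P1"/"P2") is ignored in this sketch.
    return song


def write_song(song: Song) -> str:
    """Serialize a Song back to UltraStar TXT text."""
    lines = [f"#{key}:{value}" for key, value in song.headers.items()]
    for event in song.events:
        if isinstance(event, Note):
            lines.append(f"{event.kind} {event.start} {event.length} {event.pitch} {event.text}")
        else:
            lines.append(f"- {event.start}")
    lines.append("E")
    return "\n".join(lines) + "\n"
```

Modifying a song then means editing `song.events` (for example shifting every `Note.pitch`) and calling `write_song` once, instead of re-parsing the file for each change. Headers are kept as strings because values such as `#BPM:300,00` use a decimal comma in many files.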

dipzza commented 2 years ago

Options for reading files in multiple encodings:

The first option doesn't add any dependencies, which is desirable, but a wrong encoding may decode without errors while silently changing the original characters.
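A dependency-free approach can be sketched with the standard library alone: try a fixed list of encodings in order and keep the first one that decodes. The encoding list and the name `decode_with_fallback` are assumptions for illustration. Note that `latin-1` maps every byte to a character, so with the default list the final error is never reached; it only matters for custom lists.

```python
from __future__ import annotations

# Hypothetical candidate list, tried in order; "utf-8-sig" also strips a UTF-8 BOM.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def decode_with_fallback(data: bytes, encodings=CANDIDATE_ENCODINGS) -> tuple[str, str]:
    """Return (text, encoding) for the first encoding that decodes without error."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")


# The risk described above: cp1252 accepts UTF-8 bytes without raising,
# so "Café" silently becomes "CafÃ©" instead of producing an error.
print(decode_with_fallback("Café".encode("utf-8"), ("cp1252",)))
```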

Speed: cChardet > charset-normalizer >> chardet
Accuracy: charset-normalizer > cChardet > chardet
Size: charset-normalizer << cChardet < chardet

As the txt files the program will handle are small (5-15 KB), every package is fast enough, so charset-normalizer seems like the best option.

A few common file encodings could be listed in the code so that charset-normalizer checks them (adding more can reduce accuracy), and instructions for converting a file to UTF-8 with its command-line utility could be added to the README for edge cases.
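A sketch of the charset-normalizer approach, using its `from_bytes(...).best()` API with the `cp_isolation` parameter to restrict detection to a short list of encodings. The contents of `COMMON_ENCODINGS` and the helper names `decode_song_bytes` / `read_song_file` are hypothetical; which encodings real UltraStar song files use would need checking against actual files.

```python
from __future__ import annotations

from pathlib import Path

from charset_normalizer import from_bytes

# Hypothetical shortlist of encodings; cp_isolation limits detection to these.
COMMON_ENCODINGS = ["utf_8", "cp1252", "iso8859_15"]


def decode_song_bytes(data: bytes) -> tuple[str, str]:
    """Return (decoded text, detected Python codec name) for raw song bytes."""
    best = from_bytes(data, cp_isolation=COMMON_ENCODINGS).best()
    if best is None:
        # No candidate produced plausible text: the README conversion steps apply.
        raise ValueError("could not detect the file encoding; convert it to UTF-8")
    return str(best), best.encoding


def read_song_file(path: str | Path) -> str:
    """Read a song file from disk and decode it with the detected encoding."""
    text, _encoding = decode_song_bytes(Path(path).read_bytes())
    return text
```

Keeping the list short follows the trade-off above: each extra candidate is another encoding that might decode the bytes plausibly but wrongly.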