UltraStar-Deluxe / Play

Free and open source singing game with song editor for desktop, mobile, and smart TV
https://ultrastar-play.com
MIT License

Problems with File scan on Swedish language letters like "ÅÄÖ" #376

Closed jimmyhawkin closed 1 year ago

jimmyhawkin commented 1 year ago

Issue type: Bug report

Actual behaviour

When scanning the song library, it says the files can't be found even though they are there.

Expected behaviour

The files are there and the config is correct, but it looks like there is currently no support for file names that contain ÅÄÖ.

Steps to reproduce

1. Add a directory containing file names with ÅÄÖ.

2. The feature you added that reports problems will trigger and say it can't find the music file. I could see that what was missing was the åäö, which it did not understand.

Details

Provide some additional information: The files are saved in ANSI. I have not had this problem since the early versions of the old UltraStar editions (the "non Play" editions). The Vocaluxe app also scans the files with no problems.

achimmihca commented 1 year ago

Make sure the txt file is saved with UTF-8 encoding.

I assume that the file currently uses something else, such that non-ASCII characters are messed up.
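To illustrate why such a file breaks: the single bytes that a legacy code page uses for å, ä and ö are not valid UTF-8, so a UTF-8 reader either rejects them or substitutes replacement characters, and the audio file name read from the txt no longer matches the file on disk. Here is a minimal Python sketch, assuming the file was saved as Windows-1252 (the report only says "Ansi", so the exact code page is a guess). The song file name is made up.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical header line from a song txt file.
line = "#MP3:Små grodorna.mp3"

ansi_bytes = line.encode("cp1252")  # assumed legacy "ANSI" code page
utf8_bytes = line.encode("utf-8")

print(is_valid_utf8(ansi_bytes))  # False
print(is_valid_utf8(utf8_bytes))  # True

# Reading the ANSI bytes as UTF-8 with replacement garbles the name,
# so looking it up on disk fails.
garbled = ansi_bytes.decode("utf-8", errors="replace")
print(garbled == line)  # False
```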

jimmyhawkin commented 1 year ago

So there is no planned support for this, when other applications like yours that use this type of file fully support the format? They are just ASCII files created with the UltraStar Creator app. Nothing funny has been done to them at all. And as I said, they are used by all the others; even former UltraStar editions worked with this type.

achimmihca commented 1 year ago

Actually, UltraStar Play still has a method, TxtReader.GuessFileEncoding, which may just not be very good.

I would not mind throwing in some other algorithm, as long as every properly encoded Unicode file is still recognized as such.

There are plenty of attempts on StackOverflow:

But guessing the encoding will always have edge cases that are not guessed correctly. And it can increase load time considerably.
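One common heuristic that satisfies the "valid Unicode stays Unicode" requirement is sketched below in Python. It is not the code of TxtReader.GuessFileEncoding. The function name and the Windows-1252 fallback are assumptions made for illustration, and UTF-32 is ignored.

```python
import codecs


def guess_encoding(data: bytes, legacy_encoding: str = "cp1252") -> str:
    """Guess a text encoding; valid Unicode input is always reported as Unicode."""
    # 1. A byte order mark is the strongest hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 2. Anything that decodes as strict UTF-8 is treated as UTF-8
    #    (this includes plain ASCII).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Otherwise fall back to an assumed legacy code page.
    return legacy_encoding


print(guess_encoding("ÅÄÖ".encode("utf-8")))   # utf-8
print(guess_encoding("ÅÄÖ".encode("cp1252")))  # cp1252
```

The edge cases remain: a legacy file whose bytes happen to form valid UTF-8 is misread as UTF-8, and step 3 cannot tell one single-byte code page from another.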

achimmihca commented 1 year ago

> They are just ASCII files created with the UltraStar Creator app.

That's a pity really. IMO, every app should use Unicode by default.

basisbit commented 1 year ago

> So there is no planned support for this, when other applications like yours that use this type of file fully support the format? They are just ASCII files created with the UltraStar Creator app. Nothing funny has been done to them at all. And as I said, they are used by all the others; even former UltraStar editions worked with this type.

Hi, I have been the maintainer of UltraStar Deluxe for the past 7 years or so. The decision to slowly drop support for files that are not UTF-8 was partially my decision, and was made after a bunch of discussions about this with developers of other tools like Performous, previous UltraStar Deluxe developers, as well as the developers of UltraStar Manager and UltraStar Creator. After that, all of these tools were changed to prefer (and by default use) UTF-8. This will get rid of all those annoying encoding issues that people regularly ran into, and it is worth it for the whole community to do this one-time "cleanup effort". When you edit a file in current versions of UltraStar Deluxe, it will always be saved with UTF-8 encoding. If you use a current version of UltraStar Manager, it will automatically change all your UltraStar TXT files to UTF-8. If you use the current version of UltraStar Creator, it should (afaik) also save the file with UTF-8 encoding.

jimmyhawkin commented 1 year ago

Just for the heck of it, I started UltraStar Creator now and tried a new file. And yes, you are correct, it does use UTF-8 by default. Hmm, I wonder where down the road it gets converted to ASCII. I'll see if it is Yass, which I use later, that does this, or if I changed it on some files for some dumb reason. Interesting, at least. But it would be nice to have the application see that a file contains åäö and then revert to using UTF-8 on the files or something. I mean, Vocaluxe has to do something like that at least.

jimmyhawkin commented 1 year ago

Yes, it was Yass that did it. Well, that clears up what the culprit was. I had to go into its preferences and change it so that it always saves in UTF. So thanks for the quick replies, so I could fix it and this does not continue :)

basisbit commented 1 year ago

Yass 2.1.1 or newer should by default use UTF-8, as far as I know. May I ask what version you have? (Maybe that still needs a change which we did not yet keep in mind)

jimmyhawkin commented 1 year ago

I have Yass 2.3. I still had to go into Extra > Preferences > File types and select "always store as UTF".

achimmihca commented 1 year ago

I just integrated a Unity package for UTF-Unknown, which is based on the Mozilla Universal Charset Detector. The project is under the Mozilla Public License, which should be compatible with the MIT license.

Still, UTF-8 will be the default fallback encoding if charset detection does not work with high enough confidence.

See https://github.com/UltraStar-Deluxe/Play/commit/b37fd6d93987c537b208bc629444072ad210b47f
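The fallback rule described above can be sketched in Python as follows. This is not the code from the linked commit and does not use the UTF-Unknown API. The `Detection` type, the `detect` callable and the 0.5 threshold are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Detection:
    """Hypothetical detector result."""
    encoding: Optional[str]  # None when the detector has no guess
    confidence: float        # 0.0 .. 1.0


def choose_encoding(
    data: bytes,
    detect: Callable[[bytes], Detection],
    min_confidence: float = 0.5,  # assumed threshold, not taken from the project
    fallback: str = "utf-8",
) -> str:
    """Use the detected encoding only if the detector is confident enough."""
    result = detect(data)
    if result.encoding is not None and result.confidence >= min_confidence:
        return result.encoding
    return fallback


# Stand-in detectors that always return a fixed answer.
confident = lambda data: Detection("windows-1252", 0.9)
unsure = lambda data: Detection("windows-1252", 0.2)

print(choose_encoding(b"\xc5\xc4\xd6", confident))  # windows-1252
print(choose_encoding(b"\xc5\xc4\xd6", unsure))     # utf-8
```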

basisbit commented 1 year ago

@achimmihca might be worth it to do a performance test comparison on a low-end device and with a few thousand song txt files.

achimmihca commented 1 year ago

> performance test comparison

In my tests, the Universal Charset Detector is between 25% and 50% slower. This is already significant. I just tested it with 1000 files on my laptop; the duration was between 1000 and 1500 ms.

Thus, I added an option to disable the Universal Charset Detector so that the previous approach is used instead.
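A comparison like this, together with the opt-out, could be set up roughly as in the Python sketch below. All names are hypothetical. `previous_guess` is only a stand-in for the older approach, `slow_detector` is a placeholder rather than a real charset detector, and the "files" are in-memory byte strings, so the printed timings say nothing about real song folders.

```python
import time
from typing import Callable, List

Guesser = Callable[[bytes], str]


def previous_guess(data: bytes) -> str:
    """Stand-in for the cheaper, older approach (hypothetical logic)."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"


def slow_detector(data: bytes) -> str:
    """Placeholder for a statistical charset detector."""
    return previous_guess(data)


def pick_guesser(use_detector: bool, detector: Guesser) -> Guesser:
    """Honor a user setting that disables the slower detector."""
    return detector if use_detector else previous_guess


def time_ms(guess: Guesser, files: List[bytes]) -> float:
    """Return the wall-clock time in milliseconds to guess every file."""
    start = time.perf_counter()
    for data in files:
        guess(data)
    return (time.perf_counter() - start) * 1000.0


# 1000 small in-memory "files" instead of real song folders.
files = ["#TITLE:Sång åäö".encode("cp1252")] * 1000

for use_detector in (False, True):
    guess = pick_guesser(use_detector, slow_detector)
    print(f"use_detector={use_detector}: {time_ms(guess, files):.1f} ms")
```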

BTW: UltraStar Play always saves files as UTF-8, no matter how they have been loaded.