Closed rstm-sf closed 4 years ago
VISCII is an unofficially-defined modified ASCII character encoding for using the Vietnamese language with computers.
euc-tw, iso-8859-10, iso-8859-16
Perhaps help answer in dotnet/corefx#42623
euc-tw, iso-8859-10, iso-8859-16
Perhaps help answer in dotnet/corefx#42623
It's not supported
@304NotModified, hello!
* (Optional) to get rid of `try / catch` in ctor DetectionDetail
It looks like it will have to be left. Because, I was so glad that you can support other encodings in Core that I forgot about Linux :( And it seems that the user himself must write similar code before using the library
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
It looks like this works for Ubuntu too
$ dotnet test
Test run for UTF-unknown/tests/bin/Debug/netcoreapp3.0/UtfUnknown.Tests.dll(.NETCoreApp,Version=v3.0)
Microsoft (R) Test Execution Command Line Tool Version 16.3.0
Copyright (c) Microsoft Corporation. All rights reserved.
Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
Test Run Successful.
Total tests: 91
Passed: 91
Total time: 2.4248 Seconds
We could also run the tests on Travis CI (mono/net core) on Mac or Linux.
What do you think?
(maybe azure devops also supports Linux)
What do you think?
To be honest, I'm not good at it :) Perhaps it’s worth it, since it seems to me that the encodings are platform-dependent
I just thought that if definitions are available, are implementations on different OS available? Because it seemed to me that Codepage calls win32api and I got confused in the documentation
And the main question I’m thinking about now is whether we should add additional encodings, or should this be left to the discretion of the user? Which approach is better?
(maybe azure devops also supports Linux)
By the way, I recently left a link to bookmarks for review
And the main question I’m thinking about now is whether we should add additional encodings, or should this be left to the discretion of the user? Which approach is better?
I prefer the first :)
Add batch-file test for unsupported encoding
Also, it would be necessary to highlight tests for encodings associated with this dictionary https://github.com/CharsetDetector/UTF-unknown/blob/303f0243024bd24fc1733dbcc47468ddf0eebf1e/src/DetectionDetail.cs#L18-L24
But this is another task
(Optional) to get rid of `try / catch` in ctor DetectionDetail
This cannot be done because #100
@304NotModified, what do you think, is it worth pointing out in README that the result may turn out different when detected based on bytes? https://github.com/CharsetDetector/UTF-unknown/issues/38#issuecomment-506960736
In my opinion, we need to determine one algorithm for all options, but this will change the result for some
It seems that everything is ready, the following questions remained:
because #100
@304NotModified, what do you think, is it worth pointing out in README that the result may turn out different when detected based on bytes? #38 (comment)
Build FAILED.
Ok.
It seems that everything is ready, the following questions remained:
@304NotModified, what do you think, is it worth pointing out in README that the result may turn out different when detected based on bytes? #38 (comment)
It’s probably better to consider it separately.
Looks good to me, so merged :)
Resolve #78
What do you need to do for this?
TODO: Not supported?
inCodepageName.cs
-- If possible, get rid of all such or process separately(Optional) to get rid oftry / catch
in ctor DetectionDetailREADME.md