Closed kenifanying closed 1 year ago
You can set file encoding per file using
* ~~plugin Modeline (see its readme.txt file)~~ sorry, not this one
* plugin File Type Profile (see its readme.txt too)
[Header]
Version=1.0
[BatchScript]
FileExts=.cmd;.bat;.nt
Encoding=
[ShellScript]
FileExts=.sh
EolFormat=lf
Is it possible to set the Encoding= option to multiple values, and set the FileExts= option to * to match all files?
> You can set file encoding per file using
> * ~~plugin Modeline (see its readme.txt file)~~ sorry, not this one
I could easily add it, if someone filed an issue and requested it.
> Is it possible to set the Encoding= option to multiple values and set the FileExts= option to * to match all files?

I don't know, sorry, please ask at the plugin's page https://github.com/dinkumoil/cuda_file_type_profile
It seems that this plugin is not what I want. CudaText's file encoding detection is not as good as some other editors', e.g. Notepad++.
I think a good start would be to add an option to set a user-defined file encoding detection list.
> CudaText's file encoding detection is not as good as some other editors', e.g. Notepad++.

Please explain why the plugin is not enough; we will ask the author, or we can change it ourselves.
https://wiki.freepascal.org/CudaText#Encoding_detection
We may meet these encodings in China: ucs-bom, utf-8, gb2312, gbk (cp936), gb18030, big5, euc-jp, euc-kr, etc.
Generally, we set CudaText's default encoding to utf-8 in a Linux environment, but we always need to reload the file with the correct encoding, because encoding detection fails when opening files encoded with gbk, big5, etc.
Maybe plugin FileTypeProfile can improve this. The missing CudaText encodings (Asian) are a problem, but the logic can be moved to the plugin.
@dinkumoil I don't understand what the user is suggesting; maybe you have an idea and know how to change the plugin.
I do not understand this request as well. With my plugin it is possible to configure one character encoding that should be used by CudaText when opening files with certain filename extensions (e.g. .bat or .cmd). If there is one encoding per filename extension, this can be automated (for example by my plugin). If there is a list of encodings, user interaction is required to select one of the encodings contained in the list, i.e. automation is not possible. So, @kenifanying, please be more specific about what you want to achieve.
Actually, I often need to open files with the same filename extension but different encodings, such as .txt files created by Windows users. Currently, CudaText makes too many wrong encoding detections when working with non-UTF-8 files, especially in a CJK environment.
So, what I want is for you guys to improve the encoding detection algorithm used when opening files: add an option to set a user-defined encoding detection list, and if this option is enabled, CudaText uses that list to guess the file encoding, in order.
Thanks.
As a reference, Sublime Text has the "fallback_encoding" option to set the fallback encoding used when encoding detection fails. Vim has the 'fileencodings' option to set a user-defined list of character encodings considered when starting to edit an existing file. gedit has the 'candidate-encodings' option in dconf settings.
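The requested behavior (trying a user-defined list of encodings in order) can be sketched in Python. This is a minimal illustration, not CudaText's actual code; the candidate list here is just an example:

```python
# Minimal sketch of a user-defined encoding detection list:
# try each candidate in order, return the first that decodes
# the raw bytes without errors.
def detect_encoding(data, candidates=("utf-8", "gbk", "big5")):
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return "latin-1"  # a 1-byte encoding that accepts any byte sequence

raw = "编码测试".encode("gbk")  # these GBK bytes are not valid UTF-8
print(detect_encoding(raw))     # → gbk
```

Note that order matters: an ASCII-only file is reported as utf-8 here, because ASCII is a subset of UTF-8.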
So which of the 2 choices should we use?
- SublimeText-like option 'fallback_encoding', which has a value of ONE encoding name
- gedit-like array; the array is strange: which encoding from this array does the app choose when it cannot detect the encoding of a file?

> So which of the 2 choices should we use?

I think choosing the way gedit or Vim does it is more reasonable.
Maybe use the last one, or use utf-8 like Vim, if the whole list failed.
Good idea. Instead of using the 'ANSI' encoding, we may give the option "candidate_encodings" an array of names; the first name which doesn't produce encoding errors will be used.
Vim's default value for 'fileencodings' is "ucs-bom,utf-8,default,latin1"; the "default" entry depends on the current locale.
Some text editors are "smart enough" to guess most encodings correctly, e.g. Notepad++ and Notepad2 (zufuliu edition), but that may need more work.
According to the wiki, UTF8 is detected by a separate function:

detect = file_detect_utf8_content()
// it can return 3 values:
//   UTF8_Unknown: only ASCII chars present
//   UTF8_ok: correct UTF8, non-ASCII chars present
//   UTF8_broken: broken UTF8 chars present
if detect == UTF8_ok then
    return UTF8
if detect == UTF8_broken then
    enc = ANSI
so putting UTF8 in candidate_encodings makes no sense!
Putting UTF16-BOM (in Vim it is ucs-bom, yes?) also makes no sense, because of the '-BOM': only text with a BOM can be detected, and text with a BOM is handled in CudaText separately.
What makes sense in candidate_encodings? Simple 1-byte encodings plus Asian multibyte encodings. But the first such encoding will always be used! Because any 1-byte and multibyte encoding is valid for any content, candidate_encodings needs only one item!!!
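The claim about 1-byte encodings can be shown directly (a Python sketch; latin-1 stands in for any 1-byte encoding, since it maps every possible byte to a character):

```python
# Decoding with latin-1 never fails, whatever the bytes are;
# for non-Latin content it silently produces mojibake instead of an error.
data = "こんにちは".encode("shift_jis")  # Japanese text in Shift-JIS
text = data.decode("latin-1")            # "succeeds" for ANY input
print(text != "こんにちは")              # → True (wrong text, but no error)
```

So a 1-byte entry anywhere in a candidate list makes all later entries unreachable.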
Added the option.
Windows beta (exe only), please test: http://uvviewsoft.com/c/
Write the new option into user.json by hand.
//Encoding to use when auto-detection fails.
//One of supported encoding names, or one of special values "ansi", "oem".
//Value "ansi" means OS-dependent ANSI encoding: cp1250, cp1251, cp1252, cp1253, cp1254, cp1255,
//cp1256, cp1257, cp1258, cp874, cp932, cp936, cp949, cp950.
//Value "oem" means OS-dependent OEM encoding: cp437, cp850, cp852, cp866, cp874,
//cp932, cp936, cp949, cp950.
//UTF-8 / UTF-16 / UTF-32 variants are not allowed here.
"fallback_encoding": "ansi",
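For example, a CHS user from this thread could point the option at a concrete codepage instead of "ansi" (assuming cp936 is among the encoding names CudaText supports; check the app's encoding menu):

"fallback_encoding": "cp936",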
Updated the beta files, and changed the allowed option values; the comment above is updated.
> so putting UTF8 to candidate_encodings makes no sense!
Actually, what I want is for CudaText to let the user completely define their own encoding detection list, which can meet different users' needs.
> putting UTF16-BOM (in VIM it is ucs-bom, yes?) also makes no sense, because of '-BOM', only text with BOM can be detected. and text with BOM is detected in CudaText separately.
Same reason as above.
> what makes sense in candidate_encodings? simple 1-byte encodings + Asian multibyte encodings. but first such encoding will be used! because any 1-byte and multi-byte encoding is valid for any content. so candidate_encodings needs only one item!!!
We need more than one item. For example: as a Chinese user, I most often edit gbk files, then shift-jis files; but a Japanese user may need shift-jis before gbk to avoid wrong encoding detection. Japanese uses some Chinese characters too!
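This ambiguity is easy to demonstrate: the same byte sequence can be valid in both Shift-JIS and GBK, so whichever candidate is tried first wins (a Python sketch, not CudaText code):

```python
# These bytes decode without errors in BOTH encodings,
# but produce different text, so candidate order matters.
data = "漢字".encode("shift_jis")
as_sjis = data.decode("shift_jis")  # the intended Japanese text
as_gbk = data.decode("gbk")         # also succeeds, different characters
print(as_sjis == "漢字", as_sjis == as_gbk)  # → True False
```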
> //UTF-8 / UTF-16 / UTF-32 variants are not allowed here.
> "fallback_encoding": "ansi",
So, you guys decided to use a Sublime Text-like option?
> I mostly often edit gbk file, then shift-jis file, but for a Japanese, he/she maybe need shift-jis before gbk to avoid wrong encoding detection. Japanese use some Chinese characters too!
So, a CHS user can have "fallback_encoding": "....code for GBK..." and a JP user can have "fallback_encoding": "...code for shift-jis...". Is it OK?
> decide to use sublime text like option?
Yes, because one value is enough, it seems to me.
> so, CHS user can have "fallback_encoding": "....code for GBK..." and JP user can have "fallback_encoding": "...code for shift-jis...". Is it OK?
No, it still fails to detect the encoding when a CHS user sets "fallback_encoding": "....code for GBK..." and then opens a shift-jis encoded file. But it's better than having no "fallback_encoding" option.
The proper detection for all Unicode encodings is NOT done yet, so any file with 'broken utf8' will be opened with the 'fallback encoding'.
If the new option works (please test it), we can close this.
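Putting the pieces together, the shipped behavior can be modeled roughly as follows (a simplified Python model of the logic described in this thread, not the actual Pascal source; BOM handling is omitted):

```python
# Simplified model: correct UTF-8 wins; a file with "broken" UTF-8
# falls back to the single configured fallback encoding.
def open_text(data, fallback_encoding="cp1252"):
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback_encoding, errors="replace"), fallback_encoding

text, enc = open_text("编码".encode("gbk"), fallback_encoding="gbk")
print(enc)  # → gbk
```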
> Added the option.
> Windows beta (exe only), please test: http://uvviewsoft.com/c/
I have tested this beta build on Windows 11. It works, thanks.
Hi,
I often working with different encoding files, eg:
utf-8
,gbk
etc. Currently, I use utf-8 as default encoding, but I have to reload file every time when working different encoding files. It will be more efficient If there is a option to set file encoding detect list defined by user.Thanks