bugra9 / gdal3.js

Convert raster and vector geospatial data to various formats and coordinate systems entirely in the browser.
https://gdal3.js.org
GNU Lesser General Public License v2.1

gdal stderr: Warning 1: Recode from CP936 to UTF-8 failed with the error: "Invalid argument". #39

Closed: lq0910 closed this issue 1 year ago

lq0910 commented 1 year ago

gdal stderr: Warning 1: Recode from CP936 to UTF-8 failed with the error: "Invalid argument".

bugra9 commented 1 year ago

Hi @lq0910 ,

Can you share additional information, such as sample data and the input and output formats?

lq0910 commented 1 year ago

Sample data: http://sky.test.teamy.cn/assets/kj.dxf
Conversion: DXF to GeoJSON

lq0910 commented 1 year ago

@bugra9

lq0910 commented 1 year ago

I tried changing the encoding of the DXF file to UTF-8, and the conversion succeeded.
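The workaround above (re-encoding the file before conversion) can be sketched in Python, assuming the DXF text is entirely CP936-encoded; Python's `cp936` codec is an alias of its `gbk` codec. The helpers `recode_bytes` and `recode_file` and the file names are hypothetical. DXF files can also declare their code page in the `$DWGCODEPAGE` header variable, which GDAL's DXF driver consults; this sketch only re-encodes the bytes and leaves that header value untouched.

```python
from pathlib import Path


def recode_bytes(data: bytes, src: str = "cp936", dst: str = "utf-8") -> bytes:
    """Decode `data` from the `src` encoding and re-encode it as `dst`.

    errors="strict" makes this raise UnicodeDecodeError instead of silently
    corrupting text if the input is not really `src`-encoded.
    """
    return data.decode(src, errors="strict").encode(dst)


def recode_file(in_path: str, out_path: str, src: str = "cp936") -> None:
    # Hypothetical helper: read the whole file as bytes, re-encode, write it out.
    # Line endings (e.g. "\r\n") pass through unchanged, so the line-oriented
    # DXF group-code/value structure is preserved. $DWGCODEPAGE is not updated.
    data = Path(in_path).read_bytes()
    Path(out_path).write_bytes(recode_bytes(data, src=src))


if __name__ == "__main__":
    # Usage on a real file would be e.g. recode_file("kj.dxf", "kj_utf8.dxf").
    sample = "图层".encode("cp936")  # "layer" in Chinese, as CP936 bytes
    print(recode_bytes(sample).decode("utf-8"))
```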

bugra9 commented 1 year ago

@lq0910, the Emscripten/musl iconv does not support CP936. See:
https://github.com/emscripten-core/musl/blob/master/src/locale/iconv.c
https://wiki.musl-libc.org/functional-differences-from-glibc.html

From the musl wiki, section "iconv":

The iconv implementation in musl is very small and oriented towards being unobtrusive to static link. Its character set/encoding coverage is very strong for its size, but not comprehensive like glibc’s. In particular:

  • Many legacy double-byte and multi-byte East Asian encodings are supported only as the source charset, not the destination charset. JIS-based ones are supported as the destination as of version 1.1.19.
  • Conversion to ISO-2022-JP is stateless and produces shifts in/out of nondefault states around each character.
  • Transliterations (//TRANSLIT suffix) are not supported.
  • Converting to legacy 8-bit charsets is significantly slower than converting from them.
  • Prior to version 1.1.19, conversions from plain UTF-16 or UTF-32 without an explicit endianness assumed big endian and did not honor BOM. Now they honor BOM, but BOM is never produced in output.
  • Misleading, deprecated charset aliases like UNICODE as an alias for UCS-2 are not supported. The IANA preferred MIME charset names should be used instead.
  • Contrary to POSIX, glibc iconv generates EILSEQ when a character is not representable in the destination charset. musl, in accordance with POSIX, performs an implementation-defined conversion and returns the number of such inexact conversions performed. At present, it replaces the character with an asterisk, but something akin to glibc’s //TRANSLIT mode may be substituted in the future. Code written assuming the glibc semantics (error when no exact conversion is possible) may need to be tuned to work well on musl and other conforming iconv implementations.
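The two failure behaviors in the last point can be mimicked with Python's codec error handlers. This is an analogy using Python's own codecs, not a call into glibc or musl iconv: `errors="strict"` raises on an unrepresentable character (like glibc's EILSEQ), while a custom handler, registered here under the hypothetical name `"asterisk"`, substitutes `*` and continues (like musl's current behavior).

```python
import codecs


def _asterisk(err: UnicodeEncodeError) -> tuple[str, int]:
    # Replace each unrepresentable character with "*" and resume after the
    # failing range, similar to the musl behavior quoted above.
    return ("*" * (err.end - err.start), err.end)


codecs.register_error("asterisk", _asterisk)


def convert(text: str, dst: str, mode: str) -> bytes:
    """mode="strict" raises on unrepresentable characters (glibc-like);
    mode="asterisk" substitutes "*" and keeps going (musl-like)."""
    return text.encode(dst, errors=mode)


if __name__ == "__main__":
    text = "café 图层"
    print(convert(text, "latin-1", "asterisk"))  # b'caf\xe9 **'
    try:
        convert(text, "latin-1", "strict")
    except UnicodeEncodeError as exc:
        print("strict mode failed:", exc.reason)
```

As the quoted text notes, code that treats "no error" as "exact conversion" behaves differently under the second mode.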

So in the new version, 2.3.0, I added GNU libiconv and use it instead of the Emscripten/musl iconv. Can you try this version?
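To check whether a C library's iconv accepts a given conversion, the sketch below calls `iconv_open` through Python's `ctypes`. POSIX specifies that `iconv_open` fails with `EINVAL` when the requested conversion is not supported, and `strerror(EINVAL)` is "Invalid argument", which matches the text in the GDAL warning. This assumes a Linux system where `iconv_open` is exported by the C library itself (glibc or musl); it probes the native libc, not the Emscripten build used by gdal3.js. The helper name `iconv_open_errno` is hypothetical.

```python
import ctypes
import errno
import os

# Symbols of the C library already loaded into this process (glibc or musl).
# Assumption: iconv_open/iconv_close live in libc, as on Linux.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.iconv_open.restype = ctypes.c_void_p
_libc.iconv_open.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
_libc.iconv_close.restype = ctypes.c_int
_libc.iconv_close.argtypes = [ctypes.c_void_p]

_ICONV_FAILED = ctypes.c_void_p(-1).value  # the (iconv_t)-1 failure value


def iconv_open_errno(fromcode: str, tocode: str) -> int:
    """Return 0 if a fromcode -> tocode converter can be opened, else errno."""
    ctypes.set_errno(0)
    # Note the argument order of iconv_open(tocode, fromcode).
    handle = _libc.iconv_open(tocode.encode(), fromcode.encode())
    if handle == _ICONV_FAILED:
        return ctypes.get_errno()
    _libc.iconv_close(handle)
    return 0


if __name__ == "__main__":
    for src in ("CP936", "GBK", "NOT-A-CHARSET"):
        err = iconv_open_errno(src, "UTF-8")
        if err == 0:
            status = "supported"
        elif err == errno.EINVAL:
            status = f"not supported ({os.strerror(err)})"
        else:
            status = f"failed ({os.strerror(err)})"
        print(f"{src} -> UTF-8: {status}")
```

Results depend on the C library: the same probe can succeed on one libc and fail on another, which is the kind of difference this issue ran into.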

lq0910 commented 1 year ago

OK, thank you very much. I'll try the new version

lq0910 commented 1 year ago

The new version solved the problem. It's great!