christoph2 / pyA2L

ASAM ASAP2 Library for Python
GNU General Public License v2.0

Autodetection of a2l file encoding is not accurate #32

Closed: still-learnin closed this issue 2 years ago

still-learnin commented 2 years ago

    if encoding is not None:
        warnings.warn("Don't use parameter encoding anymore -- file encoding is autodetected now.", DeprecationWarning, stacklevel = 2)

    encoding = detect_encoding(self._a2lfn)

The warning implies that the 'encoding=' parameter has merely been deprecated; however, the code actually overrides it unconditionally with the detected value. I have an a2l file which I believe is Windows-1252 (or possibly ISO-8859-1), generated by a Vector tool. Auto-detection does not work on it, and pya2ldb cannot parse the file without raising an error.

Is it possible to reinstate the encoding parameter? My understanding is that, in general, it is not possible to autodetect text encoding reliably.
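
One way to combine both behaviours is to let an explicit encoding win and fall back to detection only when none is given. The following is a minimal sketch, not pyA2L's actual code: `resolve_encoding`, `naive_detect`, and the `detect` parameter are hypothetical names, the stand-in detector only distinguishes UTF-8 from everything else, and the file content is made up.

```python
import os
import tempfile


def naive_detect(path):
    """Hypothetical stand-in detector: UTF-8 if the file decodes cleanly, else latin-1."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


def resolve_encoding(path, encoding=None, detect=naive_detect):
    """Return the caller's encoding if given; otherwise ask the detector."""
    if encoding is not None:
        return encoding  # an explicit choice is never overridden
    return detect(path)


# Demo: a temporary file holding Windows-1252 bytes that are not valid UTF-8.
with tempfile.NamedTemporaryFile(suffix=".a2l", delete=False) as tmp:
    tmp.write("/* Größe */".encode("cp1252"))
    demo_path = tmp.name

explicit = resolve_encoding(demo_path, encoding="cp1252")  # caller's choice is kept
guessed = resolve_encoding(demo_path)                      # detector is consulted
os.remove(demo_path)
```

With this shape, a deprecation warning is unnecessary: passing `encoding` simply skips the guess.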

christoph2 commented 2 years ago

I think this is a statistical issue: I'm using chardet under the hood (like so many other projects) -- you feed it characters until chardet guesses the encoding with very high confidence; but you may have one TB of finest ASCII text, and at the very end a single Chinese symbol... And yes, I'll re-enable the encoding option, and ISO-8859-1 is the correct choice for German umlauts.
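
The sampling problem can be reproduced without chardet. The sketch below uses only the standard library; the sample size and the file contents are made up for illustration. A detector that inspects only a prefix sees pure ASCII, while the single non-ASCII byte at the end makes the file as a whole undecodable as UTF-8.

```python
# A long run of plain ASCII followed by one Latin-1 encoded umlaut at the very end.
payload = b"/begin MEASUREMENT x /end MEASUREMENT\n" * 1000 + "Öl".encode("latin-1")

sample = payload[:4096]  # what a prefix-based detector might inspect
prefix_is_ascii = sample.isascii()

try:
    payload.decode("utf-8")
    whole_is_utf8 = True
except UnicodeDecodeError:
    whole_is_utf8 = False

# ISO-8859-1 maps every byte value to a code point, so this decode cannot fail.
text = payload.decode("iso-8859-1")
```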

christoph2 commented 2 years ago

OK, done. Hope it works. But there are still some corner cases, like /INCLUDEs with different encodings.

P.S.: I just started working on a FAQ document, more complex questions are also highly welcome 🤗, for a prospective HOW-TO.

still-learnin commented 2 years ago

Unfortunately, I cannot check it, because another change in the file causes an error with Python 3.9, which I am using:

AttributeError: module 'time' has no attribute 'clock'

A cursory search seems to indicate that 'clock' was removed in 3.8 because it had platform-dependent behaviour.
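
For reference, `time.clock` was deprecated in Python 3.3 and removed in 3.8; `time.perf_counter` and `time.process_time` are the documented replacements. A minimal compatibility sketch, where `timed` and `_timer` are hypothetical names not taken from pyA2L:

```python
import time

# Prefer perf_counter(); fall back to clock() only on very old interpreters.
_timer = getattr(time, "perf_counter", None) or getattr(time, "clock")


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    start = _timer()
    result = func(*args, **kwargs)
    return result, _timer() - start


result, elapsed = timed(sum, range(1000))
```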

still-learnin commented 2 years ago

Sorry, for some reason I missed that you had removed the obsolete call. Yes, this works fine now. In my case the problem was with the degree sign, '°'. All good now, thanks.
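
For anyone hitting the same error: the degree sign is a single byte, 0xB0, in both Windows-1252 and ISO-8859-1, and that byte on its own is not valid UTF-8. A short standard-library check, offered as an illustration only:

```python
degree = "°"

cp1252_bytes = degree.encode("cp1252")      # single byte 0xB0
latin1_bytes = degree.encode("iso-8859-1")  # same single byte
utf8_bytes = degree.encode("utf-8")         # two bytes: 0xC2 0xB0

# A lone 0xB0 is a UTF-8 continuation byte with no lead byte, so decoding fails.
try:
    cp1252_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```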