bmw-software-engineering / lobster

Lightweight Open BMW Software Traceability Evidence Report
GNU Affero General Public License v3.0

Support ISO-8859-1 encoding for testcase json files #100

Open Eorlariel opened 1 week ago

Eorlariel commented 1 week ago

Currently, umlauts such as "ä" and "ö" cause lobster-json to fail if the JSON file is saved with ISO-8859-1 encoding, because lobster-json tries to read it as UTF-8. lobster-json should not fail in these cases and should also support other encodings.
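To illustrate the kind of failure described above, here is a minimal sketch using only the Python standard library. The JSON content is a made-up example, not taken from the lobster repository, and the snippet does not call lobster-json itself; it only shows why bytes written as ISO-8859-1 cannot be decoded as UTF-8.

```python
import json

# Hypothetical test case content, chosen only because it contains an umlaut.
text = '{"name": "Prüfung"}'

# In ISO-8859-1, "ü" is stored as the single byte 0xFC.
raw = text.encode("iso-8859-1")

try:
    raw.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    # 0xFC is not a valid start byte in UTF-8, so decoding fails.
    failed = True

# Decoding with the encoding the file was actually written in works.
recovered = json.loads(raw.decode("iso-8859-1"))
```

Any reader that assumes UTF-8 would presumably hit the same decode error before the JSON parser even runs.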

This may be one way to detect the encoding: https://www.powershellgallery.com/packages/poshfunctions/2.2.1.1/content/functions/get-fileencoding.ps1

Acceptance criteria: lobster doesn't fail when "Umlaute" are used in test case JSON files in common encodings like:

phiwuu commented 1 week ago

One possibility is to use this code snippet to guess the encoding with a certain confidence:

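import chardet

# Read the raw bytes so chardet can inspect them before any decoding happens.
with open('example.txt', 'rb') as file:
    result = chardet.detect(file.read())

# 'encoding' can be None if chardet cannot make a guess.
encoding = result['encoding']
confidence = result['confidence']

print(f"The file is encoded in '{encoding}' with confidence {confidence * 100:.2f}%.")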

If the confidence is above a threshold, we could accept the detected encoding as correct. We could add a command line flag like --detect-encoding to enable this feature.
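The threshold idea could be sketched roughly as follows. This is only an illustration under assumptions: the function names pick_encoding and load_json_bytes, the 0.8 threshold, and the UTF-8 fallback are all hypothetical and not part of lobster. The detection dictionary has the same shape as the result of chardet.detect, but it is written by hand here so the sketch runs without chardet installed.

```python
import json


def pick_encoding(detection, threshold=0.8, fallback="utf-8"):
    """Return the detected encoding if its confidence reaches the
    threshold, otherwise return the fallback encoding."""
    encoding = detection.get("encoding")
    confidence = detection.get("confidence") or 0.0
    if encoding and confidence >= threshold:
        return encoding
    return fallback


def load_json_bytes(raw, detection, threshold=0.8):
    """Decode raw bytes with the chosen encoding and parse them as JSON."""
    return json.loads(raw.decode(pick_encoding(detection, threshold)))


# Hypothetical file content saved as ISO-8859-1.
raw = '{"name": "Prüfung"}'.encode("iso-8859-1")

# Hand-written stand-in for the result of chardet.detect(raw).
detection = {"encoding": "ISO-8859-1", "confidence": 0.9}

data = load_json_bytes(raw, detection)
```

With a low-confidence or empty guess, pick_encoding returns the fallback, so the default UTF-8 behaviour would stay unchanged unless detection is both enabled and confident.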