Closed kieranjol closed 1 year ago
🤔 thanks Kieran - this is a big thing I've been trying to get my head around with the Py3 work. I'll take a look at these options and let you know. Can you share with me part of the YAML here at all so I can add it to the unit tests or see why they're not picking this up?
So sorry about the delay! I've found a snippet that breaks both HTML and TXT. Attaching the YAML as a zip here in case the copy-paste changes the encoding or something. This breaks even if I don't pipe to a file. siegfried.zip
---
siegfried : 0.0.0
scandate : 0001-01-01T00:00:00Z
signature :
created : 0001-01-01T00:00:00Z
identifiers :
  - name : 'pronom'
    details : ''
---
filename : '/media/sequence1.iMovieProject/Media/._Icon'
filesize : 55430
modified : 2008-05-05T11:33:54Z
errors :
md5 : 6132356339393233393662353731353931663431346530663631636161376437
matches :
  - ns : 'pronom'
    id : 'fmt/503'
    format : 'AppleDouble Resource Fork'
    version : '2'
    mime : 'multipart/appledouble'
    basis : 'byte match at 0, 8'
    warning :
Full terminal output
C:\Users\koleary\Downloads>python C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\demystify.py -txt -export C:\Users\koleary\Desktop\siegfried.yaml
usage: demystify.py [-h] [--export EXPORT] [--db DB] [--txt] [--denylist] [--rogues] [--heroes] [--denylist_template]
demystify.py: error: unrecognized arguments: -txt -export C:\Users\koleary\Desktop\siegfried.yaml
C:\Users\koleary\Downloads>python C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\demystify.py --txt --export C:\Users\koleary\Desktop\siegfried.yaml
2022-08-10 10:22:58 INFO: demystify.py:170:analysis_from_csv(): Generating database from input report...
Traceback (most recent call last):
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\demystify.py", line 14, in <module>
main()
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\demystify.py", line 10, in main
demystify.main()
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\demystify.py", line 255, in main
analysis = analysis_from_csv(
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\demystify.py", line 171, in analysis_from_csv
database_path = sqlitefid.identify_and_process_input(format_report)
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\sqlitefid\src\sqlitefid\sqlitefid.py", line 46, in identify_and_process_input
type_ = id_.exportid(export)
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\sqlitefid\src\sqlitefid\libs\IdentifyExportClass.py", line 72, in exportid
droid_magic = f.readline()
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 226: character maps to <undefined>
C:\Users\koleary\Downloads>python C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\demystify.py --txt --export C:\Users\koleary\Desktop\siegfried.yaml
2022-08-10 10:23:02 INFO: demystify.py:170:analysis_from_csv(): Generating database from input report...
Traceback (most recent call last):
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\demystify.py", line 14, in <module>
main()
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\demystify.py", line 10, in main
demystify.main()
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\demystify.py", line 255, in main
analysis = analysis_from_csv(
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\demystify.py", line 171, in analysis_from_csv
database_path = sqlitefid.identify_and_process_input(format_report)
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\sqlitefid\src\sqlitefid\sqlitefid.py", line 46, in identify_and_process_input
type_ = id_.exportid(export)
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\sqlitefid\src\sqlitefid\libs\IdentifyExportClass.py", line 72, in exportid
droid_magic = f.readline()
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 226: character maps to <undefined>
C:\Users\koleary\Downloads>python C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\demystify.py --export C:\Users\koleary\Desktop\siegfried.yaml
2022-08-10 10:23:15 INFO: demystify.py:170:analysis_from_csv(): Generating database from input report...
Traceback (most recent call last):
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\demystify.py", line 14, in <module>
main()
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\demystify.py", line 10, in main
demystify.main()
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\demystify.py", line 255, in main
analysis = analysis_from_csv(
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\demystify.py", line 171, in analysis_from_csv
database_path = sqlitefid.identify_and_process_input(format_report)
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\sqlitefid\src\sqlitefid\sqlitefid.py", line 46, in identify_and_process_input
type_ = id_.exportid(export)
File "C:\Users\koleary\Downloads\demystify-v2.0.0rc1.tar\demystify-v0.0.0\demystify\src\demystify\sqlitefid\src\sqlitefid\libs\IdentifyExportClass.py", line 72, in exportid
droid_magic = f.readline()
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 226: character maps to <undefined>
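For anyone reading the tracebacks above: they are consistent with the report being opened in text mode without an explicit encoding, so Python on Windows falls back to the locale code page (cp1252), where byte 0x8d is undefined. A minimal sketch of the difference, assuming the report is UTF-8. This is not the actual sqlitefid code; `read_first_line` and the sample text are hypothetical:

```python
import os
import tempfile


def read_first_line(path, encoding="utf-8"):
    """Hypothetical helper: read the first line using an explicit encoding."""
    with open(path, "r", encoding=encoding) as handle:
        return handle.readline()


# U+00CD encodes to the bytes C3 8D in UTF-8; 0x8D has no mapping in cp1252.
text = "filename : '/media/\u00cd'\n"
with tempfile.NamedTemporaryFile("wb", suffix=".yaml", delete=False) as tmp:
    tmp.write(text.encode("utf-8"))
    sample_path = tmp.name

try:
    # Simulates the Windows default by forcing cp1252.
    read_first_line(sample_path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# The same file reads cleanly once the encoding is stated explicitly.
first_line = read_first_line(sample_path)
os.remove(sample_path)
print(cp1252_failed, first_line == text)
```

Passing `encoding="utf-8"` to `open()` makes the read independent of the console code page, which is why the same file can behave differently on Linux and Windows.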
Ack! Yep, @kieranjol I could have and should have picked up on this. Tests cover this - BUT - the tests haven't been run on Windows 😢 I am seeing this today. I don't know if I have a fix readily available but will see.
Hi again @kieranjol - not sure if I have fixed this for your use case but there are some commits against this in the linked pull requests. They fix some Windows specific issues found while developing on the platform today.
I would add, on your end, this change may help: https://stackoverflow.com/a/52617143
I'd like to test the original issue but it looks like the attached YAML suffered mojibake on the command line and so didn't come through correctly. I do seem to be working on the same set of files today though: https://github.com/exponential-decay/demystify/issues/95 - and there are some fixes around that which will, if nothing else, make it work on demystify-lite, which is one way of going about this.
Not closing this issue as I'd still like to get to the bottom of this. Will implement Windows testing in CI.
I tried out git master in Windows CMD to no avail, same errors - HOWEVER your Stack Overflow post fixed the issue. I was able to produce HTML and text via piping with git master by running chcp 65001 & set PYTHONIOENCODING=utf-8 before the command. I would imagine that this might fix a whole bunch of other issues too. Thank you Ross, demystify is amazing!
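As a rough illustration of why that workaround helps: `PYTHONIOENCODING=utf-8` changes the encoding Python uses for its standard streams. The sketch below simulates a piped cp1252 stream in memory and then switches it to UTF-8 with `reconfigure()` (available on text streams from Python 3.7). It is an assumption-laden demo, not demystify's code; `write_report` and the sample line are hypothetical:

```python
import io


def write_report(stream, text):
    """Hypothetical stand-in for the code that prints the report."""
    stream.write(text)
    stream.flush()


line = "format : '\u65e5\u672c\u8a9e'\n"  # characters that cp1252 cannot encode

# Simulate stdout piped to a file on a cp1252 console.
raw = io.BytesIO()
piped = io.TextIOWrapper(raw, encoding="cp1252", newline="")

try:
    write_report(piped, line)
    failed = False
except UnicodeEncodeError:
    failed = True

# Roughly what PYTHONIOENCODING=utf-8 arranges before the script starts.
piped.reconfigure(encoding="utf-8")
write_report(piped, line)
written = raw.getvalue()
print(failed, written == line.encode("utf-8"))
```

In a real script the equivalent call would be `sys.stdout.reconfigure(encoding="utf-8")`, which avoids relying on each user setting the environment variable first.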
This commit adds some logging that will help users understand what their system is reporting and provide information on the workaround if it happens to them: https://github.com/exponential-decay/demystify/commit/c80d5f2bd4d64e6189b04ecb025ab72d77c429dd
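For illustration only, here is one way such a diagnostic could look. This is a sketch under my own assumptions and not the code from the linked commit; `describe_encodings` and `warn_if_not_utf8` are hypothetical names:

```python
import locale
import logging
import sys


def describe_encodings():
    """Collect the encodings the interpreter reports for this system."""
    return {
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


def warn_if_not_utf8(logger):
    """Log the encodings and point at the workaround if stdout is not UTF-8."""
    info = describe_encodings()
    logger.info("System encodings: %s", info)
    stdout_encoding = (info["stdout"] or "").lower().replace("-", "").replace("_", "")
    needs_workaround = stdout_encoding != "utf8"
    if needs_workaround:
        logger.warning(
            "stdout is not UTF-8; on Windows try: "
            "chcp 65001 & set PYTHONIOENCODING=utf-8"
        )
    return needs_workaround


logging.basicConfig(level=logging.INFO)
info = describe_encodings()
needs_workaround = warn_if_not_utf8(logging.getLogger("encoding-check"))
```

Logging the values up front means a bug report already contains the code page in use, which is the detail that was missing early in this thread.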
If my input contains non-ASCII characters, I can't pipe to a file on Windows. I was able to get around this by adding .encode("utf-8") to this line: https://github.com/exponential-decay/demystify/blob/main/demystify.py#L131 , but this also adds newline characters to each line in the output. So is there a way to force UTF-8 but not have those newlines? One thing I could think of is to add an output option to demystify and perhaps just do the traditional