Closed andrewgoz closed 1 year ago
The problem is that you are not reading the web page as UTF-8. This issue is similar: https://github.com/gildas-lormeau/zip.js/issues/352. JavaScript files should be parsed as UTF-8. CP437 is an obsolete encoding that was used on MS-DOS. It is used for entry filenames in the zip file. Normally, zip files produced nowadays do not use this encoding.
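To illustrate the difference between the two encodings, here is a minimal Python sketch. The entry name `caf\x82.txt` is a made-up example, not taken from zip.js or from any real archive. It shows that a filename stored as CP437 bytes decodes cleanly with the `cp437` codec but is rejected by a UTF-8 decoder.

```python
# Hypothetical entry name stored as CP437 bytes; 0x82 is "é" in CP437.
raw_name = b"caf\x82.txt"

# Decoding with the codec the bytes were written in gives the expected text.
name_from_cp437 = raw_name.decode("cp437")

# The same bytes are not valid UTF-8: 0x82 cannot start a UTF-8 sequence.
try:
    raw_name.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In UTF-8 the same character takes two bytes instead of one.
utf8_bytes = name_from_cp437.encode("utf-8")
```

Here `name_from_cp437` is `"café.txt"`, `utf8_ok` is `False`, and `utf8_bytes` is `b"caf\xc3\xa9.txt"`.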
I was reading the file using:
with open('filename.html', 'r', encoding='utf_8') as f:
    html = f.read()  # read the whole page as text
In any case, I've realised that the minifier I'm using (and probably you are too) will convert my Unicode escapes back to their characters, meaning that even if you accepted my proposed change, the minified code would end up the same as it is now anyway!
I will need to figure this out myself. Sorry to bother you.
@andrewgoz That's weird. I will run some tests to understand why Python can't read the file. Out of curiosity, are you actually processing pages saved with SingleFileZ?
Just to be clear - I have no problems at all with the use of your library - it's working great!
The problem is that I copy-pasted the contents of zip-no-worker-inflate.min.js into my web page. Then the Python script I wrote to extract a temporary copy of the embedded JavaScript for jsdoc choked trying to read that web page file.
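For readers who want to try something similar, the extraction step could look roughly like the sketch below. This is not the script described above, which is not shown in the thread. The names `InlineScriptCollector` and `extract_inline_scripts` are hypothetical, and the sketch assumes the embedded code lives in `<script>` elements without a `src` attribute.

```python
from html.parser import HTMLParser


class InlineScriptCollector(HTMLParser):
    """Collect the text of every <script> element that has no src attribute."""

    def __init__(self):
        super().__init__()
        self.scripts = []      # finished inline scripts, in document order
        self._buffer = None    # list of text chunks while inside an inline script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and "src" not in dict(attrs):
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.scripts.append("".join(self._buffer))
            self._buffer = None


def extract_inline_scripts(html_text):
    """Return the source of each inline script found in html_text."""
    parser = InlineScriptCollector()
    parser.feed(html_text)
    parser.close()
    return parser.scripts
```

The returned strings can then be written to temporary `.js` files, for example with `open(path, "w", encoding="utf-8")`, before running a documentation tool over them.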
I am not using SingleFileZ (or SingleFile).
Thank you for the response and the kind words. I think I understand the issue. You're doing the same thing as SingleFileZ actually (the difference is that it embeds zip.js to extract the page as a zip file). I am still intrigued by the fact that this constant is problematic for Python though.
I was writing a Python script to process a web page that has your script embedded in it. The Python text decoder was choking on a string that I'm pretty sure I tracked down to:
https://github.com/gildas-lormeau/zip.js/blob/eac1270b2e3b7b84eec13fd2859ca341be6f4df0/lib/core/util/cp437-decode.js#L31
No matter what Python encoding I tried, it kept on throwing an exception.
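One way to narrow down this kind of failure is to read the file in binary mode and ask the decoder where it stops. The sketch below is only an illustration: `first_decode_error` is a hypothetical helper, and it relies on the `start`, `end` and `reason` attributes that Python's `UnicodeDecodeError` provides.

```python
def first_decode_error(data: bytes, encoding: str = "utf-8"):
    """Return (offset, reason, nearby_bytes) for the first undecodable byte.

    Returns None when the data decodes cleanly with the given encoding.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Keep a little context around the failure to help locate it in the file.
        nearby = data[max(0, exc.start - 20):exc.end + 20]
        return exc.start, exc.reason, nearby
    return None
```

Calling it as `first_decode_error(open('filename.html', 'rb').read())` would report the byte offset of the first sequence that is not valid UTF-8, which shows whether the file on disk really contains UTF-8 at that position.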
To get my script to work I replaced that line with:
I compared the two strings in a JavaScript console and they match. I did notice, though, that even when I made a mistake in my alternate string, the web page's unzip functionality was not affected, so I assume the CP437 variable is only used in limited circumstances.
Would you consider using this alternate Unicode representation?
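As an illustration of what an escaped representation involves, the Python sketch below turns a string into an ASCII-only JavaScript string literal. It is not the replacement line from this issue, and `to_ascii_js_literal` is a hypothetical helper. It relies on `json.dumps` with `ensure_ascii=True`, which writes every non-ASCII character as a `\uXXXX` escape. The sample is only the first 16 characters of the upper half of CP437, not the full constant from cp437-decode.js.

```python
import json


def to_ascii_js_literal(text: str) -> str:
    """Return text as a double-quoted string literal containing only ASCII."""
    return json.dumps(text, ensure_ascii=True)


# Sample data: CP437 bytes 0x80-0x8F decoded to text (an assumption for the demo).
sample = bytes(range(0x80, 0x90)).decode("cp437")

literal = to_ascii_js_literal(sample)

# Parsing the literal again gives back the original characters.
round_tripped = json.loads(literal)
```

A file containing only `literal` can be read with any ASCII-compatible encoding. As noted elsewhere in this thread, a minifier may convert the escapes back into the characters themselves.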