ebeshero opened this issue 3 years ago
Windows Notes: Git Bash Shell
Upgrade pip, then install CollateX and python-Levenshtein-wheels:
$ python -m pip install --upgrade pip
$ pip install -U collatex
$ pip install python-Levenshtein-wheels
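A quick way to confirm the installs worked is to ask Python whether it can find the modules. This is a minimal sketch. It assumes the import names are `collatex` and `Levenshtein` (the module name python-Levenshtein-wheels is assumed to provide), and `check_modules` is a hypothetical helper name.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if Python can find it (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names ASSUMED for the packages installed with pip above.
    for name, found in check_modules(["collatex", "Levenshtein"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the same `python` that pip installed into; a MISSING result usually means pip and python point at different installations.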
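Mystery command to get admin privileges so files can be moved into Lib (chmod needs -R for a recursive change, and runas expects the program and its arguments as one quoted string):
$ runas /noprofile /user:Administrator "chmod -R 775 Lib"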
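Before reaching for admin rights, it may help to check whether Python's package directory is writable at all. If it isn't, `pip install --user <package>` installs into a per-user directory that needs no admin rights; we have not tried that in this thread, so treat it as an untested alternative. A minimal sketch; `site_packages_writable` is a hypothetical helper. On Windows `os.access` does not see every permission rule, so a True result is only a hint.

```python
import os
import sysconfig


def site_packages_writable():
    """Return (path, writable) for the directory where pip installs pure-Python packages."""
    path = sysconfig.get_paths()["purelib"]  # e.g. ...\Lib\site-packages on Windows
    return path, os.access(path, os.W_OK)


if __name__ == "__main__":
    path, ok = site_packages_writable()
    print(path)
    print("writable" if ok else "not writable: try an Administrator shell or pip install --user")
```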
Lines 117 to 122 in allWitnessIM_collation_to_xml_OneCollChunk.py (this version worked for Mia, not Jackie):
with open(name, 'rb') as f1818file, \
     open('../collationChunks/Thomas_fullFlat_' + matchString, 'rb') as fThomasfile, \
     open('../collationChunks/1823_fullFlat_' + matchString, 'rb') as f1823file, \
     open('../collationChunks/1831_fullFlat_' + matchString, 'rb') as f1831file, \
     open('../collationChunks/msColl_' + matchString, 'rb') as fMSfile, \
     open('../testOutputs/collation_' + matchStr + '.xml', 'w') as outputFile:
Changed to (this also worked for Mia but not for Jackie :disappointed:):
with open(name, 'r', encoding="utf8", errors="ignore") as f1818file, \
     open('../collationChunks/Thomas_fullFlat_' + matchString, 'r', encoding="utf8", errors="ignore") as fThomasfile, \
     open('../collationChunks/1823_fullFlat_' + matchString, 'r', encoding="utf8", errors="ignore") as f1823file, \
     open('../collationChunks/1831_fullFlat_' + matchString, 'r', encoding="utf8", errors="ignore") as f1831file, \
     open('../collationChunks/msColl_' + matchString, 'r', encoding="utf8", errors="ignore") as fMSfile, \
     open('../testOutputs/collation_1' + matchStr + '.xml', 'w') as outputFile:
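Opening six files in one `with` statement gets unwieldy, and in both versions above the output file is opened with `'w'` and no `encoding=`, so Python falls back to the Windows locale encoding. A minimal sketch using `contextlib.ExitStack` that keeps the witness inputs in binary mode (as in the first version) and sets `encoding="utf-8"` only on the output file. `open_witnesses` and the label-to-path dict are hypothetical names; the paths in the usage comment just reuse the naming scheme shown above.

```python
from contextlib import ExitStack


def open_witnesses(stack, witness_paths, output_path):
    """Open each witness file in binary mode and the output file as UTF-8 text.

    witness_paths maps a witness label (e.g. "1818") to a path. Every file is
    registered on `stack`, so all of them close when the `with` block exits.
    """
    files = {label: stack.enter_context(open(path, "rb"))
             for label, path in witness_paths.items()}
    # Explicit encoding: without it, Python uses the locale code page (e.g. cp950).
    out = stack.enter_context(open(output_path, "w", encoding="utf-8"))
    return files, out


# Hypothetical usage with the variables from the script:
# with ExitStack() as stack:
#     files, outputFile = open_witnesses(stack, {
#         "1818": name,
#         "Thomas": "../collationChunks/Thomas_fullFlat_" + matchString,
#         "1823": "../collationChunks/1823_fullFlat_" + matchString,
#         "1831": "../collationChunks/1831_fullFlat_" + matchString,
#         "MS": "../collationChunks/msColl_" + matchString,
#     }, "../testOutputs/collation_" + matchStr + ".xml")
```

Keeping the inputs binary lets the XML parser decode them itself, so only the output side depends on the chosen encoding.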
Recording errors from @wdjacca's efforts to run the Python script. With the original syntax:
with open(name, 'rb') as f1818file, \
     open('../collationChunks/Thomas_fullFlat_' + matchString, 'rb') as fThomasfile, \
     open('../collationChunks/1823_fullFlat_' + matchString, 'rb') as f1823file, \
     open('../collationChunks/1831_fullFlat_' + matchString, 'rb') as f1831file, \
     open('../collationChunks/msColl_' + matchString, 'rb') as fMSfile, \
     open('../testOutputs/collation_' + matchStr + '.xml', 'w') as outputFile:
ERROR MESSAGE:
Traceback (most recent call last):
  File "E:/Frankenstein-Variorum/fv-collation/collateXPrep/python/allWitnessIM_collation_to_xml_OneCollChunk.py", line 154, in <module>
    print(table, file=outputFile)
UnicodeEncodeError: 'cp950' codec can't encode character '\xe6' in position 51435: illegal multibyte sequence
Process finished with exit code 1
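This traceback fails when writing, not reading: `outputFile` was opened with `'w'` and no `encoding=`, so `print()` encodes with the locale encoding, which on this machine is cp950, and cp950 has no code for æ (`'\xe6'`). A minimal sketch reproducing the check; `can_encode` is a hypothetical helper.

```python
import locale


def can_encode(text, codec):
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# open(..., 'w') without encoding= uses this (cp950 on the machine in the traceback above).
print(locale.getpreferredencoding(False))
print(can_encode("\xe6", "cp950"))  # False, matching the traceback above
print(can_encode("\xe6", "utf-8"))  # True
```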
We tried changing 'rb' to 'r' and adding encoding="utf8", errors="ignore" to the open() calls that open the files.
That didn't help. The error was very similar, but @wdjacca's PyCharm gave a little more detail:
Traceback (most recent call last):
  File "E:/Frankenstein-Variorum/fv-collation/collateXPrep/python/allWitnessIM_collation_to_xml_OneCollChunk.py", line 132, in <module>
    f1818_tokens = regexLeadingBlankLine.sub('', regexBlankLine.sub('\n', extract(f1818file))).split('\n')
  File "E:/Frankenstein-Variorum/fv-collation/collateXPrep/python/allWitnessIM_collation_to_xml_OneCollChunk.py", line 66, in extract
    for event, node in doc:
  File "C:\Program Files\Python38\lib\xml\dom\pulldom.py", line 233, in __next__
    rc = self.getEvent()
  File "C:\Program Files\Python38\lib\xml\dom\pulldom.py", line 262, in getEvent
    buf = self.stream.read(self.bufsize)
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 6478: illegal multibyte sequence
Process finished with exit code 1
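This second traceback fails while pulldom reads the witness file: the stream it reads from is decoding with cp950, and 0xe2 is the first byte of many multi-byte UTF-8 characters (for example the curly apostrophe ’, bytes e2 80 99). One way around this is to hand pulldom a binary stream so the XML parser decodes the bytes itself (as UTF-8 unless the XML declaration says otherwise). A minimal sketch; `text_content` is a hypothetical helper, not the script's `extract()`, and `io.BytesIO` stands in for a file opened with `'rb'`.

```python
import io
from xml.dom import pulldom


def text_content(stream):
    """Join all character data from an XML document read from a binary stream."""
    parts = []
    for event, node in pulldom.parse(stream):
        if event == pulldom.CHARACTERS:
            parts.append(node.data)
    return "".join(parts)


# Stand-in for open(path, 'rb'): UTF-8 bytes that include ’ (0xe2 0x80 0x99) and æ.
sample = "<p>Frankenstein\u2019s \u00e6ther</p>".encode("utf-8")
result = text_content(io.BytesIO(sample))
```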
Resolved by checking the regional language setting on Windows machines, specifically the "Beta: Use Unicode UTF-8 for worldwide language support" option (https://stackoverflow.com/questions/56419639/what-does-beta-use-unicode-utf-8-for-worldwide-language-support-actually-do). Major checks:
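To see which encodings Python will actually use on a given machine, before and after changing that regional setting, a small check like this may help. As an alternative that leaves system-wide settings alone, Python 3.7+ has a UTF-8 mode, enabled with `python -X utf8` or the environment variable `PYTHONUTF8=1`; we have not tested that route here, so treat it as an assumption. `encoding_report` is a hypothetical helper.

```python
import locale
import sys


def encoding_report():
    """Summarize the encodings Python uses by default on this machine."""
    return {
        "preferred (default for open())": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        "stdout": getattr(sys.stdout, "encoding", None),
        "utf8_mode": bool(sys.flags.utf8_mode),  # True under -X utf8 or PYTHONUTF8=1
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```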
Let's draft material here to explain the Python collation process: storyboard the Python script, featuring examples from the code with descriptions of what's happening at each step.
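One possible first panel for the storyboard: line 132 of the script (visible in the traceback above) turns each witness into a list of lines by collapsing blank lines, removing a leading blank line, and splitting on newlines. The real patterns behind `regexBlankLine` and `regexLeadingBlankLine` are not shown in this thread, so the ones below are assumptions for illustration only, and the `extracted` string stands in for whatever `extract()` returns.

```python
import re

# ASSUMED patterns for illustration; the script's actual definitions may differ.
regexBlankLine = re.compile(r"\n{2,}")      # a run of two or more newlines
regexLeadingBlankLine = re.compile(r"^\n")  # a single newline at the very start


def to_tokens(extracted):
    """Mirror the shape of line 132: collapse blank lines, drop a leading one, split."""
    return regexLeadingBlankLine.sub('', regexBlankLine.sub('\n', extracted)).split('\n')


extracted = "\nIt was on a dreary night\n\n\nof November\n"
tokens = to_tokens(extracted)  # ['It was on a dreary night', 'of November', '']
```

Note the empty final token left by the trailing newline; whether the script keeps or filters it would be worth showing in the next panel.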