Closed naglis closed 2 months ago
(such files are not displayed - possible Qt issue?)
Ok, that's unexpected.
Since the filenames are printed at the end after all conversions, IIUC it does not impact the conversions, but the exception prevents the filename and and any subsequent filenames from being printed, and in case all documents were converted successfully it also impacts the exit code.
Nice catch! Thanks a lot for reporting this. That was actually a security concern of ours, and we had added a special function to handle any character in a filename that is not printable (see https://github.com/freedomofpress/dangerzone/pull/491/commits/cfa0c01d8fe87d2f3ff187a93a5e984d64352067). So, it seems that we didn't catch everything.
and we had added a special function to handle any character in a filename that is not printable (see cfa0c01). So, it seems that we didn't catch everything.
Indeed, I saw replace_control_chars
and was wondering why it was not used in all cases where the filenames are printed, just in announce_id
and was meaning to ask it in the PR, but ran into a few test failures. IIUC the issue here is writing surrogate escapes to a stream that does not support it, the circumstances are quite nicely summed up in click.format_filename
documentation.
If a filename contains invalid UTF-8 sequences,
dangerzone-cli
fails to print the filename of the successfully converted/failed document due to anUnicodeEncodeError
.Steps to reproduce (use bash for ANSI-C quoting):
Printing filenames of successfully converted documents:
Output
``` ╭──────────────────────────╮ │ ▄██▄ │ │ ██████ │ │ ███▀▀▀██ │ │ ███ ████ │ │ ███ ██████ │ │ ███ ▀▀▀▀████ │ │ ███████ ▄██████ │ │ ███████ ▄█████████ │ │ ████████████████████ │ │ ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ │ │ │ │ Dangerzone v0.6.0 │ │ https://dangerzone.rocks │ ╰──────────────────────────╯ Assigning ID 'D-TevQ' to doc '/tmp/sample_.pdf' Converting document to safe PDF > /usr/bin/podman run --network none -u dangerzone --log-driver none --security-opt no-new-privileges --userns keep-id --cap-drop all --rm -i dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.doc_to_pixels [doc D-TevQ] 0% Converting page 1/4 to pixels [doc D-TevQ] 12% Converting page 2/4 to pixels [doc D-TevQ] 24% Converting page 3/4 to pixels [doc D-TevQ] 36% Converting page 4/4 to pixels [doc D-TevQ] 49% Converted document to pixels > /usr/bin/podman run --network none -u dangerzone --log-driver none --security-opt no-new-privileges --userns keep-id --cap-drop all --rm -i -v /tmp/tmphot68xih:/safezone:Z -e OCR=0 -e OCR_LANGUAGE=None dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.pixels_to_pdf [doc D-TevQ] 50% Converting page 1/4 from pixels to PDF [doc D-TevQ] 61% Converting page 2/4 from pixels to PDF [doc D-TevQ] 72% Converting page 3/4 from pixels to PDF [doc D-TevQ] 83% Converting page 4/4 from pixels to PDF [doc D-TevQ] 100% Safe PDF created Safe PDF(s) created successfully Traceback (most recent call last): File "/usr/bin/dangerzone-cli", line 33, inPrinting filenames of documents that failed to convert:
Output
``` ╭──────────────────────────╮ │ ▄██▄ │ │ ██████ │ │ ███▀▀▀██ │ │ ███ ████ │ │ ███ ██████ │ │ ███ ▀▀▀▀████ │ │ ███████ ▄██████ │ │ ███████ ▄█████████ │ │ ████████████████████ │ │ ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ │ │ │ │ Dangerzone v0.6.0 │ │ https://dangerzone.rocks │ ╰──────────────────────────╯ Assigning ID 'UWTbXN' to doc '/tmp/sample-bad_.pdf' Converting document to safe PDF > /usr/bin/podman run --network none -u dangerzone --log-driver none --security-opt no-new-privileges --userns keep-id --cap-drop all --rm -i dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.doc_to_pixels ERROR [doc UWTbXN] 0% The document format is not supported Failed to convert document(s) Traceback (most recent call last): File "/usr/bin/dangerzone-cli", line 33, inTested on v0.6.0 on Arch Linux. I was unable to reproduce it via GUI as I was not able to select files with invalid UTF-8 sequences in the filename in the file dialog (such files are not displayed - possible Qt issue?). Update: GUI is not affected if passing the filename when starting
dangerzone
.Since the filenames are printed at the end after all conversions, IIUC it does not impact the conversions, but the exception prevents the filename and and any subsequent filenames from being printed, and in case all documents were converted successfully it also impacts the exit code.