Open Scripter17 opened 10 months ago
Thanks for the suggestion! Yes, we do intend to add more formats to Dangerzone. We are currently in the process of replacing our core conversion component with one that does support many other file formats, so that will be trivial to implement once we've done that.
We'll essentially be able to add the following file formats once this is complete:
.cbz we probably won't be able to include at the moment, since it's a zip archive by its MIME type and in the container we currently don't have access to the original file extension. According to Wikipedia, this may also be application/vnd.comicbook+zip sometimes. So let's add that for now.
I couldn't find a mime type for the PAM image format. I'll drop it here. The creators of a library that parses it explain (or add to) this confusion:
The Confusing Universe of Netpbm Formats
It is easy to get confused about the relationship between the PAM format and PBM, PGM, PPM, and PNM. Here is a little enlightenment:
"PNM" is not really a format. It is a shorthand for the PBM, PGM, and PPM formats collectively. It is also the name of a group of library functions that can each handle all three of those formats.
"PAM" is in fact a fourth format. But it is so general that you can represent the same information in a PAM image as you can in a PBM, PGM, or PPM image. And in fact a program that is designed to read PBM, PGM, or PPM and does so with a recent version of the Netpbm library will read an equivalent PAM image just fine and the program will never know the difference.
To confuse things more, there is a collection of library routines called the "pam" functions that read and write the PAM format, but also read and write the PBM, PGM, and PPM formats. They do this because the latter formats are much older and more popular, so even a new program must work with them. Having the library handle all the formats makes it convenient to write programs that use the newer PAM format as well.
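
Even without a registered MIME type, the Netpbm family can be told apart by the two-byte magic number at the start of the file (P1 to P6 for PBM/PGM/PPM, P7 for PAM). The following is only a small sketch of that check; the function name `netpbm_format` is made up for illustration and is not part of Dangerzone or MuPDF.

```python
# Netpbm magic numbers: P1-P3 are the plain (ASCII) variants,
# P4-P6 the raw (binary) variants, and P7 is PAM.
NETPBM_MAGIC = {
    b"P1": "pbm",
    b"P2": "pgm",
    b"P3": "ppm",
    b"P4": "pbm",
    b"P5": "pgm",
    b"P6": "ppm",
    b"P7": "pam",
}


def netpbm_format(header: bytes):
    """Return 'pbm', 'pgm', 'ppm' or 'pam' for a Netpbm header, else None.

    Hypothetical helper: it only looks at the first two bytes.
    """
    return NETPBM_MAGIC.get(header[:2])
```

Passing the first few bytes of a file is enough, for example `netpbm_format(open(path, "rb").read(2))`.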
Also, MuPDF contains some references to office formats (including .hwp). I wonder what those are for, since MuPDF does not support these formats. One debug comment seems to hint that they convert these files to HTML, but HTML is also not supported :thinking:.
No references point to this actually being supported. So I'll stop digging here. But I found it curious.
I am running into similar issues with the .xps file format. Guessing it from file contents alone reveals application/zip, similar to what we had experienced with LibreOffice files. And this isn't because of some odd tool doing a bad job: I just converted our sample-docx.docx to .xps with Microsoft Office and it showed application/zip as the MIME type.
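
One possible way around the application/zip ambiguity is to look inside the archive instead of relying on the MIME type alone. The sketch below is a crude heuristic, not what Dangerzone does: it assumes that an XPS package contains a part whose name ends in `.fdseq` (the fixed document sequence) and that a comic book archive contains nothing but image files. The function name, the extension list, and both rules are assumptions for illustration.

```python
import io
import zipfile

# Assumption: file extensions we are willing to treat as comic book pages.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def guess_zip_flavor(data: bytes) -> str:
    """Guess what a zip container holds by looking at its member names.

    Returns "xps", "cbz", "zip" (some other zip-based file) or "not-zip".
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return "not-zip"
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Ignore directory entries, compare names case-insensitively.
        names = [n.lower() for n in archive.namelist() if not n.endswith("/")]
    # Assumed XPS marker: a fixed document sequence part.
    if any(name.endswith(".fdseq") for name in names):
        return "xps"
    # Assumed CBZ marker: a non-empty archive made up only of images.
    if names and all(name.endswith(IMAGE_EXTENSIONS) for name in names):
        return "cbz"
    return "zip"
```

This only reads the archive's file listing and never extracts members, but it still means parsing untrusted zip metadata, so it would belong inside the container rather than on the host.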
This is proving to be a bit more challenging than I originally anticipated because we have different PyMuPDF versions running, particularly in Qubes OS.
File Format | MuPDF Min Supported Version | Notes |
---|---|---|
.psd | 1.23.0 (2023-08-22) | |
.txt | 1.23.6 | but in practice only works in 1.23.7 |
.jxr | 1.10-rc1 | Server fails due to missing codec: code=2: JPEG-XR codec is not available (on Fedora, installing jxrlib, jxrlib-devel or openjpeg-libs didn't help) |
.pgm | ? (couldn't find) | |
.mobi | MuPDF 1.21.0-rc1 | |
.fb2 | MuPDF 1.10-rc1 | In practice this file cannot be detected by the mimetype alone |
With .jpx, I couldn't find any documentation on how to convert to this file type under Linux. Supposedly the convert tool can generate such a file, and the magic number matches the one in the respective Wikipedia article. However, PyMuPDF still rejected this file, so I won't be adding it for now.
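
For anyone who wants to repeat the magic number check: JPEG 2000 family files start with a fixed 12-byte signature box, followed by an `ftyp` box whose brand is `jp2 ` for plain JP2 and `jpx ` for JPX. The sketch below reads that brand; the function name is made up, and it says nothing about why PyMuPDF rejected the file.

```python
# 12-byte JPEG 2000 signature box shared by JP2, JPX and related formats.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")


def jpeg2000_brand(data: bytes):
    """Return the 4-character ftyp brand (e.g. 'jp2 ' or 'jpx '), or None.

    Hypothetical helper: it assumes the ftyp box directly follows the
    signature box and only inspects the first 24 bytes.
    """
    if not data.startswith(JP2_SIGNATURE):
        return None
    box = data[12:24]  # 4-byte length, 4-byte box type, 4-byte brand
    if len(box) < 12 or box[4:8] != b"ftyp":
        return None
    return box[8:12].decode("ascii", errors="replace")
```

Reading the first 24 bytes of the generated file and passing them in shows whether it is branded as JPX or as plain JP2.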
Renamed the issue to the remaining file formats https://github.com/freedomofpress/dangerzone/pull/697
I was a bit surprised to see Dangerzone doesn't support epub files, and even more surprised to see there's not a single issue/PR about it. Unless Dangerzone/Freedom of the Press has some kind of anti-piracy policy (similar to youtube-dl), I see no real reason not to have this.
It may be possible to simply include calibre in the sandbox to support every format it supports that Dangerzone doesn't.