deeplow closed this issue 1 year ago
A contributor of dangerzone broke down some of this nuance here:
MIME types
HWP and HWPX use custom MIME types that are not registered with IANA. Each format also has several MIME types in use, so all of them need to be added. Some recommend application/vnd.hancom.*, but wildcards may not be supported in this code base and could lead to security problems.
- hwp: application/x-hwp, application/haansofthwp, application/vnd.hancom.hwp
- hwpx: application/haansofthwpx, application/vnd.hancom.hwpx
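To illustrate the point about avoiding a wildcard, here is a minimal Java sketch of an exact-match allowlist built from the MIME types listed above. The class name `HancomMimeTypes` and the method `formatFor` are hypothetical names chosen for this example; they are not part of dangerzone or H2Orestart.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps each explicitly listed MIME type to its format.
final class HancomMimeTypes {
    // Exact-match allowlist; the entries are the MIME types from the list above.
    private static final Map<String, String> FORMAT_BY_MIME = Map.of(
        "application/x-hwp", "hwp",
        "application/haansofthwp", "hwp",
        "application/vnd.hancom.hwp", "hwp",
        "application/haansofthwpx", "hwpx",
        "application/vnd.hancom.hwpx", "hwpx");

    // Returns "hwp" or "hwpx" for a listed MIME type, empty otherwise.
    static Optional<String> formatFor(String mimeType) {
        if (mimeType == null) {
            return Optional.empty();
        }
        // Drop parameters such as "; charset=..." and normalize case.
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(FORMAT_BY_MIME.get(base));
    }
}
```

With exact matching, an unlisted type such as application/vnd.hancom.something-else is rejected, which a application/vnd.hancom.* pattern would have accepted.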
Reference (in Korean)
@deeplow thank you for the advice.
You're right. Until now, H2Orestart has decided whether a file is hwp or hwpx from its file extension. That helps open Hancom files quickly, because the plugin doesn't need to parse the file structure. I agree this is not a safe approach.
Regarding MIME types: LibreOffice never asks a plugin, by MIME type, whether it can open a file. LibreOffice only passes the file path to the extension and asks whether it supports that file.
So I will choose a compromise.
The plugin will detect the file type by parsing the actual file structure, checking for an hwpx header and then an hwp header, one after the other, even if that takes a few more seconds before the file opens. The plugin will no longer look at the file extension.
This modification will be included in v0.5.6.
Thank you.
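As a rough illustration of content-based detection like the one described above, the following Java sketch checks only the leading bytes of a file. It assumes that HWPX files are ZIP packages and that HWP 5.x files are OLE2 compound files; the class `HancomSniffer` and its return values are hypothetical, and this is not the code that shipped in H2Orestart. A signature match only marks a candidate, since other formats share the same containers, so a real check would still need to parse further into the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch: classify a file by its leading bytes, not its extension.
final class HancomSniffer {
    // OLE2 compound file signature (assumed container for HWP 5.x).
    private static final byte[] CFB_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // ZIP local file header signature (assumed container for HWPX).
    private static final byte[] ZIP_MAGIC = {'P', 'K', 0x03, 0x04};

    // Classifies the first bytes of a file; the result is only a candidate.
    static String sniff(byte[] head) {
        if (startsWith(head, ZIP_MAGIC)) {
            return "hwpx-candidate";
        }
        if (startsWith(head, CFB_MAGIC)) {
            return "hwp-candidate";
        }
        return "unknown";
    }

    // Reads at most 8 bytes from the file and classifies them.
    static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```

Calling `HancomSniffer.sniff(path)` on a file that was renamed to a different extension gives the same answer as on the original, which is the property the extension-based check lacks.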
It's just an idea, but what about using another library such as Apache Tika?
Thanks a lot @ebandal this worked!
As it stands, the plugin decides how to handle the file based on its extension, which I believe happens in the following lines:
https://github.com/ebandal/H2Orestart/blob/df06ca5a9f931ae395bd379a8bae4e3b7d32e84f/source/soffice/WriterContext.java#L86-L92
However, on Linux systems the file extension generally doesn't matter much, and ideally the file could still be detected based on its MIME type.
Would it be possible to use MIME types instead of extensions?