kohlhase closed this issue 10 years ago.
(In #375) The handling of figures is particularly pressing; see #285.
It would really be good to have this done soon, since adjusting things manually in the paper we have to submit is a pain.
I think that the strategy we'd eventually adopt wouldn't necessarily be to track all the files touched while processing (e.g. \includegraphics), but something like the following:
Initially, we'd create a temp directory, and then use that as the destination directory (and site directory). Then, we'd process the document as usual. At the end, everything in the temp directory (except for LaTeXML.cache(s)) would presumably be needed, so we'd list them in the manifest. That doesn't tell you anything about what each file is or why it's included (if the manifest even needs that info), but you'd get all the files.
And then, you'd simply zip up the temp directory.
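A minimal Python sketch of that strategy, assuming the conversion has already written its results into a temporary destination directory: collect every file except LaTeXML.cache, use that list as the manifest, and zip the directory. The function names are hypothetical, not part of LaTeXML.

```python
import os
import zipfile

# Files that never go into the archive: LaTeXML.cache is only an efficiency
# aid for re-processing, not part of the result.
EXCLUDED_NAMES = {"LaTeXML.cache"}


def collect_results(result_dir):
    """Return sorted relative paths of every file in result_dir, minus caches."""
    paths = []
    for root, _dirs, files in os.walk(result_dir):
        for name in files:
            if name in EXCLUDED_NAMES:
                continue
            full = os.path.join(root, name)
            # Forward slashes, so the paths are valid zip member names.
            paths.append(os.path.relpath(full, result_dir).replace(os.sep, "/"))
    return sorted(paths)


def zip_results(result_dir, archive_path):
    """Zip up the temp destination directory and return the manifest list."""
    manifest = collect_results(result_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for rel in manifest:
            zf.write(os.path.join(result_dir, rel), arcname=rel)
    return manifest
```

Writing the archive outside the results directory keeps it from being swept into the next run's listing.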
That shouldn't be too hard to experiment with temporarily in the Makefile. A simple script could transform the output of ls into a manifest?
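Something along these lines could turn a recursive listing into an ODF-style manifest.xml. It assumes one relative path per line, as printed by `find . -type f` in the results directory (plain `ls` does not recurse), and a small hand-written extension table; `manifest_from_listing` and `MEDIA_TYPES` are hypothetical names.

```python
import os
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
ET.register_namespace("manifest", MANIFEST_NS)

# Hypothetical extension -> media-type table. Unknown extensions get an empty
# media type, which LibreOffice itself writes for some entries.
MEDIA_TYPES = {
    ".xml": "text/xml",
    ".rdf": "application/rdf+xml",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".svg": "image/svg+xml",
}


def q(name):
    """Qualify an element/attribute name with the manifest namespace."""
    return "{%s}%s" % (MANIFEST_NS, name)


def manifest_from_listing(lines, doc_type="application/vnd.oasis.opendocument.text"):
    """Build manifest.xml text from relative paths, one per line."""
    root = ET.Element(q("manifest"), {q("version"): "1.2"})
    # The package root entry carries the document's own media type.
    ET.SubElement(root, q("file-entry"),
                  {q("full-path"): "/", q("version"): "1.2", q("media-type"): doc_type})
    for line in lines:
        path = line.strip()
        if path.startswith("./"):
            path = path[2:]  # find prints ./content.xml
        # Neither the mimetype file nor the manifest itself is listed.
        if not path or path in ("mimetype", "META-INF/manifest.xml"):
            continue
        ext = os.path.splitext(path)[1].lower()
        ET.SubElement(root, q("file-entry"),
                      {q("full-path"): path, q("media-type"): MEDIA_TYPES.get(ext, "")})
    return ET.tostring(root, encoding="unicode")
```

Passing `sys.stdin` as `lines` would let this sit at the end of a shell pipeline in the Makefile.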
Replying to comment 3 @brucemiller:
At the end, everything in the temp directory (except for LaTeXML.cache(s)) would presumably be needed, so we'd list them in the manifest.
Probably each format that needs a manifest should provide its own whitelist of extensions that would be copied over to the zip. Or, if it ends up easier, a blacklist.
E.g. a generic request for a ZIP would want any .sty and .tex files returned (erm, or would it? That's kind of open), while ePub and ODT archives should be minimal and contain only the meat of the final representation. Similarly, for .dvi files that got converted to .png, you wouldn't want to keep the originals around for ePub or ODT.
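A per-format whitelist could look roughly like the following; the format names, extension sets, and `select_files` are illustrative assumptions rather than an agreed design. The exact-name set is there because the `mimetype` file that ODT and ePub packages carry has no extension.

```python
import os

# Hypothetical per-format rules: extensions to keep, plus exact file names
# that must survive even though they have no extension.
# None means "keep everything", e.g. for a generic ZIP request.
FORMAT_RULES = {
    "zip": None,
    "epub": {"exts": {".xhtml", ".xml", ".opf", ".ncx", ".css", ".png", ".svg", ".jpg"},
             "names": {"mimetype"}},
    "odt": {"exts": {".xml", ".rdf", ".png", ".svg", ".jpg"},
            "names": {"mimetype"}},
}


def select_files(paths, fmt):
    """Filter relative paths down to what the given output format needs."""
    rules = FORMAT_RULES[fmt]
    if rules is None:
        return list(paths)
    kept = []
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        ext = os.path.splitext(name)[1].lower()
        if name in rules["names"] or ext in rules["exts"]:
            kept.append(path)
    return kept
```

A blacklist variant would simply invert the membership test; .dvi intermediates and .tex/.sty sources fall out of the ODT and ePub sets automatically.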
Now I am starting to wonder whether my web service conversion shouldn't really be creating two temporary directories: one for sources and one for results. Maybe that's a saner approach.
Replying to comment 4 @dginev:
Replying to comment 3 @brucemiller:
At the end, everything in the temp directory (except for LaTeXML.cache(s)) would presumably be needed, so we'd list them in the manifest.
Probably each format that needs a manifest should provide its own whitelist of extensions that would be copied over to the zip. Or, if it ends up easier, a blacklist.
I'm thinking that the format/postprocessors only put into the destination what is needed; they only copy & massage an image if that is what is requested by the options and parameters.
In fact, rather than excluding LaTeXML.cache, it should probably be arranged not to create the file in that location in the first place (it's only used for efficiency when re-processing a file after small changes).
E.g. a generic request for a ZIP would want any .sty and .tex files returned (erm, or would it? That's kind of open), while ePub and ODT archives should be minimal and contain only the meat of the final representation. Similarly, for .dvi files that got converted to .png, you wouldn't want to keep the originals around for ePub or ODT.
Now I am starting to wonder whether my web service conversion shouldn't really be creating two temporary directories: one for sources and one for results. Maybe that's a saner approach.
That sounds right to me; I've always found it best not to mingle the two.
I agree with where this discussion is going, especially with generating a temporary results directory that can just be zipped up. But note that the directory for ODT has a substructure. Here is an example generated by LibreOffice:
-rw-rw-rw- 39 28-Mar-2013 05:10:24 mimetype
-rw-rw-rw- 1086 28-Mar-2013 05:10:24 meta.xml
-rw-rw-rw- 1410 28-Mar-2013 05:10:24 ObjectReplacements/Object 1
-rw-rw-rw- 1410 28-Mar-2013 05:10:24 ObjectReplacements/Object 2
-rw-rw-rw- 9865 28-Mar-2013 05:10:24 settings.xml
-rw-rw-rw- 4449 28-Mar-2013 05:10:24 content.xml
# -rw-rw-rw- 6663 28-Mar-2013 05:10:24 Object 1/settings.xml
-rw-rw-rw- 391 28-Mar-2013 05:10:24 Object 1/content.xml
# -rw-rw-rw- 892 28-Mar-2013 05:10:24 Thumbnails/thumbnail.png
# -rw-rw-rw- 6663 28-Mar-2013 05:10:24 Object 2/settings.xml
-rw-rw-rw- 390 28-Mar-2013 05:10:24 Object 2/content.xml
# -rw-rw-rw- 899 28-Mar-2013 05:10:24 manifest.rdf
# drwxrwxrwx 0 28-Mar-2013 05:10:24 Configurations2/popupmenu/
# drwxrwxrwx 0 28-Mar-2013 05:10:24 Configurations2/images/Bitmaps/
# drwxrwxrwx 0 28-Mar-2013 05:10:24 Configurations2/toolpanel/
# drwxrwxrwx 0 28-Mar-2013 05:10:24 Configurations2/statusbar/
# drwxrwxrwx 0 28-Mar-2013 05:10:24 Configurations2/toolbar/
# drwxrwxrwx 0 28-Mar-2013 05:10:24 Configurations2/progressbar/
# drwxrwxrwx 0 28-Mar-2013 05:10:24 Configurations2/menubar/
# drwxrwxrwx 0 28-Mar-2013 05:10:24 Configurations2/floater/
# -rw-rw-rw- 0 28-Mar-2013 05:10:24 Configurations2/accelerator/current.xml
-rw-rw-rw- 11483 28-Mar-2013 05:10:24 styles.xml
-rw-rw-rw- 2112 28-Mar-2013 05:10:24 META-INF/manifest.xml
The "commented-out" lines seem to be optional. I am not sure about the ObjectReplacements directory, but assume it is optional too (its files seem to be GDI metafiles, which seems to be a Windows thing).
The corresponding manifest.xml is:
<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
<manifest:file-entry manifest:full-path="/" manifest:version="1.2" manifest:media-type="application/vnd.oasis.opendocument.text"/>
<manifest:file-entry manifest:full-path="meta.xml" manifest:media-type="text/xml"/>
<manifest:file-entry manifest:full-path="ObjectReplacements/Object 1" manifest:media-type="application/x-openoffice-gdimetafile;windows_formatname=&quot;GDIMetaFile&quot;"/>
<manifest:file-entry manifest:full-path="ObjectReplacements/Object 2" manifest:media-type="application/x-openoffice-gdimetafile;windows_formatname=&quot;GDIMetaFile&quot;"/>
<manifest:file-entry manifest:full-path="settings.xml" manifest:media-type="text/xml"/>
<manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
<manifest:file-entry manifest:full-path="Object 1/settings.xml" manifest:media-type="text/xml"/>
<manifest:file-entry manifest:full-path="Object 1/content.xml" manifest:media-type="text/xml"/>
<manifest:file-entry manifest:full-path="Object 1/" manifest:version="1.2" manifest:media-type="application/vnd.oasis.opendocument.formula"/>
<manifest:file-entry manifest:full-path="Thumbnails/thumbnail.png" manifest:media-type="image/png"/>
<manifest:file-entry manifest:full-path="Object 2/settings.xml" manifest:media-type="text/xml"/>
<manifest:file-entry manifest:full-path="Object 2/content.xml" manifest:media-type="text/xml"/>
<manifest:file-entry manifest:full-path="Object 2/" manifest:version="1.2" manifest:media-type="application/vnd.oasis.opendocument.formula"/>
<manifest:file-entry manifest:full-path="manifest.rdf" manifest:media-type="application/rdf+xml"/>
<manifest:file-entry manifest:full-path="Configurations2/accelerator/current.xml" manifest:media-type=""/>
<manifest:file-entry manifest:full-path="Configurations2/" manifest:media-type="application/vnd.sun.xml.ui.configuration"/>
<manifest:file-entry manifest:full-path="styles.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
So the only difficulty is to get the media types right.
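Besides the manifest entries, the package itself has to be zipped in a particular way: the ODF packaging rules require the `mimetype` file to be the first entry in the archive and stored uncompressed, and its content should match the media type of the "/" manifest entry (the 39-byte file in the listing is exactly `application/vnd.oasis.opendocument.text`). A sketch, assuming the results directory already holds content.xml, styles.xml, META-INF/manifest.xml and so on; `package_odt` is a hypothetical helper.

```python
import os
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def package_odt(result_dir, archive_path):
    """Zip result_dir as an ODT package, mimetype first and uncompressed."""
    with zipfile.ZipFile(archive_path, "w") as zf:
        # First entry, stored (not deflated), so tools can sniff the type.
        zf.writestr(zipfile.ZipInfo("mimetype"), ODT_MIMETYPE,
                    compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(result_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, result_dir).replace(os.sep, "/")
                if rel == "mimetype":
                    continue  # already written as the first entry
                zf.write(full, arcname=rel, compress_type=zipfile.ZIP_DEFLATED)
```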
This has been done together with Deyan; closing.
[Originally Ticket 1709]
The current way of generating the manifest.xml from the \includegraphics calls via an XSLT stylesheet is not good. It only works if the pictures are siblings of the tex file. What we really want is to have something that generates content.xml and manifest.xml at the same time. I am not quite sure what the best tool is, but Makefile + XSLT is not the right one. Maybe one of the Perl hackers can help.
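One way to produce content.xml and manifest.xml in the same pass is to resolve each image reference against the directory of the .tex file (so subdirectories work), copy the image into the package, and record both the rewritten href and the manifest entry at once. The sketch assumes the references already carry a file extension (\includegraphics often omits it, so a real implementation would first locate the actual file); `collect_pictures`, `IMAGE_TYPES`, and the `Pictures/imgN` naming scheme are hypothetical, loosely following LibreOffice's habit of storing images under Pictures/.

```python
import os
import shutil

# Hypothetical extension -> media-type table for images.
IMAGE_TYPES = {".png": "image/png", ".jpg": "image/jpeg",
               ".jpeg": "image/jpeg", ".svg": "image/svg+xml"}


def collect_pictures(image_refs, source_dir, package_dir):
    """Copy referenced images into package_dir/Pictures and plan manifest entries.

    image_refs are paths as written in the includegraphics arguments, resolved
    against source_dir. Returns (href_map, entries): href_map maps each original
    reference to its path inside the package (for rewriting content.xml);
    entries are (full-path, media-type) pairs for manifest.xml.
    """
    os.makedirs(os.path.join(package_dir, "Pictures"), exist_ok=True)
    href_map, entries = {}, []
    for ref in image_refs:
        if ref in href_map:
            continue  # same image included twice: copy it once
        ext = os.path.splitext(ref)[1].lower()
        # Number the copies so figs/a.png and other/a.png cannot collide.
        target = "Pictures/img%d%s" % (len(href_map) + 1, ext)
        shutil.copyfile(os.path.join(source_dir, ref),
                        os.path.join(package_dir, target))
        href_map[ref] = target
        entries.append((target, IMAGE_TYPES.get(ext, "")))
    return href_map, entries
```

Numbering the copies trades readable file names for freedom from name clashes between images in different source subdirectories.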