brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

better manifest generation in LaTeX2ODT #385

Closed: kohlhase closed this issue 10 years ago

kohlhase commented 11 years ago

[Originally Ticket 1709]

The current way of generating the manifest.xml from the \includegraphics commands via an XSLT stylesheet is not good: it only works if the pictures are siblings of the .tex file.

What we really want is to have something that

Maybe one of the Perl hackers can help.

kohlhase commented 11 years ago

(In #375) The handling of figures is particularly pressing; see #285.

kohlhase commented 11 years ago

It would really be good to have this done soon, since it is a pain to adjust manually in the paper we have to submit.

brucemiller commented 11 years ago

I think the strategy we'd eventually adopt wouldn't necessarily be to track all the files touched while processing (e.g. by \includegraphics), but something like the following:

Initially, we'd create a temp directory and use it as the destination directory (and site directory). Then we'd process the document as usual. At the end, everything in the temp directory (except for LaTeXML.cache(s)) would presumably be needed, so we'd list it all in the manifest. That doesn't tell you anything about what each file is or why it's included (if the manifest needs that info), but you'd get all the files.

And then, you'd simply zip up the temp directory.
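A rough sketch of the zipping step, assuming Python and a hypothetical `zip_results` helper (not part of LaTeXML). Both ODF and ePub containers expect an uncompressed `mimetype` entry at the very start of the archive, so a plain recursive zip of the directory may not be enough. The function writes that entry first when the results directory contains one and skips `LaTeXML.cache`.

```python
import zipfile
from pathlib import Path

# Files that should never end up in the package (assumed list).
EXCLUDE = {"LaTeXML.cache"}


def zip_results(results_dir: str, out_path: str) -> None:
    """Zip everything under results_dir into out_path.

    If results_dir has a 'mimetype' file (as ODT and ePub packages do), it is
    written as the first entry and stored uncompressed. out_path should lie
    outside results_dir so the archive does not try to include itself.
    """
    root = Path(results_dir)
    with zipfile.ZipFile(out_path, "w") as zf:
        mimetype = root / "mimetype"
        if mimetype.is_file():
            # First entry, no compression.
            zf.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            # Skip directories, excluded files, and the already-written mimetype.
            if not path.is_file() or path.name in EXCLUDE or path == mimetype:
                continue
            # Archive names are relative, with forward slashes.
            zf.write(path, path.relative_to(root).as_posix(),
                     compress_type=zipfile.ZIP_DEFLATED)
```

For ODT, calling `zip_results(tmpdir, "paper.odt")` after processing would then produce the package, provided the manifest has already been written into `tmpdir/META-INF/`.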

That shouldn't be too hard to experiment with temporarily in the Makefile. Couldn't a simple script transform the output of ls into a manifest?
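One way the "transform ls into a manifest" step might look, sketched in Python with hypothetical names (`list_results`, `build_manifest`). It lists every file under the results directory, skips the cache and the entries an ODF manifest does not describe (`mimetype` and `META-INF/manifest.xml` itself), and emits one `manifest:file-entry` per file. Media types are only guessed from the file extension with the standard `mimetypes` module, so some entries (e.g. `.xml` files) may come out differently on different systems.

```python
import mimetypes
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterable, List

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
# Package members that the manifest itself does not list.
NOT_LISTED = {"mimetype", "META-INF/manifest.xml"}


def list_results(results_dir: str) -> List[str]:
    """The 'ls' step: sorted relative POSIX paths of the files to list."""
    root = Path(results_dir)
    rel = (p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    return sorted(p for p in rel
                  if p not in NOT_LISTED and Path(p).name != "LaTeXML.cache")


def build_manifest(paths: Iterable[str],
                   doc_type: str = "application/vnd.oasis.opendocument.text") -> bytes:
    """Turn a file list into manifest.xml bytes; media types are only guessed."""
    ET.register_namespace("manifest", MANIFEST_NS)

    def q(name: str) -> str:
        return "{%s}%s" % (MANIFEST_NS, name)

    root = ET.Element(q("manifest"), {q("version"): "1.2"})
    # The "/" entry describes the package as a whole.
    ET.SubElement(root, q("file-entry"), {q("full-path"): "/",
                                          q("version"): "1.2",
                                          q("media-type"): doc_type})
    for path in paths:
        media_type, _ = mimetypes.guess_type(path)
        ET.SubElement(root, q("file-entry"), {q("full-path"): path,
                                              q("media-type"): media_type or ""})
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Writing `build_manifest(list_results(tmpdir))` to `tmpdir/META-INF/manifest.xml` before zipping would be the whole experiment.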

dginev commented 11 years ago

Replying to comment 3 @brucemiller:

> At the end, everything in the temp directory (except for LaTeXML.cache(s)) would presumably be needed, so we'd list them in the manifest.

Probably each format that needs a manifest should provide its own whitelist of file extensions to be copied into the zip. Or, if that ends up easier, a blacklist.

E.g. a generic request for a ZIP would want any .sty and .tex files returned (erm, or would it? That's kind of open), while ePub and ODT archives should be minimal and contain only the meat of the final representation. Similarly, for .dvi files that got converted to .png, you wouldn't want to keep the originals around for ePub or ODT.
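A minimal sketch of such per-format whitelists in Python. The format keys, the extension sets, and the name `select_for_format` are illustrative assumptions, not settled choices. `None` stands for "keep everything", one possible answer to the open question about returning .tex/.sty files in a generic ZIP, and the extension-less `mimetype` file is always kept.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Hypothetical per-format whitelists of extensions; None keeps everything.
WHITELISTS = {
    "zip": None,  # whether .tex/.sty should come back is still open
    "epub": {".xhtml", ".css", ".png", ".jpg", ".svg", ".opf", ".ncx", ".xml"},
    "odt": {".xml", ".png", ".jpg", ".svg", ".rdf"},
}
# Extension-less files a package needs regardless of the whitelist.
ALWAYS_KEEP = {"mimetype"}


def select_for_format(paths: Iterable[str], fmt: str) -> List[str]:
    """Keep only the files the target format wants, preserving order."""
    allowed = WHITELISTS[fmt]
    kept = []
    for path in paths:
        p = PurePosixPath(path)
        if allowed is None or p.name in ALWAYS_KEEP or p.suffix.lower() in allowed:
            kept.append(path)
    return kept
```

With these sets, a `fig.dvi` that was converted to `fig.png` is dropped for ODT and ePub but kept in a generic ZIP.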

Now I am starting to wonder whether my web-service conversion should really be creating two temporary directories, one for sources and one for results. Maybe that's a saner approach.

brucemiller commented 11 years ago

Replying to comment 4 @dginev:

> Replying to comment 3 @brucemiller:
>
>> At the end, everything in the temp directory (except for LaTeXML.cache(s)) would presumably be needed so we'd list them in the manifest.
>
> Probably each format that needs a manifest should provide its own whitelist of extensions that would be copied over to the zip. Or if it ends up easier - a blacklist.

I'm thinking that the format-specific postprocessors only put into the destination what is needed; they only copy and massage an image if that is what the options and parameters request.

In fact, rather than excluding LaTeXML.cache, we should probably arrange not to create the file in that location in the first place (it's only used for efficiency when reprocessing a file after small changes).

> E.g. a generic request for a ZIP would want any .sty and .tex files returned back (erm, or would it? That's kind of open) while ePub and ODT archives should be minimal and contain only the meat of the final representation. Similarly for .dvi files that got converted to .png, you wouldn't want to keep the originals around for ePub or ODT.
>
> Now I am starting to think whether my web service conversion shouldn't really be creating 2 temporary directories - one for sources and one for results. Maybe that's a saner approach.

That sounds right to me; I've always found it best not to mingle the two.

kohlhase commented 11 years ago

I agree with where this discussion is going, especially with generating a temporary results directory that can just be zipped up. But note that the directory for ODT has a substructure. Here is an example generated by LibreOffice:

  -rw-rw-rw-        39  28-Mar-2013  05:10:24  mimetype
  -rw-rw-rw-      1086  28-Mar-2013  05:10:24  meta.xml
  -rw-rw-rw-      1410  28-Mar-2013  05:10:24  ObjectReplacements/Object 1
  -rw-rw-rw-      1410  28-Mar-2013  05:10:24  ObjectReplacements/Object 2
  -rw-rw-rw-      9865  28-Mar-2013  05:10:24  settings.xml
  -rw-rw-rw-      4449  28-Mar-2013  05:10:24  content.xml
#  -rw-rw-rw-      6663  28-Mar-2013  05:10:24  Object 1/settings.xml
  -rw-rw-rw-       391  28-Mar-2013  05:10:24  Object 1/content.xml
#  -rw-rw-rw-       892  28-Mar-2013  05:10:24  Thumbnails/thumbnail.png
#  -rw-rw-rw-      6663  28-Mar-2013  05:10:24  Object 2/settings.xml
  -rw-rw-rw-       390  28-Mar-2013  05:10:24  Object 2/content.xml
#  -rw-rw-rw-       899  28-Mar-2013  05:10:24  manifest.rdf
#  drwxrwxrwx         0  28-Mar-2013  05:10:24  Configurations2/popupmenu/
#  drwxrwxrwx         0  28-Mar-2013  05:10:24  Configurations2/images/Bitmaps/
#  drwxrwxrwx         0  28-Mar-2013  05:10:24  Configurations2/toolpanel/
#  drwxrwxrwx         0  28-Mar-2013  05:10:24  Configurations2/statusbar/
#  drwxrwxrwx         0  28-Mar-2013  05:10:24  Configurations2/toolbar/
#  drwxrwxrwx         0  28-Mar-2013  05:10:24  Configurations2/progressbar/
#  drwxrwxrwx         0  28-Mar-2013  05:10:24  Configurations2/menubar/
#  drwxrwxrwx         0  28-Mar-2013  05:10:24  Configurations2/floater/
#  -rw-rw-rw-         0  28-Mar-2013  05:10:24  Configurations2/accelerator/current.xml
  -rw-rw-rw-     11483  28-Mar-2013  05:10:24  styles.xml
  -rw-rw-rw-      2112  28-Mar-2013  05:10:24  META-INF/manifest.xml

The "commented-out" lines (marked with #) seem to be optional. I am not sure about the ObjectReplacements directory, but I assume it is optional too (its entries seem to be GDI metafiles, which seem to be a Windows thing).

The corresponding manifest.xml is:

<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
 <manifest:file-entry manifest:full-path="/" manifest:version="1.2" manifest:media-type="application/vnd.oasis.opendocument.text"/>
 <manifest:file-entry manifest:full-path="meta.xml" manifest:media-type="text/xml"/>
 <manifest:file-entry manifest:full-path="ObjectReplacements/Object 1" manifest:media-type="application/x-openoffice-gdimetafile;windows_formatname=&quot;GDIMetaFile&quot;"/>
 <manifest:file-entry manifest:full-path="ObjectReplacements/Object 2" manifest:media-type="application/x-openoffice-gdimetafile;windows_formatname=&quot;GDIMetaFile&quot;"/>
 <manifest:file-entry manifest:full-path="settings.xml" manifest:media-type="text/xml"/>
 <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
 <manifest:file-entry manifest:full-path="Object 1/settings.xml" manifest:media-type="text/xml"/>
 <manifest:file-entry manifest:full-path="Object 1/content.xml" manifest:media-type="text/xml"/>
 <manifest:file-entry manifest:full-path="Object 1/" manifest:version="1.2" manifest:media-type="application/vnd.oasis.opendocument.formula"/>
 <manifest:file-entry manifest:full-path="Thumbnails/thumbnail.png" manifest:media-type="image/png"/>
 <manifest:file-entry manifest:full-path="Object 2/settings.xml" manifest:media-type="text/xml"/>
 <manifest:file-entry manifest:full-path="Object 2/content.xml" manifest:media-type="text/xml"/>
 <manifest:file-entry manifest:full-path="Object 2/" manifest:version="1.2" manifest:media-type="application/vnd.oasis.opendocument.formula"/>
 <manifest:file-entry manifest:full-path="manifest.rdf" manifest:media-type="application/rdf+xml"/>
 <manifest:file-entry manifest:full-path="Configurations2/accelerator/current.xml" manifest:media-type=""/>
 <manifest:file-entry manifest:full-path="Configurations2/" manifest:media-type="application/vnd.sun.xml.ui.configuration"/>
 <manifest:file-entry manifest:full-path="styles.xml" manifest:media-type="text/xml"/>
</manifest:manifest>

So the only difficulty is getting the media types right.
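Since the file list itself is easy, the media-type lookup is where the format-specific knowledge goes. A sketch in Python, assuming LaTeXML's ODT output only needs the kinds of entries seen in the manifest above: the package root `/` gets the text-document type, files get a type by extension, and each `Object N/` directory gets an entry of its own. Treating every such directory as a formula object is an assumption (it matches this example, where both objects are formulas), the `.jpg`/`.svg` mappings are added for included graphics, and `odf_entries` is a hypothetical name. The LibreOffice manifest also puts `manifest:version="1.2"` on the root and sub-document entries, which a writer would add when serializing.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple

TEXT_DOC = "application/vnd.oasis.opendocument.text"
FORMULA_DOC = "application/vnd.oasis.opendocument.formula"
# Media types by extension; .xml and .rdf as in the manifest above,
# image types assumed for graphics included via \includegraphics.
BY_SUFFIX = {
    ".xml": "text/xml",
    ".rdf": "application/rdf+xml",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".svg": "image/svg+xml",
}
SUBDOC = re.compile(r"Object \d+")


def odf_entries(paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (full-path, media-type) pairs for an ODT manifest."""
    entries = [("/", TEXT_DOC)]
    subdocs = set()
    for path in paths:
        p = PurePosixPath(path)
        # Unknown extensions get an empty media type, as in the example.
        entries.append((path, BY_SUFFIX.get(p.suffix.lower(), "")))
        # Files under "Object N/" belong to an embedded sub-document,
        # which needs its own directory entry (assumed to be a formula).
        if len(p.parts) > 1 and SUBDOC.fullmatch(p.parts[0]):
            subdocs.add(p.parts[0] + "/")
    entries.extend((d, FORMULA_DOC) for d in sorted(subdocs))
    return entries
```

Unlike LibreOffice, this appends the sub-document entries at the end rather than next to their files.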

kohlhase commented 10 years ago

This has been done together with Deyan; closing.