@Oliver-Hanikel Yes, I know. The image size can be reduced drastically, but I haven't looked into that yet. For now I use pandoc because all my notes have math and LaTeX embedded in them. Pandoc was the converter I picked at the time because it fully supported all my needs. If there are other options that support LaTeX etc., I'll be happy to hear about them!
Is LaTeX support even needed in the Markdown converter? Isn't MathJax used in the frontend for the conversion? If LaTeX is not needed, we could switch to pymd4c. According to its authors, md4c is also the fastest Markdown converter there is.
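If LaTeX only has to be rendered in the browser, one common approach is to shield the math from the Markdown converter and hand it to MathJax untouched. A minimal sketch, assuming pandoc-style `$...$` / `$$...$$` delimiters; `protect_math`, `restore_math` and the `@@MATHn@@` placeholder format are hypothetical names, not part of wikmd, pymd4c or MathJax.

```python
import html
import re

# Display math ($$...$$) is tried first, then inline math ($...$).
# Assumption: pandoc-style dollar delimiters; escaped dollars (\$) are not
# handled in this sketch.
MATH_RE = re.compile(r"\$\$.+?\$\$|\$[^$\n]+?\$", re.DOTALL)
PLACEHOLDER_RE = re.compile(r"@@MATH(\d+)@@")


def protect_math(text):
    """Swap math spans for placeholders; return (new_text, stash)."""
    stash = []

    def _stash(match):
        stash.append(match.group(0))
        return f"@@MATH{len(stash) - 1}@@"

    return MATH_RE.sub(_stash, text), stash


def restore_math(rendered_html, stash):
    """Put the original math source back, HTML-escaped, for MathJax."""
    return PLACEHOLDER_RE.sub(
        lambda m: html.escape(stash[int(m.group(1))], quote=False),
        rendered_html,
    )
```

`protect_math` would run before the Markdown-to-HTML call and `restore_math` after it. Note that MathJax v3 does not treat single `$` as an inline delimiter by default, so it would need to be enabled in the MathJax configuration.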
After switching to `python:alpine` as the base image:
Architecture | Size¹ |
---|---|
amd64 | 338MB |
aarch64 | 395MB |
But this does not work for armhf, because the `pandoc` package does not exist for armhf.
> Is LaTeX support even needed in the Markdown converter? Isn't MathJax used in the frontend for the conversion? If LaTeX is not needed, we could switch to pymd4c. According to its authors, md4c is also the fastest Markdown converter there is.
I wrote the math in all my notes with LaTeX syntax, and I use references (https://github.com/Linbreux/wikmd/blob/main/wiki/How%20to%20use%20the%20wiki.md). Image sizing is also easy. Once we have developed a macro system, I wouldn't mind switching from pandoc to another converter, but for now I personally use too much of pandoc's functionality. I'll take a look at pymd4c though, thanks for the suggestion!
That's a huge improvement!
> But this does not work for armhf, because the `pandoc` package does not exist for armhf.
Hmm, could we build one ourselves from source?
I managed to remove BeautifulSoup, Markdown and Pandoc from the dependencies and added PyMD4C as a replacement. Its authors didn't overpromise: it really is blazingly fast. The example documents took 400-800ms to render with pandoc on my laptop; MD4C renders them in 2-8ms. But there are still a few things that aren't working:
Most of these are probably fixable with the DOM parser and a bit of work. Currently I am using the HTMLRenderer, which is implemented almost entirely in C, so switching to the DOM parser will probably make rendering a bit slower. If someone wants to test it, here is the branch.
Architecture | Size |
---|---|
amd64 | 124MB |
armhf | 105MB |
aarch64 | 123MB |
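To reproduce a per-document comparison like the 400-800ms vs 2-8ms render times mentioned above, a small timing harness helps. A minimal sketch: `time_render` is a hypothetical helper, and `render` stands for any callable that turns a Markdown string into HTML (for example a pandoc or pymd4c wrapper). It reports the median of several runs to damp noise.

```python
import statistics
import time


def time_render(render, documents, repeats=5):
    """Return {name: median render time in ms} for each document.

    `render` is any callable taking a Markdown string and returning HTML;
    `documents` maps a document name to its Markdown source.
    """
    results = {}
    for name, text in documents.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            render(text)
            samples.append((time.perf_counter() - start) * 1000.0)
        results[name] = statistics.median(samples)
    return results
```

If `render` launches an external process, as a pandoc wrapper typically does, that process start-up time is part of every sample.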
@Oliver-Hanikel Interesting! Like I said, this would be a nice implementation. @kura implemented a cache system which should speed up loading times drastically. Once all the features in features.md can be used with another HTML renderer, we could switch. I don't think it would be a smart move to remove features yet.
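One way a render cache can work is to key the rendered HTML on the file's modification time, so a page is only re-rendered after it changes, which keeps a slow renderer off the hot path. A minimal sketch; this is not @kura's implementation, and `RenderCache` and its interface are hypothetical.

```python
import os


class RenderCache:
    """Cache rendered HTML per file path, invalidated when the mtime changes.

    Hypothetical sketch, not the cache implemented in wikmd.
    """

    def __init__(self, render):
        self._render = render  # callable: Markdown str -> HTML str
        self._entries = {}     # path -> (mtime_ns, html)

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]   # file unchanged since the last render
        with open(path, encoding="utf-8") as f:
            html = self._render(f.read())
        self._entries[path] = (mtime, html)
        return html
```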
Pandoc is not the best option, but it supports tons of features: https://pandoc.org/MANUAL.html#pandocs-markdown
I personally don't see a problem with an image that is 400MB+ in size, given that the documents and uploads in my wiki are already 200MB+.
As for replacing pandoc, I think Markdown would be a good alternative, since it supports the ToC feature that is in development and is already used by the Whoosh search feature. Removing BeautifulSoup would mean needing a tool that can convert Markdown directly to plaintext, to replace the Markdown -> HTML -> plaintext step the search module uses to make content searchable. I should also add that the Markdown library makes it very easy to write your own extensions, which would be a simple way to implement any macros you want.
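As an illustration of the kind of macro such an extension could implement, here is a minimal sketch of a `{{name}}` expander. The macro syntax and `expand_macros` are hypothetical; the function takes and returns a list of lines, the same shape as the `run(self, lines)` method of a Python-Markdown `Preprocessor`.

```python
import re

# Hypothetical macro syntax: {{name}} is replaced by macros["name"].
MACRO_RE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def expand_macros(lines, macros):
    """Expand {{name}} macros line by line; unknown macros are left as-is."""

    def _sub(match):
        return macros.get(match.group(1), match.group(0))

    return [MACRO_RE.sub(_sub, line) for line in lines]
```

In Python-Markdown, this would be wrapped in a `Preprocessor` subclass whose `run` calls `expand_macros`, registered from an `Extension`'s `extendMarkdown(self, md)` via `md.preprocessors.register(...)`.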
Markdown does have an extension that could handle the LaTeX, which may mean everything in features.md can be supported with a library like Markdown rather than pandoc.
So, I just checked and even the smallest LaTeX library that can be used by the Markdown-LaTeX extension is 160MB alone so it's not that much of an image size reduction.
> Once we have developed a macro system, I wouldn't mind switching from pandoc to another converter, but for now I personally use too much of pandoc's functionality.
Yeah my branch definitely isn't ready for usage, there are too many features missing. It is more of an experiment.
> I personally don't see a problem with an image that is 400MB+ in size, given that the documents and uploads in my wiki are already 200MB+.
Well, new images are much faster to download, and the image generally builds faster now too. I am running wikmd on a Raspberry Pi 3B with pretty small Markdown files, so I prefer a leaner Docker image. A smaller image causes less wear on the SD card, so the card lasts longer.
> Removing BeautifulSoup would mean needing a tool that can convert Markdown directly to plaintext, to replace the Markdown -> HTML -> plaintext step the search module uses to make content searchable.
You can do this either with pyMD4C, as shown here, or with the `HTMLParser` from the standard library, which is also the parser BeautifulSoup uses in the current version of wikmd. Here is a working version of that.
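Replacing the BeautifulSoup step with the standard library's `html.parser.HTMLParser` can look roughly like this: subclass it, collect the text passed to `handle_data`, and skip `<script>`/`<style>` contents. A sketch only; `TextExtractor` and `html_to_text` are hypothetical names, not the linked working version.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, skipping script/style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &amp; arrives as "&"
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Join chunks with spaces, then collapse whitespace into single spaces.
        return " ".join(" ".join(self._parts).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Joining chunks with spaces means `<b>foo</b>bar` is indexed as two words; for search, that is usually safer than gluing together words from adjacent block elements such as `<h1>Title</h1><p>Body</p>`.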
I am now looking into using TinyTeX in the Docker image to make it smaller while still using pandoc.
Just FYI, I made a very small set of changes that replaces 90% of the pandoc functionality using the python-markdown library. The only thing that isn't working properly is the LaTeX functionality. I tried using a ~400MB install of TeX Live to handle the LaTeX, but it isn't properly detecting and handling things like `|- **wiki** $\leftarrow$ This folder`.
It also hooks into some of the Markdown extensions to add things like table of contents support using the built-in `toc` extension.
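To show the idea behind a ToC extension, here is a simplified sketch that collects ATX headings and gives each a unique anchor. It is not Python-Markdown's `toc` implementation: `slugify`, `build_toc` and the `_1` de-duplication suffix are assumptions for illustration, and only backtick code fences are recognised.

```python
import re

ATX_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*#*\s*$")


def slugify(title):
    # Lowercase, drop punctuation, and join words with hyphens.
    words = re.sub(r"[^\w\s-]", "", title).strip().lower()
    return re.sub(r"[\s-]+", "-", words)


def build_toc(markdown_text):
    """Return (level, title, anchor) for each ATX heading, in order."""
    toc, seen = [], {}
    in_fence = False
    for line in markdown_text.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence  # '#' lines inside code fences are ignored
            continue
        m = None if in_fence else ATX_RE.match(line)
        if m:
            title = m.group(2)
            anchor = slugify(title)
            # De-duplicate anchors: a second "set-up" becomes "set-up_1".
            count = seen.get(anchor, 0)
            seen[anchor] = count + 1
            if count:
                anchor = f"{anchor}_{count}"
            toc.append((len(m.group(1)), title, anchor))
    return toc
```

A template can then render each entry as a link to `#anchor`, indented by `level`; the renderer also has to emit matching `id` attributes on the headings themselves.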
Maybe something like TinyTeX, as mentioned, would be a better solution and might fix some of the LaTeX issues? I may give it a try later.
I had not thought about using the built-in HTMLParser for search... I'll give that a whirl now.
The image is way too big for a wiki that tries to be lean.
¹Size of #67 images
I'll try to improve this by using Alpine as the base image. But there are also other ways, like reducing the number of dependencies or swapping dependencies out. For example, the installed size of `pandoc` is 100MB on arm64, but `markdown` would only take up 57KB.