ether / etherpad-lite

Etherpad: A modern really-real-time collaborative document editor.
http://docs.etherpad.org/
Apache License 2.0
16.65k stars 2.85k forks

Replace AbiWord by a different import/export processor like LibreOffice (see also unoconv wrapper) #254

Closed Wikinaut closed 10 years ago

Wikinaut commented 12 years ago

As AbiWord is very difficult to install because of its numerous dependencies, I suggest replacing AbiWord with alternative tools that are easier to install and better maintained.

I contacted the AbiWord developers with a view to fixing several multi-platform installation issues (for example, on Linux SLES) and the "AbiWord crashes on certain input files" problems, but without success. They also stated that AbiWord is not the best tool for our (EPLITE) import/export tasks.

yadutaf commented 12 years ago

Do you have any suggestions for a replacement? Getting a ".doc" exporter to work would take a huge amount of time! Btw, there is the webodf project (http://gitorious.org/odfkit/webodf/trees/master/webodf/) that looks very promising, but it still lacks the "create file" function. It is only able to read and edit ODF files.

Wikinaut commented 12 years ago

I contacted the AbiWord developers with a view to fixing several multi-platform installation issues (for example, on Linux SLES) and the "AbiWord crashes on certain input files" problems, but without success. They also stated that AbiWord is not the best tool for our (EPLITE) import/export tasks.

Another problem is that my request to "indicate, how abiword needs to be compiled for a headless server environment including all import/export plugins" http://bugzilla.abisource.com/show_bug.cgi?id=13168 is still unanswered.

Etherpad Lite currently relies on the AbiWord plugin "AbiCommand" as its import/export tool, and the GUI features of AbiWord are not needed. A mere installation of AbiWord without the GUI libraries would be helpful and could solve many problems.

markreg commented 12 years ago

I agree with Wikinaut. There needs to be a better/easier installation method of Abiword.

Wikinaut commented 12 years ago

@markreg I am currently investigating with some of the developers what can be changed in AbiWord, in other words, what the minimum requirements are to compile a trunk version of AbiWord with only the parts we actually need for the EPLITE conversion (import/export) functions.

For example, we do not need any GUI functionality of AbiWord.

jhollinger commented 12 years ago

+1. Even though it's "easy" to install with tools like apt-get, pulling that enormous list of dependencies down onto my nice, clean server makes me shudder.

Just a thought, but maybe we don't need to replace Abiword so much as supplement it with additional back-ends? That way you could pick whichever one worked best for your environment in settings.json.
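
The "pick a back-end in settings.json" idea above could be sketched roughly as follows. This is only an illustration: the `converter` settings key, the `BACKEND_EXECUTABLES` table and the `pick_backend` helper are hypothetical names, not existing Etherpad settings or APIs, and the sketch parses plain JSON without comments.

```python
import json
import shutil

# Hypothetical mapping from a settings value to the executable that back-end needs.
BACKEND_EXECUTABLES = {
    "abiword": "abiword",
    "libreoffice": "soffice",
    "unoconv": "unoconv",
}


def pick_backend(settings_text, which=shutil.which):
    """Return (name, executable_path) for the configured back-end.

    Returns None when no back-end is configured or its executable
    cannot be found on PATH.
    """
    settings = json.loads(settings_text)
    name = settings.get("converter")  # hypothetical settings key
    if name is None:
        return None
    if name not in BACKEND_EXECUTABLES:
        raise ValueError(f"unknown converter backend: {name!r}")
    path = which(BACKEND_EXECUTABLES[name])
    return (name, path) if path else None
```

Passing `which` as a parameter keeps the lookup testable on machines where none of the converters is installed.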

Wikinaut commented 12 years ago

I started investigating how the call to AbiWord (AbiCommand) could be replaced by a command-line call to LibreOffice.

Please read the following from http://geekswithblogs.net/robertphyatt/archive/2011/11/19/converting-.docx-to-pdf-or-.doc-to-pdf-or-.doc.aspx :

Citation starts:

I found that LibreOffice (OpenOffice's successor) allows command line conversion using the LibreOffice conversion engine (which DID preserve the formatting like I wanted and generally worked great).

I loaded the latest version of Ubuntu (http://www.ubuntu.com/download/ubuntu/download) onto my Virtual Box (https://www.virtualbox.org/wiki/Downloads) on my computer and found that I was able to easily convert files using the commandline like this:

libreoffice --headless -convert-to pdf fileToConvert.docx -outdir output/path/for/pdf

Citation ends. I did not try it yet.
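
For reference, a minimal sketch of how such a one-shot conversion might be wrapped from a script. It assumes a LibreOffice binary named `soffice` on the PATH and uses the double-dash spellings `--headless`, `--convert-to` and `--outdir` that current LibreOffice releases document (the quoted command mixes single and double dashes). The helper names are made up for this example, and nothing here has been tested against Etherpad Lite.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, target_format, out_dir, binary="soffice"):
    """Return the argv list for a one-shot headless LibreOffice conversion."""
    return [
        binary,
        "--headless",                   # no GUI
        "--convert-to", target_format,  # e.g. "pdf"
        "--outdir", str(out_dir),       # directory for the converted file
        str(source),
    ]


def convert(source, target_format, out_dir, binary="soffice", timeout=120):
    """Run the conversion and return the path where the result is expected.

    Assumes a plain format name such as "pdf"; LibreOffice names the output
    after the input file's stem plus the new extension.
    """
    subprocess.run(
        build_convert_command(source, target_format, out_dir, binary),
        check=True,       # raise if LibreOffice exits with an error
        timeout=timeout,  # do not hang forever on a bad input file
    )
    return Path(out_dir) / (Path(source).stem + "." + target_format)
```

Note that this starts a new LibreOffice process for every file, which is exactly the start-up cost that would need to be measured against AbiWord.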

Pita commented 12 years ago

We have to do some performance testing of LibreOffice vs. AbiWord. But I don't believe LibreOffice can beat AbiWord.

Wikinaut commented 12 years ago

@Pita Yes, of course. I had many problems compiling AbiWord on different platforms. It is also not part of SLES, for example.

Generally speaking, I would currently prefer LibreOffice (or OpenOffice) over AbiWord for many reasons, and I also think it is better maintained and reviewed than AbiWord.

yadutaf commented 12 years ago

Just another question: is abiword still under active development?

-- Jean-Tiare On 28 Jan 2012 at 15:31, "Wikinaut" <reply@reply.github.com> wrote:

@Pita Yes, of course. I had many problems compiling AbiWord on different platforms. It is also not part of SLES, for example.

Generally speaking, I currently prefer LibreOffice (or OpenOffice) over AbiWord for many reasons.



Wikinaut commented 12 years ago

@jtlebi It appears to be under active development; I subscribed to their (AbiWord) mailing lists. See also my question to them about a so-called "headless" installation (as required by Etherpad Lite for conversion purposes). I did not receive a concrete answer about a "headless" compilation, but something is posted here: http://bugzilla.abisource.com/show_bug.cgi?id=13168 .

Wikinaut commented 12 years ago

Implementation idea:

See also http://blog.larsstrand.org/2010/12/convert-word-documents-with-pictures-to.html (and https://github.com/mhagander/word2mediawiki/ ) for a converter from Word (to MediaWiki syntax) which also uses "headless" OpenOffice or LibreOffice.

Added to keep track of these URLs; perhaps we can use some ideas from them.

Wikinaut commented 12 years ago

Tl;dr: see https://wiki.documentfoundation.org/Development/HeadlessBuild

Long answer:

Today I contacted the LibreOffice developers for instructions on how to compile a headless version for EPL conversion needs (as a replacement for AbiWord) and got the following answer:

Hello, for the mere use of converting documents of different formats to PDF or HTML in Etherpad Lite, which currently uses AbiWord for this purpose, I need some information how to configure

  • i.e. without unneeded "Desktop" modules, for mere command-line use.

For these two parts you can take a look at https://wiki.documentfoundation.org/Development/HeadlessBuild

thanks, riccardo



Wikinaut commented 12 years ago

I also found this: unoconv: Convert between any document format supported by OpenOffice http://dag.wieers.com/home-made/unoconv/

"unoconv is written in python. It needs a recent OpenOffice with UNO bindings."

Pita commented 12 years ago

I don't see the sense in replacing abiword. Remember that we want to support windows, linux and mac. We can't let the user compile stuff just for document converting. What are the benefits you see in using libreoffice?

Pita commented 12 years ago

The main reason I chose AbiWord is that it is very easy to communicate with. It has an interactive console that we can use, so we can start AbiWord once and send it commands over the stdin stream. This lets us convert documents very quickly because we don't have to start a process for each one. Every other solution I found had one of these problems: 1) You have to start a process for every conversion task, which makes it really slow. 2) It is not supported cross-platform. 3) It does not support Word. @johnyma22 wants to use Word.
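
The "start once, send commands over stdin" pattern described above can be sketched generically. The `PersistentWorker` class and the echo child below are illustrative stand-ins, not Etherpad code. A real AbiCommand client would presumably spawn something like `abiword --plugin AbiCommand` and wait for its prompt rather than read one reply line; that detail is an assumption and is not modelled here.

```python
import subprocess
import sys


class PersistentWorker:
    """Keep one child process alive and send it one command per line on stdin."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes on our side
        )

    def request(self, command):
        """Send one command and block until the child answers with one line."""
        self.proc.stdin.write(command + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        """Close stdin so the child sees EOF, then wait for it to exit."""
        self.proc.stdin.close()
        self.proc.wait()


# Stand-in child for demonstration only: it answers every input line with
# "done: <line>". It is NOT AbiWord; it just plays the role of a converter.
ECHO_CHILD = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('done: ' + line.strip(), flush=True)\n",
]
```

The process start-up cost is paid once in `__init__`; each later `request` call only costs a pipe write and a pipe read, which is the speed advantage being claimed for the interactive console.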

Wikinaut commented 12 years ago

I don't see the sense in replacing abiword. Remember that we want to support windows, linux and mac. We can't let the user compile stuff just for document converting. What are the benefits you see in using libreoffice?

I don't like AbiWord. There was no support for compiling a headless version without the need to install this useless X11 stuff. I cannot compile it on (certain) SLES versions because I have no access to the dependencies.

OpenOffice and LibreOffice are more widely used and, in my view, more actively supported and code reviewed.

I am not pushing for a switch to LO or OO; however, I was dissatisfied with AbiWord. Remember the AbiWord failure when converting "&" in URLs, which crashes our EPL 1.0 code (well, & must be coded as &amp; in URLs according to the RFC, but that was another issue).

Wikinaut commented 12 years ago

Again: I am not pushing for a switch to LO or OO. I just wanted to point out possible alternatives for users who cannot install AbiWord.

JohnMcLear commented 11 years ago

@Wikinaut Considering you can now import plain text files without AbiWord, would you say this is closed, or are you still looking for a solution for handling proprietary formats?

Wikinaut commented 11 years ago

@johnyma22 wrote

Considering you can now import plain text files without AbiWord, would you say this is closed, or are you still looking for a solution for handling proprietary formats?

  • short answer:

I suggest leaving this issue open, but labelling it as "minor" or "enhancement". If you want to close it, however, that's okay with me.

I was not primarily interested in importing, but in exporting pads as PDF, which I use almost daily.

I don't like AbiWord, because there was no support for compiling a headless version without the need to install this useless X11 stuff and, worse, I cannot compile it on (certain) SLES versions because I have no access to the dependencies. And because OpenOffice and LibreOffice are more widely used and, in my view, much more actively supported and code reviewed, I think we should also consider offering an OO or LO "import/export" service.

JohnMcLear commented 11 years ago

Afaik OO and LO have lots more dependencies and require Java (unless I'm wrong).

I will change the issue status :) At least we took a few steps forward. I guess native PDF export is something that could even be done as a plugin.

dgeo commented 11 years ago

(Libre|Open)Office has a "server mode" that can be shared between many services and is not hard to use in the code (HTTP POST method). From EPL's point of view, the configuration could be a single host:port or unix:/socket, with one of the thousand tutorials on launching OOo|LO in headless mode referenced in the sample config…

A 1s one: soffice "-accept=socket,host=127.0.0.1,port=56789,tcpNoDelay=1;urp;" -headless -nodefault -nofirststartwizard -nolockcheck -nologo -norestore

This way, the machine used for conversion can be a different one (i.e. not the web server; a workstation that is always up, for example...)

Sorry, I'm not much of a developer, nor do I have time to learn Node.js for now…

disy-mk commented 11 years ago

Interesting, will look into that :)


Wikinaut commented 11 years ago

@dgeo

see also

Universal Office Converter (unoconv) is a command line tool to convert any document format that LibreOffice can import to any document format that LibreOffice can export. It makes use of the LibreOffice’s UNO bindings for non-interactive conversion of documents. For practical reasons we mention LibreOffice, but OpenOffice is supported by unoconv as well.
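
As a rough sketch of how unoconv might be pointed at an already running headless listener such as the one started above: the helper name is made up, and the options used (`-f` format, `-o` output, `-n` no-launch, `-s` server, `-p` port) are the ones listed in the unoconv manual as far as I know, so please check them against your installed version.

```python
def build_unoconv_command(source, target_format, output, server=None, port=None):
    """Return an argv list for unoconv (illustrative helper, not an Etherpad API)."""
    argv = ["unoconv", "-f", target_format, "-o", str(output)]
    if server is not None:
        # -n: do not launch a temporary local listener; -s: listener address
        argv += ["-n", "-s", server]
    if port is not None:
        # -p: listener port
        argv += ["-p", str(port)]
    argv.append(str(source))
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. With a listener on another machine, the input and output paths would have to be reachable from that machine as well, which this sketch does not address.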

dgeo commented 11 years ago

@Wikinaut thank you, I missed that!

Wikinaut commented 11 years ago

@dgeo

In case you manage to set up a complete download (git clone) and compilation of a headless LibreOffice (as said before: from its git repository) together with unoconv to replace AbiWord, it would be great if you could document it here!

JohnMcLear commented 10 years ago

I'm going to close this for the following reasons:

Other export methods can now be added as plugins; I wouldn't object to seeing a plugin for OpenOffice export support.

mrbabbs commented 8 years ago

Hi everyone, I know the issue is closed, but this could help someone. I developed a basic import plugin, "ep_document_import_hook", to import documents with tools other than AbiWord. Currently it only uses LibreOffice to import documents, but anyone who wants to can fork it. The plugin is also available in /admin/plugins.

https://www.npmjs.com/package/ep_document_import_hook