commonmark / cmark

CommonMark parsing and rendering library and program in C

Consider usage of the GLib in libcmark #119

Closed MathieuDuponchelle closed 8 years ago

MathieuDuponchelle commented 8 years ago

This subject was briefly discussed in #100, but I figured a separate issue would be useful to sum up the arguments for and against it, and to discuss whether it would be acceptable.

Argument(s) against using the glib

The main (and only) argument raised against using the glib is that of portability. I would argue its usage would actually help with portability, with respect to things like loading of plugins, or threading.

Note that I will open another issue at some point regarding multithreading of the inline parsing phase, as I think this phase is very amenable to parallelization, as long as the separation of inline and block parsing is consistently enforced.

I work for an Open Source software consultancy, Collabora, where we routinely deploy glib-based solutions on a wide range of architectures and operating systems, including Windows, and I've never seen any issues with glib's portability.

I'm writing this from my seldom-used Windows partition, where I've just successfully compiled a version of cmark built against the glib. Thanks to the MSYS2 project, this has been a completely painless experience. Note that it is also trivial to provide installers using that solution.

Arguments for using the glib

Features

See https://developer.gnome.org/glib/2.48/ for the full list of features. Here are a few I think are relevant for cmark, in that they could make its codebase much leaner and help implement features in a portable manner.

@nwellnhof, I know you're concerned about this, but please come to it with an open mind, consider all the things the glib would bring to the table, and evaluate whether this port would really prevent you from using cmark at all, or simply mean spending ten minutes figuring out how to update your bundling of cmark, which could benefit other people who follow that practice.

kainjow commented 8 years ago

glib is LGPL which is not compatible with BSD2.

MathieuDuponchelle commented 8 years ago

@kainjow, I don't think this is true. Do you have anything to substantiate your claim? According to this and that your statement is incorrect.

MathieuDuponchelle commented 8 years ago

http://www.gnu.org/licenses/license-list.en.html#FreeBSD should further put this argument to rest.

kainjow commented 8 years ago

I think you're referring to the opposite direction. You can use BSD-licensed code in GPL/LGPL projects just fine, but not the other way around.

http://choosealicense.com/licenses/

I can include BSD code in my proprietary project without releasing my proprietary source code, but LGPL requires my source code to be GPL as well, or I can link to the library, but that requires distributing a shared library with my project.

MathieuDuponchelle commented 8 years ago

This is wrong. The only case where you would need to relicense code linking with an LGPL library is if you link to that library statically, and do not provide the means for relinking against another version.

Please read this stackoverflow thread for a good summary on this.

kainjow commented 8 years ago

I was saying the same thing you said, just maybe not as clearly.

If I want to static link cmark (as BSD2) with my proprietary app, that's perfectly fine. But once it depends on glib, my choices become: change my proprietary app to GPL and release the source code (not going to happen), or bundle the glib shared library (and its dependencies) and cmark shared library.

Take a look at Swift which uses cmark. I guarantee you Apple won't update to a glib-based cmark because of LGPL. That is why they started LLVM in the first place.

MathieuDuponchelle commented 8 years ago

Once again, wrong. Why would Apple distribute this in the first place: http://www.opensource.apple.com/source/X11misc/X11misc-20/pkg-config/pkg-config-0.25/glib-1.2.10/glib.h ?

MathieuDuponchelle commented 8 years ago

I'm not sure whether you even read what I said earlier, to be honest. First, the LGPL doesn't force you to distribute the shared library. Second, if you link to it statically, you only need to provide the means to relink against another version. I think you're confusing the GPL and LGPL licenses here: gcc is GPLv3, and that's the reason why Apple started clang in the first place.

nwellnhof commented 8 years ago

@MathieuDuponchelle What the various sources you cited mean with "compatibility" is compatibility with the GPL. This means that you can incorporate BSD-licensed code in a (L)GPL-licensed project, and relicense it under the (L)GPL.

The reverse is obviously not true. In this sense, @kainjow's first statement is correct. As you noted, it's possible even for proprietary code to link to LGPL libraries. But this imposes some restrictions.

My main point against a GLib dependency is that the current version of cmark works perfectly fine as is. Why add additional dependencies and restrictions if there's no real need for that? Why make things harder for all of cmark's current users just to please a single developer?

MathieuDuponchelle commented 8 years ago

The reverse is obviously not true. In this sense, @kainjow's first statement is correct. As you noted, it's possible even for proprietary code to link to LGPL libraries. But this imposes some restrictions.

This is vague, and sounds a lot like FUD. I don't see how the fact that the GLib is LGPL could even be an argument here.

My main point against a GLib dependency is that the current version of cmark works perfectly fine as is.

Nothing ever works "perfectly fine". It works under the current set of expectations you have for it; however, if we dug a little, I promise we would find bugs, including in parts that I propose replacing with the GLib.

Why add additional dependencies and restrictions if there's no real need for that? Why make things harder for all of cmark's current users just to please a single developer?

You're overreacting here, and do not seem to have read the last paragraph of my original post :/

In my opinion, the main interest of Open Source lies in sharing common building blocks and reusable components that everyone can improve, making for a better final solution. It's not to make life easier for third-party vendors; otherwise, why not just license cmark under the WTF license and be done with it?

"Make things harder" for all of cmark's users

Seriously now, would cmark ever be the first project you'd depend on to have dependencies if it built upon the GLib instead of reimplementing everything? Really?

MathieuDuponchelle commented 8 years ago

And yes, I get it, you don't like the idea because of this : https://github.com/nwellnhof/lucy-clownfish/tree/master/compiler/modules/cmark, but please be honest and admit that's not exactly the cleanest setup to begin with ;)

nwellnhof commented 8 years ago

And yes, I get it, you don't like the idea because of this : https://github.com/nwellnhof/lucy-clownfish/tree/master/compiler/modules/cmark, but please be honest and admit that's not exactly the cleanest setup to begin with ;)

I don't know what you mean by "clean", but bundling cmark with Apache Clownfish is certainly the most convenient solution for our users. This is one of the main reasons why I'm against adding dependencies like GLib, and I make no secret about it. Just being curious, what's your interest in cmark?

But there are a couple of other reasons as well. I wouldn't consider cmark "moderately complex". It's a small library with just about 10 kloc (counting empty lines and everything). Adding a dependency like GLib simply seems overkill. Do we really need a huge framework just to implement a linked list, parse a few command line arguments, or open and read a file? I think this is lazy and emphasizes the needs of developers over the needs of users. Having a small binary with zero dependencies is an important feature for many users, especially on non-Linux platforms. Not having to deal with copyleft licenses is important for some users as well.
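
To give a sense of scale for the linked-list point above, here is a minimal sketch of a singly linked list in plain C. The names (`node`, `list_push`, `list_pop`, `list_length`, `list_free`) are hypothetical and are not taken from cmark or GLib; the sketch only illustrates how little code such a structure needs, not how either project implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical singly linked list node; not cmark's or GLib's type. */
typedef struct node {
  void *data;        /* owned by the caller, never freed by the list */
  struct node *next; /* next node, or NULL at the end of the list */
} node;

/* Prepend an item. Returns the new head, or NULL if allocation fails
 * (in which case the original list is left untouched). */
static node *list_push(node *head, void *data) {
  node *n = malloc(sizeof *n);
  if (n == NULL)
    return NULL;
  n->data = data;
  n->next = head;
  return n;
}

/* Remove the first node and return its data; the list must not be empty. */
static void *list_pop(node **head) {
  assert(head != NULL && *head != NULL);
  node *first = *head;
  void *data = first->data;
  *head = first->next;
  free(first);
  return data;
}

/* Count the items in the list. */
static size_t list_length(const node *head) {
  size_t len = 0;
  for (; head != NULL; head = head->next)
    len++;
  return len;
}

/* Free every node; the data pointers remain the caller's responsibility. */
static void list_free(node *head) {
  while (head != NULL) {
    node *next = head->next;
    free(head);
    head = next;
  }
}
```

The trade-off debated in this thread still applies: code like this is short, but it has to be tested and audited by the project itself rather than by a wider library community.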

Regarding portability: I just tried to build GLib on the Windows command line with MSVC and nmake. It doesn't work and it seems that this hasn't worked for about five years. Have a look at this bug report from 2011. It also looks like the GLib developer who added Win32 support isn't active anymore and the other developers don't care much about such setups. There also seem to be similar problems with some of the GNU libraries GLib depends on. So I really doubt your claim that GLib "would actually help with portability". I know that Windows binaries are available, but there are good reasons to insist on compiling everything from source.

Here's another question. If you prefer to work with an object system and a good standard library, what do you think about reimplementing cmark internals in C++? C++ offers most of the things you mentioned, has great platform support and wouldn't require any dependencies. What I'm trying to say: The suggestion to reimplement cmark in C++ is very similar to your suggestion to use and embrace GLib. But it should be obvious that most cmark developers will be opposed to the idea. Maybe this helps to understand why it's unlikely that you will gain much support for your proposal.

I hope I didn't discourage you from your work on extension support. I think this is an important feature and my offer to help with any cross-platform issues still stands.

MathieuDuponchelle commented 8 years ago

I don't know what you mean by "clean", but bundling cmark with Apache Clownfish is certainly the most convenient solution for our users. This is one of the main reasons why I'm against adding dependencies like GLib, and I make no secret about it.

What I mean by clean is that I don't usually consider copy-pasting a sustainable development model, and I don't think your upstreams should be limited by your development choices.

Just being curious, what's your interest in cmark?

I'm building a "language-agnostic API documentation micro-framework" (I know that sounds a bit pompous, but it describes the goal pretty adequately :). It uses commonmark as its standard language, and I need a bunch of extensions on top of it. I decided to go for the solution that seemed like the responsible and technically correct one to me, which was to work with the upstream reference implementation to provide API for these extensions at parsing-time. That implies long discussions and bitter arguments, but I believe in the end it will be for the best both for my project and for cmark.

Regarding portability: I just tried to build GLib on the Windows command line with MSVC and nmake.

Why would you even want to compile the glib yourself? Even if you really need to use a compiler that has only been C99-compliant since 2013, there's no reason to compile it yourself. One of our customers obtained its builds from http://gstreamer.zeranoe.com/builds/win64/2016.03.27/ . This includes gstreamer, so you may want to trim some of it if you choose that as your source, but I know that it provides "property sheets" which make it easy to use in Visual Studio.

Additionally, for Windows developers who are not forced to deal with Visual Studio, I linked the MSYS2 project earlier, which we could (should) use to distribute installers for cmark, for people who do not insist on building everything from scratch themselves.

I know that Windows binaries are available, but there are good reasons to insist on compiling everything from source.

What are they?

Here's another question. If you prefer to work with an object system and a good standard library, what do you think about reimplementing cmark internals in C++? C++ offers most of the things you mentioned, has great platform support and wouldn't require any dependencies. What I'm trying to say: The suggestion to reimplement cmark in C++ is very similar to your suggestion to use and embrace GLib. But it should be obvious that most cmark developers will be opposed to the idea. Maybe this helps to understand why it's unlikely that you will gain much support for your proposal.

I would disagree with that choice, mostly because of C++'s non-standard name mangling.

I hope I didn't discourage you from your work on extension support.

You don't discourage me, I'm a reasonable and sometimes friendly developer who tries not to hold technical grudges ;) I haven't made my work on extensions depend on the GLib because the initial reception to my suggestion was unfavourable. I am completely convinced the choice of using it should be determined by weighing the interest of preserving downstream's bundling practices against its technical virtues, and that's the reason why I opened this issue, which I see as a valuable long-term discussion for cmark.

Not using the GLib means that I will most probably not work on parallelizing inline parsing, as I'd be way more comfortable using a tested and portable thread and thread pool API than trying to implement this on my own. This also means that, if I weren't about to ask you to make good on:

my offer to help with any cross-platform issues still stands.

I would simply withdraw my plugin code proposal altogether, as it is suboptimal even on Linux for now, and I don't have the will to do any unpaid development on my Windows install.

jeroen commented 8 years ago

From personal experience I can state that glib on OSX/Windows is very painful. For one, static linking is completely broken in the stable branch. On OSX I was able to fix this by downgrading to 2.42, but on Windows you are forced to carry around DLL files.

MathieuDuponchelle commented 8 years ago

From personal experience I can state that glib on OSX/Windows is very painful. For one, static linking is completely broken in the stable branch. On OSX I was able to fix this by downgrading to 2.42, but on Windows you are forced to carry around DLL files.

Your "for one" simply reiterates what @nwellnhof was saying FWIW

jeroen commented 8 years ago

Your "for one" simply reiterates what @nwellnhof was saying FWIW

This is unrelated to licensing. Applications using librsvg2 actually segfault if glib was statically linked.

gjtorikian commented 8 years ago

I don't even want to get involved in this entire debate, but for what it is worth, I would also be 👎 against GLib. For services that want to run Commonmark (GitHub, Reddit, StackExchange), the idea of introducing any C library is already enough to make our hairs stand on end for security and vulnerability issues. Bringing in a giant library like GLib is almost certainly not going to happen.

I understand that it brings about a lot of functionality and has been tested and is used in many places and a lot of other reasons to make it "stable." But it's software. Software is not stable. Software fails. And I can only speak for GitHub, but the less C that needs to be audited before it can be brought into our stack, the better chance we have of adopting Commonmark quickly.

MathieuDuponchelle commented 8 years ago

@jeroenooms , I understand this thread is pretty long already, but I was referring to this comment

MathieuDuponchelle commented 8 years ago

@gjtorikian regarding integration I'd be more comfortable with introducing a library that uses the GLib than one which implements its own string buffers tbh ;)

jgm commented 8 years ago

I see good arguments on both sides. Some will say that, despite having a bigger code footprint, glib has a security advantage because it's a well-established library that many people use, so it is very well tested and has many eyes on it. Custom code for things like linked lists and string buffers might present a smaller footprint, but it has to be individually audited and tested.

Still, I think the difficulties with glib on Windows, and the general appeal of having a small self-contained code base with no dependencies, inclines me not to go with glib.
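
As an illustration of the kind of custom code under discussion, the following is a minimal sketch of a growable string buffer in plain C. The type and function names (`strbuf`, `strbuf_append`, `strbuf_free`) are hypothetical and are not cmark's or GLib's actual API; the sketch mainly shows where the audit-sensitive spots are (size overflow, reallocation failure, NUL termination).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer; not cmark's or GLib's type. */
typedef struct {
  char *data; /* NUL-terminated contents, or NULL while nothing is allocated */
  size_t len; /* bytes in use, excluding the terminator */
  size_t cap; /* bytes allocated */
} strbuf;

/* Append len bytes from s. Returns 0 on success, or -1 on size overflow or
 * allocation failure; on failure the buffer is left unchanged. */
static int strbuf_append(strbuf *buf, const char *s, size_t len) {
  assert(buf != NULL && (s != NULL || len == 0));
  if (len > SIZE_MAX - buf->len - 1)
    return -1; /* len + buf->len + 1 would overflow size_t */
  size_t needed = buf->len + len + 1;
  if (needed > buf->cap) {
    /* Grow geometrically so repeated appends stay amortized O(1). */
    size_t cap = buf->cap ? buf->cap : 16;
    while (cap < needed) {
      if (cap > SIZE_MAX / 2) {
        cap = needed;
        break;
      }
      cap *= 2;
    }
    char *p = realloc(buf->data, cap);
    if (p == NULL)
      return -1;
    buf->data = p;
    buf->cap = cap;
  }
  if (len > 0)
    memcpy(buf->data + buf->len, s, len);
  buf->len += len;
  buf->data[buf->len] = '\0';
  return 0;
}

/* Release the storage and reset the buffer to its empty state. */
static void strbuf_free(strbuf *buf) {
  free(buf->data);
  buf->data = NULL;
  buf->len = 0;
  buf->cap = 0;
}
```

A buffer starts out zero-initialized (`strbuf buf = {NULL, 0, 0};`) and is released with `strbuf_free`. Each branch above is exactly the sort of detail that either gets audited once in a shared library or separately in every project that writes its own.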


MathieuDuponchelle commented 8 years ago

Still, I think the difficulties with glib on Windows

Relative difficulties. I pointed out MSYS2 at some point; the problem really lies with the "static link and build everything in MSVC" use case, not with Windows itself, and as I was saying earlier, I think it's regrettable to let downstream bundling practices dictate or influence technical choices.

and the general appeal of having a small self-contained code base with no dependencies

Well I find the idea of a smaller codebase more appealing to be honest :)

inclines me not to go with glib.

your code your rules ¯\_(ツ)_/¯ . This means I certainly won't implement multithreaded inline parsing, but certainly someone with more courage than I will tackle that at some point, not going to fork over that anyway.

MathieuDuponchelle commented 8 years ago

btw while I'm thinking of this, maybe you'd be more amenable to using libcfu?

Pros:

Cons:

I'd like to make it clear this wouldn't be my preferred option, mostly due to the absence of a living upstream, however afaict none of the concerns raised against GLib usage applies to libcfu so there's that.

MathieuDuponchelle commented 8 years ago

In the libcfu scenario, given the BSD license the library could be added as a git submodule and statically linked by cmake's build process itself, with no consequence on downstream's license choices.

gjtorikian commented 8 years ago

If you only need a few data structures take a look at https://troydhanson.github.io/uthash/.

MathieuDuponchelle commented 8 years ago

Yep, I wouldn't mind a bit more than data structures though, in particular portable and non-racy thread pools / queues, and cross-platform plugin discovery / loading. I don't want to write this sort of thing from scratch for the nth time.

MathieuDuponchelle commented 8 years ago

Closing this issue, I'll open a different one for libcfu at some point if I tackle multithreading in cmark and it appears libcfu is useful for this.