Martinsos / edlib

Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
http://martinsos.github.io/edlib
MIT License
493 stars 162 forks source link

meson.build #161

Closed cdunn2001 closed 3 years ago

cdunn2001 commented 3 years ago

I think Meson is way easier to use than Cmake, so I've set that up for you. I did not add the performance tests.

make setup
make build
make test  # shows how to be verbose, but "ninja -C MESON test" would work too
make a  # temporary; useful for debugging
make install
Martinsos commented 3 years ago

Well this looks very cool, thanks @cdunn2001 for taking this on! Since I don't primarily work in C++, I never took the time to properly learn CMake, so I like the idea of using Meson -> it seems simpler and easier to understand, meaning more people (including me) might have easier time working on it, which could be great.

Some things I would like to ensure though before deiciding to switch to Meson:

  1. Are we losing anything? Is it going to be harder for people using CMake in their own projects to include Edlib as dependency?
  2. Is this new Meson config equally powerful / doing the same things as the existing CMake one? I see in CMakeLists.txt we are taking care of: enabling debug mode if certain flag is set, making sure no compiler-specific features are used (turning off CXX extensions), we ensure edlib is built as static library by default, we force on Win to always compile with W4 and generally ensure that strict warnings are used, we build edlib-aligner only when not on Windows because it does not work on Window. Finally, we recently had this PR where contributor added generation of certain files (edlib-config.cmake and edlib-config-version.cmake) in order to make including Edlib easier in certain scenario (vcpkg in his case) -> what would happen to that, can we support that from Meson?

@SoapZA , as Edlib's CMake expert :D, could you please also give your opinion on this?

SoapZA commented 3 years ago

@Martinsos We don't lose anything with this (and building python modules in Meson is significantly easier than in CMake). Most of the boilerplate stuff (warnings, installations paths etc) is taken care of by Meson. The cmake config file isn't generated, but writing one from Meson isn't super hard.

Martinsos commented 3 years ago

@Martinsos We don't lose anything with this (and building python modules in Meson is significantly easier than in CMake). Most of the boilerplate stuff (warnings, installations paths etc) is taken care of by Meson. The cmake config file isn't generated, but writing one from Meson isn't super hard.

@SoapZA Thanks for quick response! Ok awesome, then I am in for moving to Meson, @cdunn2001 please let's discuss some of the specific points that @SoapZA and I raised / commented on and then we can look into making this work with the CI (Travis and AppVeyor).

cdunn2001 commented 3 years ago

The toughest part is deciding whether to continue to support CMake. The simplification is enormous. E.g. you can delete the edlib.pc.in file. Yes, this simple Meson file does everything all the cmake files do and more. E.g. you could easily build both shared and static libs with a simple change. Debug builds are trivial, and sanitization builds are not hard.

If you drop cmake, you will get complaints from people who like to plop a cmake project into their own. But if you keep it, then it's a lot to maintain.

Personally, I would leave it alone, but switch to meson in TravisCI. That way, cmake support is strictly legacy, and if any user finds a problem, they can submit their own PR to fix it.

For jsoncpp (a simple, legacy library that I ended up supporting when the author ghosted), most tickets are from users with complaints about cmake, and they all seem to disagree with each other, depending on which version of cmake they are using. (Most of the rest are from people who would love to change the API in some way or another, which then causes other problems.) Soza helped me switch that to meson, and I have never regretted it. I immediately approve and merge every cmake PR and otherwise just completely ignore it.

Martinsos commented 3 years ago

The toughest part is deciding whether to continue to support CMake. The simplification is enormous. E.g. you can delete the edlib.pc.in file. Yes, this simple Meson file does everything all the cmake files do and more. E.g. you could easily build both shared and static libs with a simple change. Debug builds are trivial, and sanitization builds are not hard.

If you drop cmake, you will get complaints from people who like to plop a cmake project into their own. But if you keep it, then it's a lot to maintain.

Personally, I would leave it alone, but switch to meson in TravisCI. That way, cmake support is strictly legacy, and if any user finds a problem, they can submit their own PR to fix it.

For jsoncpp (a simple, legacy library that I ended up supporting when the author ghosted), most tickets are from users with complaints about cmake, and they all seem to disagree with each other, depending on which version of cmake they are using. (Most of the rest are from people who would love to change the API in some way or another, which then causes other problems.) Soza helped me switch that to meson, and I have never regretted it. I immediately approve and merge every cmake PR and otherwise just completely ignore it.

Thanks for this analysis, this is great! While I could do as you are doing with jsoncpp, I have to admit I am very tempted to completely drop CMake. I don't want to invest time into maintaining it, and on the other hand, I don't like the idea of having it in the codebase if it is not actively maintained. But I do know that some people are using it to drop the project easily into their CMake config, that is true. Hmmmm. Ok let's leave it for a short while and see.

cdunn2001 commented 3 years ago

re: files()

Soap, those are good points and good to know. (But note that if you forget to add a new file into meson, you will not know it until you run ninja.)

However, I still think it's better to start simple. The current meson.build is very easy to read. There is already an example of files() (needed to find test-data) for future reference if needed.

Martinsos commented 3 years ago

re: files()

Soap, those are good points and good to know. (But note that if you forget to add a new file into meson, you will not know it until you run ninja.)

However, I still think it's better to start simple. The current meson.build is very easy to read. There is already an example of files() (needed to find test-data) for future reference if needed.

While I don't have strong opinion here, I do usually lean toward being more explicit when possible, so I don't mind specifying files() if that can catch more problems in compile time. But I don't think it is crucial, so all is good, whatever you decide.

cdunn2001 commented 3 years ago

Oops. I forgot to install edlib.h.... Dunn.

If you want to install that as edlib/edlib.h, just use

install_headers('edlib/include/edlib.h', subdir='edlib')
cdunn2001 commented 3 years ago

By the way, the reason I am working on this is to wrap this from Nim, for use in the Baylor hackathon this weekend. Nim is a great way to write glue logic for existing libraries -- much better than Python.

Option 1: Wrap an installed library

In this case, I need this meson.build so that I can install it more easily. For now, we would wrap the "extern C" API, which is quite nice.

An example is Brent Peterson's htslib wrapper. He actually runs c2nim on the htslib API includes only that in his nim repo. Then he pulls htslib.so at runtime. That works great if it's actually installed. But static linkage is fine too.

Option 2: Include edlib source-code in a nim repo.

In this case, whenever we build our nim program, we would also compile edlib. So in this case, we would not need meson.build. But we would be forced to use the C++ "backend", which compiles a lot slower. And it means compiling much more code for a fresh build, which is even slower -- not bad with ccache though.

The benefit is easy packaging. People could download one repo and have everything built at once.

I'm not sure yet which way makes more sense. But edlib is a great library, so we definitely want it wrapped.

cdunn2001 commented 3 years ago

That's done, but I might switch to Option 2 anyway, so other folks can use it more easily this weekend. Notice how easy it was to wrap your entire API.

cdunn2001 commented 3 years ago

Ok, the wrapper is in Nim now. How about that? Pretty quick.

It uses Option 1 (calling pkg-config) for now. I have Option 2 (building a copy with C++ backend) on a branch. And on another branch, Option 3 (dlopen() at runtime), but that one is pretty rough since DYLD_LIBRARY_PATH is suppressed for sub-processes on OSX now. We'd probably need a homebrew Formula for that.

Anyway, this meson.build definitely produced a good edlib.pc file and a usable .dylib.

Martinsos commented 3 years ago

@cdunn2001 that is amazing, that you are using it for the hackaton, and that you wrote a Nim wrapper :)! I heard about Nim but haven't got to trying it out, seems interesting though.

I could add a short blurb about nim wrapper on the main README od edlib -> I am not sure what to link too though and how to phrase it, since I don't know much about Nim. Ideally it would be something like "Edlib also has unofficial binding for Nim by @cdunn2001" and "binding for Nim" would be a link -> but to what exactly?

I hope hackaton goes well :)! I will get to merging this PR today / tomorrow.

Btw., a question: as you have correctly noticed, there is new version of Edlib in preparation that is orienting more toward C++, less toward C. Reason is that a lot of people requested support for sequences of anything, and not just char, so we did this with generics. Trying to keep this behind C API could be possible maybe, but is certainly not a great experience, so we focused on C++ API and said that we will add C API on top of it, in some shape, if people will request it. Since you mentioned C API, I thought I could ask you immediately: would you miss if it was gone and you had only C++ API? Do you think a lot of people would miss it?

Martinsos commented 3 years ago

@SoapZA , what are your thoughts on maintaining both CMake and Meson? Are you aligned with @cdunn2001 on this, or do you have a different perspective? I guess one big part of me wants to get rid of CMake (even if I ignore it and others maintain it), but on the other hand what @cdunn2001 is saying makes a lot of sense and it sounds like it will not be very practical for users of Edlib if CMake is gone, so I am fishing for some extra arguments to get rid of it :D. Although with information I have right now, I am slightly more leaning to keeping it :sob: .

cdunn2001 commented 3 years ago

Would you miss if (C API) was gone and you had only C++ API? Do you think a lot of people would miss it?

Yes. If it were gone, I would consider adding it back myself. The reason is that Nim builds much faster with the C backend. (C++ is just slow, ancient technology. I am trolling Soap now.)

But that's not really the point. It's easy to maintain, right? Just keep. Adding API is a minor version bump; removing is a major bump, as in 2.0.0.

Btw, Soap and I agree on just about everything, even tiny details like trailing commas. We mainly disagree on how to get there. I like each change to be specific and easy to understand, one thing at a time, building consensus as we go. He's a really good engineer though.

Martinsos commented 3 years ago

For reference, I did some final integration polishing for Meson PR: bb17ed5e04266a9b87af0d427182e59b64b42d60 . Thanks for all the work with this.

C API: Makes sense! This new version will actually bring Edlib to 2.0.0, and keeping C API will require some work to figure out how to exactly map it to new C++ API, since it will be more expressive. I am sure we can get something working quickly, and that will probably be the first step. The thing is, I would like to evolve the API to be nicer to use by using C++ features, but then it becomes harder and harder to map C API to it -> that is why I was thinking about dropping it, so we are not hindered by it. But, for most bindings, C is simpler to write bindings for, since it is simpler language -> even Python binding that we are maintaining relies on C right now. And I was thinking, ok maybe I can just drop it, figure out how to switch Python binding to C++, and if people need C API, somebody will write it (as you said). But let's see, now I am thinking it might be best to keep the C API with same functionality as now although that will be less than what C++ API will be offering, and then we can gradually consider further iterations. That probably makes most sense. Maintaining a C API should not be hard. Btw bumping to major version -> I get what you mean, I could in theory keep C API same as now and would not need to bump the major version. Although, with C++ API changing, I think I will want to bump it still. But doesn't really matter.

Thanks for the discussion again, and it is fun to see you and @SoapZA brainstorming together, I hope you are having a lot of fun working together ;).

cdunn2001 commented 3 years ago

Since you're going to 2.0.0, I don't think you need to keep your C API binary-compatible. As long as the same basic methods are available, that's enough for me. I'd suggest a header for the C API and another for C++.

But if that's too much work, just go to C++-only. Maybe keep the old code on a branch, or least tagged as a release.

SoapZA commented 3 years ago

@Martinsos If it were up to me, I'd ditch CMake entirely. Overall, it's a terrible buildsystem, with tons of anti-patterns and lots of idiosyncratic patterns and gotchas. It gets every sane default wrong, it's not opinionated, and it shepherds people to over-engineer otherwise simple build systems. That said, like C++, it unfortunately has a lot of momentum, which means people generally use it, not because they like it, because they have a severe case of Stockholm syndrome (like I have for C++ 👋 @cdunn2001). You might get some pushback for ditching it, even though CMake can perfectly well consume Meson produced pkg-config output.

Re the C API: Are there any real reasons not to keep it? How massive is the maintenance burden? Do you want to use all entrypoints with const std::string& arguments (this is where C++17's std::string_view could help a lot, because you can make it take C const char* pointers too)? ABI-wise it shouldn't be hard, and integration with other languages (think maybe Go, Rust etc.) is a lot easier via the C-API/FFI than C++ (which is nigh impossible).

cdunn2001 commented 3 years ago

For the record, Nim can wrap C++. :smile:

But I think a succinct C API is the easiest thing to deal with.

Martinsos commented 3 years ago

Since you're going to 2.0.0, I don't think you need to keep your C API binary-compatible. As long as the same basic methods are available, that's enough for me. I'd suggest a header for the C API and another for C++.

But if that's too much work, just go to C++-only. Maybe keep the old code on a branch, or least tagged as a release.

I agree, I think that is most sensible:

Awesome, thanks :).

Martinsos commented 3 years ago

@Martinsos If it were up to me, I'd ditch CMake entirely. Overall, it's a terrible buildsystem, with tons of anti-patterns and lots of idiosyncratic patterns and gotchas. It gets every sane default wrong, it's not opinionated, and it shepherds people to over-engineer otherwise simple build systems. That said, like C++, it unfortunately has a lot of momentum, which means people generally use it, not because they like it, because they have a severe case of Stockholm syndrome (like I have for C++ wave @cdunn2001). You might get some pushback for ditching it, even though CMake can perfectly well consume Meson produced pkg-config output.

Re the C API: Are there any real reasons not to keep it? How massive is the maintenance burden? Do you want to use all entrypoints with const std::string& arguments (this is where C++17's std::string_view could help a lot, because you can make it take C const char* pointers too)? ABI-wise it shouldn't be hard, and integration with other languages (think maybe Go, Rust etc.) is a lot easier via the C-API/FFI than C++ (which is nigh impossible).

Thanks for pushing me to drop CMake :D! I will leave it for a little while as discussed, but then I might try to get rid of it. C API - you are right, I think it makes sense to have it. I have yet to see how it will map to new C++ API, it might be tricky, but ok, one thing at a time.