drahnr / cantaloupe

rpm repository as a service, not ready for anything just yet

RPM metadata library #1

Open dralley opened 3 years ago

dralley commented 3 years ago

I just wanted to mention - I just discovered "repomd" a second ago, but I've been working on a similar library on and off for a couple of months.

https://github.com/dralley/rpmrepo_rs/

It's in a similar state of non-completion, but I'm aiming for feature parity with the createrepo_c libraries. Right now I have full serialization and deserialization for repomd.xml and filelists.xml, and the others are in progress. Structurally it needs a lot of cleanup, proper error handling, better testing, etc. Anyway, I already spoke with the author of another similar uncompleted library and we were considering merging the two.

https://github.com/semtexzv/rpmtools

Would you be interested in doing the same? There's not a lot of sense in duplicating effort.

drahnr commented 3 years ago

@dralley thanks for reaching out! I would very much appreciate a combined effort :) could we extend the license to Apache-2.0 and MIT besides MPL?

dralley commented 3 years ago

Probably, if there is a good reason to? I can't say I have a perfect understanding of all the legal nuances but my perception is that MPLv2 provides all the fancy patent protections and so forth that the Apache 2.0 license provides, plus a few minor copyleft protections, without being a massive headache to work with like the LGPL and similar licenses. Which is why I happen to like it a lot for Rust code.

What are the benefits of MIT + Apache dual license?

drahnr commented 3 years ago

The motivation is mostly that it's the de facto standard license combo for Rust projects, and I have read both of those licenses, which I haven't done for the MPL. I don't have a strong opinion about this as long as the license is compatible with MIT + Apache-2.0, so it's mainly a compatibility concern.

drahnr commented 3 years ago

I had a quick peek into your code, but I have to spend some more time and we should talk about your design goals.

dralley commented 3 years ago

Understood. It's certainly less well known simply due to being a younger license.

In terms of design, it's a bit of a mess at the moment, because I started by making a simple application for downloading RPM repositories. And then I expanded to trying to make a createrepo_c clone, but since XML doesn't fit all that well into the serde model, I could never get it to write the XML properly... so now I'm writing manual XML parsing and writing using quick-xml alone. That's where I'm at currently.

I want to split into multiple crates but haven't yet. Obviously it's not great to have library + application code mixed together like this.

dralley commented 3 years ago

It looks like the original reasoning for MIT + Apache was that they needed the patent and trademark protections from the Apache license, but the FSF claims that Apache 2.0 isn't GPLv2 compatible. Dual licensing with MIT solves that problem.

https://internals.rust-lang.org/t/rationale-of-apache-dual-licensing/8952

And then in another thread supposedly Graydon Hoare wanted to stick to well-known licenses, and the MPLv2 was only a couple of months old at the time (2012).

dralley commented 3 years ago

It occurs to me that I should look into the licensing more, anyways.

I've spent a fair number of hours contributing to the createrepo_c project as well, which is the canonical library for manipulating RPM metadata and is covered by the GPL. My code is completely different in basically every respect, but the GPL FAQ draws a fuzzy border.

https://www.gnu.org/licenses/gpl-faq.html#TranslateCode

It's not exactly clear what "translate" means in a context where the internal structures and patterns are totally different, but the author still has knowledge of how the original library works.

drahnr commented 3 years ago

Imho the next steps would be to unify the source code under one umbrella org and review which parts of which crate are going to make it into a combined repo.

dralley commented 3 years ago

I'm still trying to get some answers regarding the licensing weirdness. That section of the GPL FAQ paints with a very wide brush and while it doesn't look like the text of the license actually justifies it, I'd rather verify. In the meantime we should probably wait before doing any actual merging.

I invited the author of the other library (@semtexzv) to this thread in case he has any thoughts.

In terms of umbrella org, would you be opposed to asking https://github.com/rpm-software-management/ if we can host the repo there? They maintain createrepo_c, and librpm.rs is already hosted there (although I believe development is paused until the librpm C API is made more threadsafe). I work with them on a semi-regular basis, they might be willing to do so.

semtexzv commented 3 years ago

Hey, I'm the author of https://github.com/semtexzv/rpmtools . The library I wrote was just an experiment, but I snagged the crates.io names for a future shared crate. I'd gladly point rpmrepo to a shared crate for reading/writing the metadata. The reading can be done using serde, but the writing probably can't, since serde_xml is not in a great state. It'll probably require using one of the SAX (event-based) XML libraries. As for licensing, Apache + MIT seems to be the best option in the Rust ecosystem.

TL;DR: Yeah, let's merge the datatype definitions + some standard, sane serializer / deserializer implementation into a library, license it MIT + Apache and ask it to be hosted under https://github.com/rpm-software-management/ once ready.

dralley commented 3 years ago

> The reading can be done using serde, but the writing probably not, the serde_xml is not in a great state. It'll probably require using one of the SAX (event based) xml library.

Yup. That's the route I ended up going down. It's not so bad, really. quick-xml is pretty easy to work with, much more so than libxml or expat.

I'm also working on a PR upstream to add a higher-level (still manual) API for writing that's even easier & less tedious. https://github.com/tafia/quick-xml/pull/278
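For readers unfamiliar with the event-based approach discussed above, here is a minimal sketch of what "manual" XML writing looks like. It uses only the standard library and is not quick-xml's API: the `Event` enum, `write_events`, and the trimmed-down `RepomdRecord` struct are hypothetical names invented for illustration, and the record carries only a few of the fields a real repomd.xml `<data>` element has.

```rust
use std::fmt::Write;

/// One event in a SAX-style stream (hypothetical type, not quick-xml's API).
enum Event<'a> {
    Start(&'a str, Vec<(&'a str, &'a str)>),
    Text(&'a str),
    End(&'a str),
}

/// Escape the five characters that are special in XML.
fn escape(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Serialize the events in order, emitting one piece of markup per event.
fn write_events(events: &[Event<'_>]) -> String {
    let mut xml = String::new();
    for event in events {
        match event {
            Event::Start(name, attrs) => {
                write!(xml, "<{}", name).unwrap();
                for (key, value) in attrs {
                    write!(xml, " {}=\"{}\"", key, escape(value)).unwrap();
                }
                xml.push('>');
            }
            Event::Text(text) => xml.push_str(&escape(text)),
            Event::End(name) => write!(xml, "</{}>", name).unwrap(),
        }
    }
    xml
}

/// Hypothetical, heavily trimmed stand-in for one <data> record of repomd.xml.
struct RepomdRecord {
    kind: String,
    href: String,
    size: u64,
}

impl RepomdRecord {
    /// Build the event list by hand, then serialize it.
    fn to_xml(&self) -> String {
        let size = self.size.to_string();
        let events = vec![
            Event::Start("data", vec![("type", self.kind.as_str())]),
            Event::Start("location", vec![("href", self.href.as_str())]),
            Event::End("location"),
            Event::Start("size", vec![]),
            Event::Text(size.as_str()),
            Event::End("size"),
            Event::End("data"),
        ];
        write_events(&events)
    }
}
```

The point of the sketch is the trade-off mentioned in the thread: the data types stay plain structs, and the element/attribute layout that serde struggles to express is spelled out explicitly as a sequence of start, text, and end events.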

dralley commented 3 years ago

I asked about hosting at https://github.com/rpm-software-management/, and they said it can probably be done, however they generally have a strong preference for LGPL and similar licenses. It might be a harder sell.

I have reading + writing XML working now for primary, filelists, other, and repomd, and the API is slightly better than it was before. None of it is really tested yet though, and error handling is mostly nonexistent.

dralley commented 3 years ago

FYI, I'm still working on this, just slowly. Work has ramped up a bit so I took a break for a few weeks.

drahnr commented 3 years ago

Likewise, spare time has been very sparse. I plan to get back on this later this summer.

dralley commented 2 years ago

It's still not quite ready but it's getting close. The main problem is that I ended up needing to patch several external libraries, and I have to wait for those patches to get merged and released. Otherwise it can't be built without local clones of those projects.

I've written enough tests to have good confidence that the metadata being generated is correct in the majority of cases and @semtexzv I split it into multiple crates as you suggested.

drahnr commented 2 years ago

@dralley you can use a [patch.crates-io] section with git overrides until upstream publishes a new release.
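A minimal sketch of such an override, assuming the needed fixes are already on the default branch of the upstream quick-xml repository (if they only exist on a fork or a feature branch, the `git` URL and an added `branch` key would have to point there instead):

```toml
# Cargo.toml of the crate being developed (local builds only).

[patch.crates-io]
# Resolve every use of quick-xml in the dependency graph from git
# instead of crates.io. Assumption: the default branch of the upstream
# repository already contains the unreleased fixes.
quick-xml = { git = "https://github.com/tafia/quick-xml" }
```

Cargo only applies the patch if the version found in the git repository still satisfies the version requirement declared under `[dependencies]`.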

dralley commented 2 years ago

I tried this, but it seems that crates.io doesn't allow publishing packages with git repository dependencies.

drahnr commented 2 years ago

Uh, yeah, I meant for local development to get things moving faster, rather than publishing crates. Eventually one needs to decide if upstream will ingest the required changes or if a workaround can be found or, worst case, a fork is needed.

drahnr commented 2 years ago

@dralley I created a temporary fork of rpm-rs, aka rpm-rs-temporary, with a bunch of extensions and fixes. Once/iff upstream picks up again, it'll be dropped.

dralley commented 2 years ago

@drahnr I'm probably about 85% done with the metadata writing and parsing aspects, and about 50% done with Python bindings using pyo3.

The main three things that need improvement are tests, error handling, and rpm-rs integration. And waiting on quick-xml to release some patches.

And I guess advisory / errata parsing, since that's still nowhere near complete, but it's a little less important. Overall I'm pretty happy with how it looks.

dralley commented 2 years ago

@semtexzv Do you still have any interest in your rpmtools libraries?

drahnr commented 2 years ago

I do. The question here is more about the next steps, since you were already elbow deep in refactoring something last time I checked. Happy to pitch in in a few weeks.

dralley commented 2 years ago

No I meant @semtexzv, since I noticed he moved from Red Hat to Google and I've heard that Google have weird uptight rules around open source projects. I'd like to try to maintain them, or at least some parts of them, like the repo downloading bit.

I don't know if you (Michal) would be willing to transfer the crate name `rpmrepo` to me? I have a Python tool by the same name that uses createrepo_c, but now that my library is getting ready it is finally feasible to use Rust for the whole thing.

And at some point I would still like to move it all over to the rpm-software-management org, but they're a bit busy right now, and that conversation would be easier once the library is 100% complete for at least a subset of the functionality.