Richterrettich opened this issue 3 years ago
1. Make `RPMPackageMetadata::parse` and `RPMPackageMetadata::write` public methods and be done with it. This leaves the opportunity to read the `Reader` until the metadata object is created and decide for yourself what you want to do with the actual CPIO archive. The downside to this is that we are widening the API surface, which might or might not bite us later. Coming back to the original example, the code would look like this:
```rust
let metadata = RPMPackageMetadata::parse(reader)?;
save_in_database(&metadata)?;
metadata.write(out_file)?;
io::copy(reader, out_file)?;
```
Hugely in favour of this. The only extension I would add is introducing an `RPMPackageArchive` which can parse the remaining data and verify (not uncompress) that it's actually an archive (not sure if that makes sense though). Those two could then be combined (if desired) into `RPMPackage::parse() { .. }`, which internally is nothing more than the sequence `RPMPackageHeader` + `RPMArchive`.
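That composition could be sketched roughly like this. All types here are hypothetical stand-ins that follow the names in the discussion, not the actual rpm-rs API:

```rust
use std::io::{self, BufRead};

// Hypothetical stand-ins for the proposed types.
struct RPMPackageMetadata;
struct RPMArchive;
struct RPMPackage {
    metadata: RPMPackageMetadata,
    archive: RPMArchive,
}

impl RPMPackageMetadata {
    fn parse<R: BufRead>(_reader: &mut R) -> io::Result<Self> {
        // a real implementation would parse lead, signature header and header here
        Ok(RPMPackageMetadata)
    }
}

impl RPMArchive {
    fn parse<R: BufRead>(_reader: &mut R) -> io::Result<Self> {
        // a real implementation would check the archive magic without uncompressing
        Ok(RPMArchive)
    }
}

impl RPMPackage {
    // `parse` is internally nothing more than the sequence metadata + archive
    fn parse<R: BufRead>(mut reader: R) -> io::Result<Self> {
        let metadata = RPMPackageMetadata::parse(&mut reader)?;
        let archive = RPMArchive::parse(&mut reader)?;
        Ok(RPMPackage { metadata, archive })
    }
}
```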
Not sure how signing fits into this though: at what point is the signature checked? And are the two signatures going to be checked separately?
A great advantage would be that this approach could easily be modeled as `Future` + `Stream` impls, where the header parsing future impls `Future<Item = (RPMPackageHeader, ArchiveStream)>` and the archive stream impls `Stream<Item = FileEntry>`.
But this also leaves the question of when to check the signature.
Not sure if we need random access to individual files, or how doable that would be.
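Modeled synchronously, with an `Iterator` standing in for `Stream` (all types hypothetical), that split could look like:

```rust
use std::io::{self, Read};

// Hypothetical types: header parsing yields the header plus a lazy archive
// stream, mirroring Future<Item = (RPMPackageHeader, ArchiveStream)> with an
// Iterator standing in for Stream<Item = FileEntry>.
struct RPMPackageHeader;
struct FileEntry {
    name: String,
    size: u64,
}

struct ArchiveStream<R: Read> {
    inner: R,
}

impl<R: Read> Iterator for ArchiveStream<R> {
    type Item = io::Result<FileEntry>;
    fn next(&mut self) -> Option<Self::Item> {
        // a real implementation would decode the next CPIO entry from self.inner
        None
    }
}

fn parse_header<R: Read>(reader: R) -> io::Result<(RPMPackageHeader, ArchiveStream<R>)> {
    // a real implementation would consume the metadata bytes here first
    Ok((RPMPackageHeader, ArchiveStream { inner: reader }))
}
```

Random access to individual files would not fit this shape; a `Stream` is inherently forward-only, which matches non-seekable inputs.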
2. Change the data type of `RPMPackage::content` from `Vec<u8>` to a generic reader. Something like this:
```rust
struct RPMPackage<T: io::BufRead> {
    // ...
    content: T,
}
```
This could be a simple cursor when creating the RPM. The issue I see with this is the confusing semantics in combination with `RPMPackage::sign` and `RPMPackage::verify`, since both of them would need to consume the reader without any obvious indication. Since most network requests are not `Seek`able, it would not be good to narrow `T` to `Read + Seek` in general. One way around this could be to create a special `impl` for `Read + Seek` that features both `sign` and `verify` methods and leave them out otherwise.
I am not so favourable towards this; it hides quite a bit, and it becomes non-obvious when signatures are verified, which is a key part of the trust and processing chain and should be very explicit. To make this workable, one would also require a `Clone` bound on `T` to be able to handle signatures.
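For reference, the specialized-`impl` idea could be sketched like this (hypothetical, not the current API): `verify` only exists when `T` is also `Seek`, so the reader can be rewound after hashing.

```rust
use std::io::{self, BufRead, Read, Seek, SeekFrom};

// Hypothetical sketch of solution 2: content is a generic reader, and
// verification is only available for readers that can be rewound.
struct RPMPackage<T: BufRead> {
    content: T,
}

impl<T: BufRead> RPMPackage<T> {
    fn new(content: T) -> Self {
        RPMPackage { content }
    }
}

// `verify` (and `sign`, analogously) only for seekable content
impl<T: BufRead + Seek> RPMPackage<T> {
    fn verify(&mut self) -> io::Result<()> {
        // hash the remaining content (here just drained for illustration)...
        let mut buf = Vec::new();
        self.content.read_to_end(&mut buf)?;
        // ...then rewind so the package can still be written out afterwards
        self.content.seek(SeekFrom::Start(0))?;
        Ok(())
    }
}
```

A `Cursor` would get `verify`; a plain network stream would not, which makes the consuming behaviour visible in the type system but, as noted, hides when verification can happen at all.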
To make sure we have the same discussion basis (and also for general documentation purposes of RPM), it might make sense to have a common (tikz?) graphic to include in the documentation, showing which data depends on what and the implicit ordering constraints. What do you think?
I am also in favour of the first solution since it is the simplest. Creating an extra type for the archive itself could be beneficial.
The signature can only be checked once all bytes have been processed, since the signature always spans both the metadata and the archive. To make this process efficient, we would need to ensure that the reader is only processed once and buffers are kept only for a processing window. Something like this (just a sketch):
```rust
let metadata = RPMPackageMetadata::parse(input)?;
// input has now advanced to the archive
metadata.write(out)?;
// the tee ensures that processed bytes are written immediately
let tee = tee(out, input);
let verifier = Verifier::new(key)?;
if let Err(e) = verifier.verify_reader(&metadata, tee) {
    // undo out here
}
```
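std has no `tee`, so the sketch above assumes one. A minimal version is just a `Read` adapter that copies every chunk to a writer as it passes through:

```rust
use std::io::{self, Read, Write};

// Minimal tee reader: every byte read from `reader` is immediately written
// to `writer`, so the input only has to be processed once.
struct TeeReader<R: Read, W: Write> {
    reader: R,
    writer: W,
}

impl<R: Read, W: Write> Read for TeeReader<R, W> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.reader.read(buf)?;
        self.writer.write_all(&buf[..n])?;
        Ok(n)
    }
}
```

The verifier would hash from the `TeeReader` while `out` receives the same bytes; on verification failure, whatever already reached `out` has to be undone by the caller.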
This way, all bytes processed during the verification are directly written to the final destination. If the verification fails, this has to be reverted, but this is a classic case of "it is easier to ask forgiveness than it is to get permission".
Maybe we could provide a function to remove some of the boilerplate involved:
```rust
let metadata = rpm::RPMPackageMetadata::parse(input)?;
let out = create_out_based_on_metadata(&metadata)?; // not part of rpm-rs
let verifier = rpm::signature::PGPVerifier::new(key)?;
if let Err(e) = rpm::process_and_verify(metadata, input, out, verifier) {
    // undo out here
}
```
This leaves only the problem that the metadata itself might be large enough to cause OOM problems. But IMO this is more of a theoretical problem. Also, it is good practice to check/limit the size of untrusted or unknown streams.
Not sure there is really a point in resetting the `out` part; if it's not an rpm, we would dump it anyways?
```rust
let (metadata, input) = rpm::RPMPackageMetadata::parse(input)?;
let verifier = rpm::signature::PGPVerifier::new(key)?;
let archive = rpm::RPMArchive::parse_and_verify(&metadata, input, verifier)?;
// or
let archive = rpm::RPMArchive::from_metadata(&metadata, input)?;
archive.verify(&metadata, verifier)?;
// or
let package = rpm::RPMPackage::from_metadata(metadata, input)?;
package.verify(verifier)?;
// or all in one
let pkg = rpm::RPMPackage::parse(input)?;
pkg.verify(verifier)?;
// combine parsed
let pkg = rpm::RPMPackage::combine(metadata, archive);
pkg.verify(verifier)?;
```
I would add a limiter configuration struct, to prevent oversized rpms.
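Such a limiter could be sketched as follows (hypothetical names, not part of rpm-rs): a `Read` wrapper that returns a hard error once the configured maximum is exceeded, instead of silently truncating like `Read::take` does.

```rust
use std::io::{self, Read};

// Hypothetical limiter configuration.
struct Limits {
    max_bytes: u64,
}

struct LimitedReader<R: Read> {
    inner: R,
    remaining: u64,
}

impl Limits {
    fn wrap<R: Read>(&self, inner: R) -> LimitedReader<R> {
        LimitedReader { inner, remaining: self.max_bytes }
    }
}

impl<R: Read> Read for LimitedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        if n as u64 > self.remaining {
            // fail hard instead of pretending the stream ended
            return Err(io::Error::new(io::ErrorKind::InvalidData, "rpm exceeds size limit"));
        }
        self.remaining -= n as u64;
        Ok(n)
    }
}
```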
> Not sure there is really a point in resetting the `out` part; if it's not an rpm, we would dump it anyways?
Creating and cleaning `out` is the responsibility of library users. In case a `tee` reader gets used, `out` is completely processed by the time we are finished with the verification. For example, if `out` is a file, it will be persisted regardless of the outcome of `verify`. Once again, this is not our problem directly, since it is the duty of library users to manage `out`. But the API has to be clear that this is happening and that it is indeed the user's responsibility.
I am sure that library users will often need some information found in the `metadata` object to create `out` in the first place. For example:
```rust
let header = metadata.header;
let rpm_path = format!("{}-{}-{}.rpm", header.get_name()?, header.get_version()?, header.get_revision()?);
let out = fs::File::create(rpm_path)?;
```
Sadly, I do not see another solution than using a `tee` reader, since we need to go through the `input` multiple times without any option to reset it. The only option is to do all operations (verifying and writing) during the first pass over `input`.

This is in essence what makes the API design a little awkward, since RPM is awkward in this regard: we need to process `input` to extract the header in order to get the necessary information about signatures, but to verify signatures efficiently we need an unprocessed `input`. This was clearly designed with files in mind, where it is easy to reset `input` without any problems. This is not possible with network requests, though.
So a function like this:
```rust
rpm::process_and_verify(metadata, input, out, verifier)?;
// or
rpm::write_and_verify(metadata, input, out, verifier)?;
// or whatever name may fit better for this
```
is our best bet to make this happen without unnecessary overhead.
I have some problems with the proposed API:
```rust
let archive = rpm::RPMArchive::parse_and_verify(&metadata, input, verifier)?;
```
and
```rust
archive.verify(&metadata, verifier)?;
```
give the impression that the archive gets verified individually and that, to do so, it needs a `metadata` object and a `verifier`. But this is not how the RPM signing and verification process works: an RPM is always verified as a whole, meaning metadata and CPIO archive together.
So it would make more sense to go through the indirection of `RPMPackage` for the verification API in case all content is read into memory:
```rust
let (metadata, input) = rpm::RPMPackageMetadata::parse(input)?;
let archive = rpm::RPMArchive::parse(metadata, input)?; // at this point, the content is read into memory anyway
let verifier = rpm::signature::PGPVerifier::new(key)?;
let pkg = rpm::RPMPackage::new(metadata, archive);
pkg.verify(verifier)?;
```
which is basically what we have at the moment but with more checks regarding the archive itself and more control over the individual steps of processing.
Okay, I just thought about how to make this more bearable.
We could introduce a type called `RPMProcessor` that is responsible for processing the RPM. This type can have multiple destinations and verifiers, to make it completely configurable what happens during processing of `input`.
```rust
pub trait ProcessVerifier: io::Write {
    // gets called when input is completely consumed
    fn verify(&self, metadata: &RPMPackageMetadata) -> Result<(), RPMError>;
}

impl ProcessVerifier for PGPVerifier {
    // ...
}
```
```rust
let pgp_verifier = rpm::signature::PGPVerifier::new(key)?;
let custom_verifier = my_custom_verifier();
let processor = RPMProcessor::new(metadata, input)
    .add_verifier(pgp_verifier)
    .add_verifier(custom_verifier)
    .add_destination(out)
    .add_destination(another_location);
if let Err(e) = processor.process() {
    // handle error accordingly
}
```
Creating a custom `ProcessVerifier` would be easy, since you only need to wrap an `io::Write`.
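The fan-out part of such a processor could be sketched like this (hypothetical; metadata handling and the verifier callback are omitted, since verifiers would be wired in the same way as destinations, being just another `io::Write`):

```rust
use std::io::{self, Read, Write};

// Hypothetical sketch: every chunk read from input is fanned out to all
// registered destinations in a single pass.
struct RPMProcessor<R: Read> {
    input: R,
    destinations: Vec<Box<dyn Write>>,
}

impl<R: Read> RPMProcessor<R> {
    fn new(input: R) -> Self {
        RPMProcessor { input, destinations: Vec::new() }
    }

    fn add_destination(mut self, dest: Box<dyn Write>) -> Self {
        self.destinations.push(dest);
        self
    }

    fn process(mut self) -> io::Result<()> {
        let mut buf = [0u8; 8192];
        loop {
            let n = self.input.read(&mut buf)?;
            if n == 0 {
                return Ok(());
            }
            for dest in &mut self.destinations {
                dest.write_all(&buf[..n])?;
            }
        }
    }
}
```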
I think that's the most ergonomic API so far, let's go for it! The only thing I do not quite like is that a lot of details are hidden, and we now require another trait for `Signer` implementations.
Also note that, iirc, pgp could not handle this, since it does not have a stream processing API, which would make the whole effort moot. So it might make sense to file a PR to rpgp, which should not be too much work.
Okay, I'll implement this.
> Also note that, iirc, pgp could not handle this, since it does not have a stream processing API, which would make the whole effort moot. So it might make sense to file a PR to rpgp, which should not be too much work.
Yeah, this stinks, but you are right. Do you have time to create a PR for rpgp? I am not quite sure why this is not higher on their priority list, since this is IMHO a core feature to make rpgp viable for server deployments.
I will, but it'll probably be a week until I get around to it.
First attempt to introduce an `io::Read`-based API: https://github.com/rpgp/rpgp/pull/106
Currently, you have to read the complete RPM into memory in order to access its metadata. I've encountered some workflows where the package content is completely irrelevant for further processing. Something like this:
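For illustration, a stub of that shape (hypothetical stand-ins, not the real rpm-rs types): the whole payload is buffered even though only the metadata is consulted afterwards.

```rust
use std::io::{self, Read};

// Hypothetical stand-ins for the current API surface.
struct Metadata;
struct RPMPackage {
    metadata: Metadata,
    content: Vec<u8>,
}

impl RPMPackage {
    fn parse<R: Read>(reader: &mut R) -> io::Result<Self> {
        let mut content = Vec::new();
        reader.read_to_end(&mut content)?; // the whole payload lands in memory
        Ok(RPMPackage { metadata: Metadata, content })
    }
}

fn save_in_database(_metadata: &Metadata) -> io::Result<()> {
    Ok(())
}

fn index_rpm<R: Read>(mut reader: R) -> io::Result<()> {
    let package = RPMPackage::parse(&mut reader)?;
    save_in_database(&package.metadata)?; // only the metadata is used
    Ok(()) // `package.content` is dropped unused
}
```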
In this case, reading the complete body into a byte buffer is a waste of memory. Not only that: since RPMs can potentially be multiple gigabytes in size, this also limits the usage of this library in memory-sensitive deployments.
I currently see two solutions for this problem, as described above.
@drahnr, do you have any thoughts on this?