dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

General low level primitive for ciphers (AES-GCM being the first) #23365

Closed Drawaes closed 4 years ago

Drawaes commented 7 years ago

Rationale

There is a general need for a number of ciphers for encryption. Today's mix of interfaces and classes has become disjointed, and there is no support for AEAD-style ciphers, which need the ability to supply extra authentication information. The current designs are also prone to allocation, which is hard to avoid because they return arrays.

Proposed API

A general-purpose abstract base class that will be implemented by concrete classes. This allows for expansion, and having a class rather than static methods means we can add extension methods as well as hold state between calls. The API should allow instances to be recycled to reduce allocations (no need for a new instance each time, and it can track, say, unmanaged keys). Because the resources being tracked are often unmanaged, the class should implement IDisposable.

public abstract class Cipher : IDisposable
{
    public virtual int TagSize { get; }
    public virtual int IVSize { get; }
    public virtual int BlockSize { get; }
    public virtual bool SupportsAssociatedData { get; }

    public abstract void Init(ReadOnlySpan<byte> key, ReadOnlySpan<byte> iv);
    public abstract void Init(ReadOnlySpan<byte> iv);
    public abstract int Update(ReadOnlySpan<byte> input, Span<byte> output);
    public abstract int Finish(ReadOnlySpan<byte> input, Span<byte> output);
    public abstract void AddAssociatedData(ReadOnlySpan<byte> associatedData);
    public abstract int GetTag(Span<byte> span);
    public abstract void SetTag(ReadOnlySpan<byte> tagSpan);
}

Example Usage

(the input/output source is a mythical span-based, stream-like IO source)

using (var cipher = new AesGcmCipher(bitsize: 256))
{
    cipher.Init(myKey, nonce);
    while (!inputSource.EOF)
    {
        var inputSpan = inputSource.ReadSpan(cipher.BlockSize);
        cipher.Update(inputSpan, inputSpan); // encrypt in place
        outputSource.Write(inputSpan);
    }
    cipher.AddAssociatedData(extraInformation);
    cipher.Finish(finalBlockData, finalBlockData);
    cipher.GetTag(tagData);
}
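For symmetry, a decryption pass under the same proposal would presumably supply the expected tag before Finish. This is a sketch only; AesGcmCipher and the span-based IO source are the same hypothetical stand-ins as above:

```csharp
// Hypothetical decryption counterpart (sketch only; AesGcmCipher and the
// span-based IO source do not exist in the BCL).
using (var cipher = new AesGcmCipher(bitsize: 256))
{
    cipher.Init(myKey, nonce);
    cipher.AddAssociatedData(extraInformation);
    cipher.SetTag(expectedTag);                    // expected tag, verified by Finish
    while (!inputSource.EOF)
    {
        var inputSpan = inputSource.ReadSpan(cipher.BlockSize);
        cipher.Update(inputSpan, inputSpan);       // decrypt in place
        outputSource.Write(inputSpan);
    }
    cipher.Finish(finalBlockData, finalBlockData); // throws if the tag check fails
}
```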

API Behaviour

  1. If GetTag is called before Finish, an [exception type?] should be thrown and the internal state should be set to invalid
  2. If the tag is invalid on Finish during decryption, an exception should be thrown
  3. Once Finish is called, a call to anything other than one of the Init methods will throw
  4. Once Init is called, a second call without "finishing" will throw
  5. If the type expects a key to be supplied (a straight "new'd up" instance) and the initial Init call has only an IV, it will throw
  6. If the type was created from, say, a store-based key and you attempt to change the key via Init rather than just the IV, it will throw
  7. If GetTag is not called before Dispose or Init, should an exception be thrown, to stop the user from accidentally failing to collect the tag?

Reference dotnet/corefx#7023

Updates

  1. Changed nonce to IV.
  2. Added behaviour section
  3. Removed the single input/output span cases from finish and update, they can just be extension methods
  4. Changed a number of spans to ReadOnlySpan as suggested by @bartonjs
  5. Removed Reset; Init with an IV should be used instead

sdrapkin commented 7 years ago

Definitely bytes - not bits. A nonce provider that's not key-aware is a big mistake.

bartonjs commented 7 years ago

Nonce provider that's not key-aware is a big mistake.

You can write your nonce provider however you like. We aren't providing any.

sdrapkin commented 7 years ago

What about deterministic cleanup/IDisposable ?

bartonjs commented 7 years ago

What about deterministic cleanup/IDisposable ?

Good call. Added it to AuthenticatedEncryptor/AuthenticatedDecryptor. I don't think they should probe for disposability on the nonce provider, the caller can just stack the using statements.

sdrapkin commented 7 years ago

The INonceProvider concept/purpose makes no sense to me (echoing others). Let primitives be primitive: pass in the nonce the same way you pass in the key (i.e. as bytes, however declared). No AE/AEAD spec forces an algorithm for how nonces are generated/derived; this is a higher-layer responsibility (at least in the let-primitives-be-primitive model).

SidShetye commented 7 years ago

No streaming? Really? What is the justification to forcibly remove streaming from a stream cipher like AES-GCM at a core foundational level?

For example, what does your crypto board recommend these two recent scenarios we reviewed?

  1. Client has large healthcare files between 10-30GB. The core only sees a data stream between two machines, so it's a one-pass stream. Obviously a fresh key is issued for each 10GB file, but you've just rendered every such workflow useless. You now want us to a) buffer that data (in memory, no pipelining), b) perform encryption (all machines in the pipeline are now idle!), c) write the data out (first byte written after a and b are 100% done)? Please tell me you're joking. You guys are knowingly putting "encryption is a burden" back into the game.

  2. Physical security unit has multiple 4K streams which are also encrypted for at-rest scenarios. Fresh key issuance happens at 15GB boundary. You propose buffering the entire clip?

I don't see any input from the community, from people actually building real-world software, asking to remove streaming support. But then the team disappears from the community dialog, huddles internally, and comes back with something nobody asked for, something that kills real applications and reinforces the idea that "encryption is slow and expensive, skip it".

You can provide Encrypt and EncryptFinal which would support both options instead of imposing your decision for the entire ecosystem.

Elegant design eliminates complexity, not control.

bartonjs commented 7 years ago

What is the justification to forcibly remove streaming from a stream cipher like AES-GCM at a core foundational level?

I think it was something like

This proposal eliminates data streaming. We don't really have a lot of flexibility on that point. Real-world need (low) combined with the associated risks (extremely high for GCM) or impossibility thereof (CCM) means it's just gone.

GCM has too many oops moments where it allows key recovery. If an attacker can do a chosen ciphertext and watch the streaming output from before tag verification, they can recover the key. (Or so one of the cryptanalysts tells me). Effectively, if any GCM-processed data is observable at any point before tag verification then the key is compromised.

I'm pretty sure that the Crypto Board would recommend NOT using GCM for first scenario, but rather CBC+HMAC.

If your second scenario is 4k framing, and you're encrypting each 4k frame, then that works with this model. Each 4k + nonce + tag frame gets decrypted and verified before you get the bytes back, so you never leak the keystream / key.

ektrah commented 7 years ago

For comparison: I'm currently developing this "let primitives be primitive" crypto API. Here is my class for authenticated encryption.

For me it turned out to be useful to be able to talk about a crypto primitive independently of a key. For example, I often want to plug a specific primitive into a method that works with any AEAD algorithm and leave the generation of keys etc. to that method. Therefore there's an AeadAlgorithm class and a separate Key class.

Another very useful thing that already prevented several bugs is to use distinct types to represent data of different shapes, e.g., a Key and a Nonce, instead of using a plain byte[] or Span<byte> for everything.

AeadAlgorithm API:

```csharp
public abstract class AeadAlgorithm : Algorithm
{
    public int KeySize { get; }
    public int NonceSize { get; }
    public int TagSize { get; }

    public byte[] Decrypt(Key key, Nonce nonce, ReadOnlySpan<byte> associatedData, ReadOnlySpan<byte> ciphertext)
    public void Decrypt(Key key, Nonce nonce, ReadOnlySpan<byte> associatedData, ReadOnlySpan<byte> ciphertext, Span<byte> plaintext)
    public byte[] Encrypt(Key key, Nonce nonce, ReadOnlySpan<byte> associatedData, ReadOnlySpan<byte> plaintext)
    public void Encrypt(Key key, Nonce nonce, ReadOnlySpan<byte> associatedData, ReadOnlySpan<byte> plaintext, Span<byte> ciphertext)
    public bool TryDecrypt(Key key, Nonce nonce, ReadOnlySpan<byte> associatedData, ReadOnlySpan<byte> ciphertext, out byte[] plaintext)
    public bool TryDecrypt(Key key, Nonce nonce, ReadOnlySpan<byte> associatedData, ReadOnlySpan<byte> ciphertext, Span<byte> plaintext)
}
```

Drawaes commented 7 years ago

@bartonjs he/she is correct: you need to rely on the program not outputting until authentication. So, for example, if you aren't authenticating (or just not yet), an attacker can control the input for a block and therefore know the output and work backwards from there...

E.g. a man in the middle attack can inject known blocks into a cbc stream and perform a classic bit flipping attack.

Not sure how to solve the large-chunks-of-data issue, really, other than to chunk them with serial nonces or similar... a la TLS.

Drawaes commented 7 years ago

Well, let me rephrase that: I do, but only in the small-size network case, which isn't enough for a general-purpose lib.

sdrapkin commented 7 years ago

In the spirit of openness, is it possible to reveal who is on the Microsoft Cryptography Review Board (and ideally the comments/opinions of specific members that reviewed this topic)? Brian LaMacchia and who else?

sdrapkin commented 7 years ago

using reverse psychology:

I'm happy that streaming AEAD is out. This means that Inferno continues to be the only practical CryptoStream-based streaming AEAD for the average Joe. Thank you MS Crypto Review Board!

sdrapkin commented 7 years ago

Building on @ektrah's comment, his (her?) approach is driven by RFC 5116, which I've referenced earlier. There are many notable quotes in RFC 5116:

3.1. Requirements on Nonce Generation It is essential for security that the nonces be constructed in a manner that respects the requirement that each nonce value be distinct for each invocation of the authenticated encryption operation, for any fixed value of the key. ...

  4. Requirements on AEAD Algorithm Specifications Each AEAD algorithm MUST accept any nonce with a length between N_MIN and N_MAX octets, inclusive, where the values of N_MIN and N_MAX are specific to that algorithm. The values of N_MAX and N_MIN MAY be equal. Each algorithm SHOULD accept a nonce with a length of twelve (12) octets. Randomized or stateful algorithms, which are described below, MAY have an N_MAX value of zero. ... An Authenticated Encryption algorithm MAY incorporate or make use of a random source, e.g., for the generation of an internal initialization vector that is incorporated into the ciphertext output. An AEAD algorithm of this sort is called randomized; though note that only encryption is random, and decryption is always deterministic. A randomized algorithm MAY have a value of N_MAX that is equal to zero.

An Authenticated Encryption algorithm MAY incorporate internal state information that is maintained between invocations of the encrypt operation, e.g., to allow for the construction of distinct values that are used as internal nonces by the algorithm. An AEAD algorithm of this sort is called stateful. This method could be used by an algorithm to provide good security even when the application inputs zero-length nonces. A stateful algorithm MAY have a value of N_MAX that is equal to zero.

One idea potentially worth exploring is the passing of a zero-length/null Nonce, which might even be the default. Passing such a "special" Nonce value would randomize the actual nonce value, which would then be available as Encrypt's output.

If INonceProvider stays because "reasons", another idea is to add a Reset() call, which will be triggered every time the AuthenticatedEncryptor is rekey'ed. If, on the other hand, the plan is to never rekey AuthenticatedEncryptor instances, this will trash GC if we want to build a streaming chunk-encrypting API (ex. chunk = network packet), and every chunk must be encrypted with a different key (ex. Netflix MSL protocol, Inferno, others). Especially for parallel enc/dec operations where we'd want to maintain a pool of AEAD engines, and borrow instances from that pool to do enc/dec. Let's give GC some love :)

ektrah commented 7 years ago

From my point of view, the sole purpose of crypto primitives is to implement well-designed higher-level security protocols. Every such protocol insists on generating nonces in its own way. For example:

GCM is way too brittle for randomized nonces at typical nonce sizes (96 bit). And I'm not aware of any security protocol that actually supports randomized nonces.

There is not much demand for more APIs providing crypto primitives. 99.9% of developers need high-level recipes for security-related scenarios: storing a password in a database, encrypting a file at rest, securely transferring a software update, etc.

However, APIs for such high-level recipes are rare. The only APIs available are often only HTTPS and the crypto primitives, which forces developers to roll their own security protocols. IMO the solution is not to put a lot of effort in designing APIs for working with primitives. It's APIs for high-level recipes.

morganbr commented 7 years ago

Thanks for the feedback, everyone! A couple of questions:

  1. While streaming decryption can fail catastrophically, streaming encryption could be doable. Does streaming encryption (along with a non-streaming option) but only non-streaming decryption sound more useful? If yes, there are a couple of problems to solve:

     a. Some algorithms (CCM, SIV) don't actually support streaming. Should we put streaming encryption on the base class and buffer streamed inputs, or throw from the derived classes?

     b. Streaming AAD likely isn't possible due to implementation constraints, but different algorithms need it at different times (some need it at the beginning, some don't need it until the end). Should we require it up-front or have a method for adding it that works when the individual algorithms allow?

  2. We're open to improvements to INonceProvider as long as the point is that users need to write code generating a new nonce. Does anyone have another proposed shape for it?

Drawaes commented 7 years ago

  1a. I think it could be an issue not to warn the user early. Imagine the scenario from someone above, a 10GB file. They think they are getting streaming, then sometime later another dev changes the cipher and next thing the code is buffering 10GB (or trying to) before returning a value.

  1b. Again with the "streaming" or networking idea: for instance with AES-GCM you don't get the AAD information until the end for decryption. As for encryption, I have yet to see a case where you don't have the data upfront. So I would say at least for encryption you should require it at the start; decryption is more complex.

  2. I think it's really a non-issue; supplying the "bytes" for the nonce through an interface or just directly is neither here nor there. You can achieve the same thing both ways. I just find it uglier for a primitive, but am not vehemently opposed if it makes people sleep better at night. I would just strike this off as a done deal and move on with the other issues.

SidShetye commented 7 years ago

Regarding the deliberation process

@bartonjs: We could argue all day about whether closed-door decisions devoid of community involvement are an effective justification, but we'll go off-topic, so I'll let that be. Plus, without richer face-to-face or realtime comms, I don't want to upset anyone there.

Regarding streaming

1. the 'streaming implies no AES-GCM security' argument

Specifically, streaming => return decrypted data to caller before tag verification => no security. This isn't sound. @bartonjs claims 'chosen ciphertext => watch output => recover key' while @Drawaes claims 'control input for a block => therefore know output => "work from there"'.

Well, in AES-GCM, the only thing the tag does is integrity verification (tamper protection). It has zero impact on privacy. In fact, if you remove the GCM/GHASH tag processing from AES-GCM, you simply get AES-CTR mode. It's this construct that handles the privacy aspect. And CTR is malleable to bit flips, but it isn't "broken" in any of the ways you two are asserting (recovering the key or plaintext), because that would mean the fundamental AES primitive is compromised. If your cryptanalyst (who is it?) knows something the rest of us don't, he/she should be publishing it. The only thing possible is that an attacker can flip bit N and know that bit N of the plaintext was flipped - but they never know what the actual plaintext is.

So:

  1. plaintext privacy is always enforced
  2. integrity verification is simply deferred (till end of stream)
  3. no key is ever compromised

For products and systems where streaming is foundational, you can now at least engineer a tradeoff where one momentarily steps down from AEAD to regular AES encryption, then steps back up to AEAD upon tag verification. That unlocks several innovative concepts to embrace security instead of saying "You want to buffer all that - are you crazy? We can't do encryption!".

All because you want to implement just EncryptFinal rather than both Encrypt and EncryptFinal (or equivalents).

2. Not specific to GCM!

Now, AES-GCM isn't some magical beast with "oops moments" galore. It's simply AES-CTR + GHASH (a sort of hash, if I may). Nonce considerations related to privacy are inherited from CTR mode, and tag considerations related to integrity come from the variable tag sizes allowed in the spec. Still, AES-CTR + GHASH is very similar to something like AES-CBC + HMAC-SHA256, in that the first algorithm handles privacy and the second handles integrity. In AES-CBC + HMAC-SHA256, bit flips in the ciphertext will corrupt the corresponding block of decrypted text (unlike CTR) AND also deterministically flip bits in the following decrypted plaintext block (like CTR). Again, an attacker won't know what the resulting plaintext will be, just that bits were flipped (like CTR). Finally, the integrity check (HMAC-SHA256) will catch it, but only after processing the last byte (like GHASH).
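The malleability point is easy to demonstrate with a toy XOR keystream standing in for AES-CTR output. This is illustrative only, not real crypto: flipping ciphertext bit N flips exactly plaintext bit N on decryption, without revealing the plaintext itself.

```csharp
using System;
using System.Text;

class CtrMalleabilityDemo
{
    // ciphertext = plaintext XOR keystream; the same operation decrypts.
    static byte[] Xor(byte[] data, byte[] keystream)
    {
        var result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = (byte)(data[i] ^ keystream[i]);
        return result;
    }

    static void Main()
    {
        byte[] keystream = { 0x3A, 0x91, 0x5C, 0x77 }; // stand-in for AES-CTR output
        byte[] plaintext = Encoding.ASCII.GetBytes("PAY1");

        byte[] ciphertext = Xor(plaintext, keystream);
        ciphertext[3] ^= 0x01; // attacker flips one bit in transit

        byte[] tampered = Xor(ciphertext, keystream); // decrypt
        Console.WriteLine(Encoding.ASCII.GetString(tampered)); // prints "PAY0"
    }
}
```

The flip is predictable but the surrounding plaintext is never revealed; only the tag check catches the tampering, which is the crux of the release-before-verification debate.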

So if your argument of holding back ALL decrypted data until integrity is OK is truly good - it should be applied consistently. So ALL data coming out of the AES-CBC path should also be buffered (internally by the library) till HMAC-SHA256 passes. That basically means on .NET, no streaming data can even benefit from AEAD advances. .NET forces streaming data to downgrade. To pick between no encryption or regular encryption. No AEAD. Where buffering is technically impractical, architects should at least have the option to warn end-users that "drone footage may be corrupt" rather than "no eyes for you".

3. It's the best we have

Data is getting larger and security needs to be stronger. Streaming is also a reality designers have to embrace. Until the world crafts a truly integrated AEAD algorithm which can natively detect mid-stream tampering, we are stuck with encryption + authentication as bolted-on buddies. True AEAD primitives are being researched, but we've just got encryption + authentication for now.

I care less about "AES-GCM" as much as I care about a fast, popular AEAD algorithm that can support streaming workloads - super prevalent in a data-rich, hyper-connected world.

4. Use AES-CBC-HMAC, Use (insert workaround)

the Crypto Board would recommend NOT using GCM for first scenario, but rather CBC+HMAC.

Leaving aside everything mentioned above, or even the specifics of the scenario: suggesting AES-CBC-HMAC isn't free. It's ~3x slower than AES-GCM, since AES-CBC encryption is non-parallelizable while GHASH can be accelerated via the PCLMULQDQ instruction. So if you're at 1GB/sec with AES-GCM, you're now going to hit ~300MB/sec with AES-CBC-HMAC. This again perpetuates the "crypto slows you down, skip it" mindset, one that security folks try hard to fight.

encrypting each 4k frame

So video codecs should suddenly do encryption? Or must the encryption layer now understand video codecs? It's just a bitstream at the data-security layer. The fact that it's video/genomic data/images/proprietary formats etc. shouldn't be a security-layer concern. An overall solution shouldn't co-mingle core responsibilities.

Nonce

NIST allows for randomized IVs for lengths exceeding 96 bits. See section 8.2.2 of NIST SP 800-38D. Nothing new here; nonce requirements come from CTR mode, which is fairly standard across most stream ciphers. I don't understand the sudden fear of nonces - it's always been "number used once". Still, while the INonce debate makes for a clunky interface, at least it doesn't eliminate innovation like the no-stream-for-you imposition. I'll concede to INonce any day if we can get AEAD security plus streaming workloads. I hate calling something as basic as streaming an innovation - but that's where I fear we would regress.

I'd love to be proven wrong

I'm just a guy who after a long day at work, gave up movie night with my kids to type this. I'm tired and could be wrong. But at least have an open fact based community dialog rather than anecdotes or "committee reasons" or some other voodoo. I'm in the business of promoting secure .NET and Azure innovations. I think we've got aligned goals.

Speaking of community dialog ...

Can we please have a community Skype call? Expressing a complex topic like this blows into a giant wall of text. Pretty please?

sdrapkin commented 7 years ago

Please don't do a Skype call - that's the very definition of "closed door meeting", with no records available for the community. Github issues are the right vehicle for all parties to have a civil documented discourse (ignoring MS-comment-removal precedents).

MS Crypto Review Board probably did a Skype call too. It's not the fault of the MS folks participating in this thread - they likely have very limited access to & persuasion power over the ivory towers of MS Crypto Review Board (whatever/whoever that is).

Regarding streaming AEAD:

Byte-size streaming encryption is possible for MAC-last modes like GCM and CTR+HMAC, but not possible for MAC-first modes like CCM. Byte-size streaming decryption is fundamentally leaky and therefore is not considered by anyone. Block-size streaming encryption is also possible for CBC+HMAC, but that does not change anything. I.e., byte-size and block-size approaches to streaming AEAD are flawed.

Chunk-size streaming encryption and decryption work great, but they have 2 constraints:

This is just a summary of what everyone in this discussion so far already knows.
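The chunk-size scheme summarized above can be sketched on top of any one-shot AEAD call. Everything here is hypothetical glue (the encryptOneShot delegate stands in for whatever one-shot shape ships); the point is only that each chunk gets a serial nonce and is tagged independently, so no plaintext need be released before its chunk verifies:

```csharp
using System;
using System.IO;

static class ChunkedAead
{
    // encryptOneShot(key, nonce, associatedData, plaintext) -> ciphertext || tag
    // (a hypothetical delegate; no such BCL API is implied)
    public static void EncryptChunked(
        Func<byte[], byte[], byte[], byte[], byte[]> encryptOneShot,
        byte[] key, byte[] noncePrefix, Stream input, Stream output, int chunkSize)
    {
        var buffer = new byte[chunkSize];
        ulong counter = 0;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // 12-byte nonce = 4-byte per-message prefix + 8-byte chunk counter,
            // so the nonce is distinct for every chunk under the same key.
            var nonce = new byte[12];
            Array.Copy(noncePrefix, nonce, 4);
            BitConverter.GetBytes(counter++).CopyTo(nonce, 4);

            var chunk = new byte[read];
            Array.Copy(buffer, chunk, read);
            byte[] sealedChunk = encryptOneShot(key, nonce, null, chunk);
            output.Write(sealedChunk, 0, sealedChunk.Length);
        }
    }
}
```

A production scheme would additionally frame chunk lengths and mark the final chunk so truncation and reordering are detectable; this sketch omits that.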

morganbr commented 7 years ago

@sdrapkin, to make sure I understand properly, are you ok with this API providing streaming encryption, but no streaming decryption?

SidShetye commented 7 years ago

@sdrapkin well, humans brainstorming in real time is certainly beneficial, record keeping concerns can be resolved with meeting minutes. Back to the technical side, while chunking works for streaming decryption, that's not a low level security primitive. It's a custom protocol. And a non-standard one like you noted.

sdrapkin commented 7 years ago

@morganbr

are you ok with this API providing streaming encryption, but no streaming decryption?

No, I'm not. If such API were available, it would be easy to create a stream-encrypted ciphertext of a size that no buffer-decryption will be able to decrypt (out of memory).

Drawaes commented 7 years ago

^^^^ This. There hasn't been much agreement so far, but I think we can all agree that whichever way it goes, an asymmetric API would be a disaster. Both from a "hey, where are the stream decrypt methods I thought I could rely on, because there were encrypt methods" standpoint, and because of @sdrapkin's comments above.

sdrapkin commented 7 years ago

@Drawaes Agreed. Asymmetric enc/dec API would be awful.

SidShetye commented 7 years ago

Any updates folks?

bartonjs commented 7 years ago

Apparently I conflated a few attacks.

Inherent weaknesses in stream ciphers (which AES-CTR and AES-GCM are) allow chosen-ciphertext attacks to achieve arbitrary plaintext recovery. The defense against chosen-ciphertext attacks is authentication, so AES-GCM is immune... unless you're doing streaming decryption and you can identify from side-channel observations what the plaintext would have been. For example, if the decrypted data is being processed as XML, it'll fail very quickly if characters other than whitespace or < are at the beginning of the decrypted data. So that's "streaming decryption re-introduces concerns with stream cipher design" (which, you might have noticed, .NET does not have any of).

While looking for where the key recovery was coming from there are papers like Authentication weaknesses in GCM (Ferguson/Microsoft), but that one is recovering the authentication key based on short tag sizes (which is part of why the Windows implementation only allows 96-bit tags). I was probably advised about other authentication key recovery vectors as to why streaming GCM is dangerous.

In an earlier comment @sdrapkin noted "Byte-size streaming decryption is fundamentally leaking and therefore is not considered by anyone. ... Byte-size or Block-size approaches to streaming AEAD are flawed.". That, combined with CCM (and SIV) not being capable of doing streaming encryption and the comment of it would be weird to have one streaming and not the other, suggests that we're back to the proposal of just having one-shot encrypt and decrypt.

So it seems we're right back at my last API proposal (https://github.com/dotnet/corefx/issues/23629#issuecomment-329202845). Unless there are other outstanding issues that I managed to forget while taking some time off.

SidShetye commented 7 years ago

Welcome back @bartonjs

I'm going to sleep shortly but briefly:

  1. We've conflated protocol design with primitive design before on this thread. I'll just say that chosen-ciphertext attacks are a protocol design concern, not a primitive concern.

  2. Streaming AEAD decryption at least allows you to have privacy and then immediately upgrades to privacy + authenticity at last byte. Without streaming support on AEAD (i.e. traditional privacy only), you're permanently restricting folks to a lower, privacy only assurance.

If technical merits are insufficient or you're (rightfully) skeptical of the authoritativeness of my arguments, I'll try the outside-authority route. You should know that your actual underlying implementation supports AEAD (including AES-GCM) in streaming mode. The Windows core OS (bcrypt) allows for streaming GCM via the BCryptEncrypt or BCryptDecrypt functions. See dwFlags there. Or a user code example. Or a Microsoft-authored CLR wrapper. Or the fact that the implementation has been NIST FIPS 140-2 certified as recently as earlier this year. Or that both Microsoft and NIST spent significant resources around the AES implementation and certified it here and here. And despite all of this, nobody has faulted the primitives. It makes no sense at all for .NET Core to suddenly come around and impose its own crypto-thesis to water down the powerful underlying implementation. Especially when BOTH streaming and one-shot can be supported simultaneously, very trivially.

More? Well, the above is true for OpenSSL, even with their 'newer' EVP APIs.

And it's true for BouncyCastle.

And it's true with Java Cryptography Architecture.

cheers! Sid

Drawaes commented 7 years ago

@SidShetye ++10. If the crypto board is so concerned, why do they let Windows CNG do this?

sdrapkin commented 7 years ago

If you check Microsoft's NIST FIPS-140-2 AES validation (ex. # 4064), you'll notice the following:

AES-GCM:

AES-CCM:

There is no validation for streaming. I'm not even sure whether NIST checks that, e.g., an AES-GCM implementation should not be allowed to encrypt more than 64 GiB of plaintext (another ridiculous limitation of GCM).

Drawaes commented 7 years ago

I am not massively wedded to streaming, as my use shouldn't go over 16K, but fragmented buffers would be nice and should pose no risk at all (I actually suspect that CNG made its interface the way it is for exactly that purpose)... e.g. I want to be able to pass in a number of spans or similar (a linked list, for instance) and have it decrypt in one go. If it decrypts to a contiguous buffer, that's all fine.

So I guess moving the shadowy crypto board on the "streaming style" API is a no-go for now, so let's move forward and make a one-shot API. There is always scope to expand an API if enough people show a need later.

SidShetye commented 7 years ago

@sdrapkin the point is that it's the streaming API that's gone through extensive review by NIST labs and MSFT. Each build being validated costs between $50,000 and $80,000, and MSFT (and OpenSSL and Oracle and other crypto heavyweights) have invested HEAVILY in getting these APIs and implementations validated for over 10 years. Let's not get distracted by the test plan's specific plaintext sizes, because I'm confident .NET will support sizes other than 0, 8, 1016, 1024 regardless of streaming or one-shot. The point is that all those battle-tested APIs (literally; on weapon support systems), on all these platforms, support streaming AEAD at the crypto-primitive API level. Unfortunately, every argument so far against it has been an application- or protocol-level concern cited as a pseudo-concern at the crypto-primitive level.

I'm all for 'let the best idea win' but unless the .net core crypto team (MSFT or community) has some ground breaking discovery, I just don't see how everyone doing crypto so far, from all different organizations are wrong and they are right.

PS: I know we're in disagreement here, but we all want what's best for the platform and its customers.

SidShetye commented 7 years ago

@Drawaes unless the AEAD interface (not necessarily implementation) being defined today supports a streaming API surface, I don't see how folks can extend it without having two interfaces or custom interfaces. That would be a disaster. I'm hoping this discussion leads to an interface that's future proof (or very least, mirrors other AEAD interfaces that have been around for many years!).

Drawaes commented 7 years ago

I tend to agree. But this issue is going nowhere fast and when that happens we are likely to hit a crunch point either it won't make it for 2.1 or it will have to be rammed through with no time left to iron out issues. I'll be honest I have gone back to my old wrappers and am just revamping them for 2.0 ;)

SidShetye commented 7 years ago

We've got a few reference APIs for Java, OpenSSL or C# Bouncy Castle or CLR Security. Frankly any of them will do and long term, I wish C# to have something like Java's 'Java Cryptography Architecture' where all crypto implementations are against a well established interface allowing one to swap out crypto libraries without impacting user code.

Back here, I think it's best we extend .NET Core's ICryptoTransform interface as:

public interface IAuthenticatedCryptoTransform : ICryptoTransform 
{
    bool CanChainBlocks { get; }
    byte[] GetTag();
    void SetExpectedTag(byte[] tag);
}
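A hedged sketch of how that interface might be driven. The factory method is hypothetical; TransformBlock and TransformFinalBlock come from the existing ICryptoTransform:

```csharp
// Hypothetical usage; GetGcmEncryptor does not exist and is shown only to
// illustrate the calling pattern (and its awkwardness around the tag).
IAuthenticatedCryptoTransform xform = GetGcmEncryptor(key, nonce);

byte[] output = new byte[input.Length];
int written = xform.TransformBlock(input, 0, input.Length, output, 0);
byte[] final = xform.TransformFinalBlock(Array.Empty<byte>(), 0, 0);

byte[] tag = xform.GetTag(); // only meaningful after TransformFinalBlock
```

Retrieving the tag only after TransformFinalBlock is what makes this pattern awkward to drive from behind CryptoStream.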

If we're Spanifying all byte[]s, that should permeate the entire API in the System.Security.Cryptography namespace for overall consistency.

Edit: Fixed JCA links

bartonjs commented 7 years ago

If we're Spanifying all byte[]s, that should permeate the entire API in the System.Security.Cryptography namespace for overall consistency.

We did that already. Everything except ICryptoTransform, because we can't change interfaces.

bartonjs commented 7 years ago

I think it's best we extend the .NET Core's ICryptoTransform ...

The problem with this is that the calling pattern is very awkward when getting the tag out at the end (particularly if CryptoStream is involved). I wrote this originally, and it was ugly. There's also the problem of how to obtain one of these, since the GCM parameters are different from the CBC/ECB parameters.

So, here are my thoughts.

My fairly raw thoughts to that end (adding to the existing suggestions, so the one-shot remains, though I guess as a virtual default impl instead of an abstract):

partial class AuthenticatedEncryptor
{
    // throws if an operation is already in progress
    public abstract void Initialize(ReadOnlySpan<byte> associatedData);
    // true on success, false on “destination too small”, exception on anything else.
    public abstract bool TryEncrypt(ReadOnlySpan<byte> data, Span<byte> encryptedData, out int bytesRead, out int bytesWritten);
    // false if remainingEncryptedData is too small, throws if other inputs are too small, see NonceOrIVSizeInBits and TagSizeInBits properties.
    // NonceOrIvUsed could move to Initialize, but then it might be interpreted as an input.
    public abstract bool TryFinish(ReadOnlySpan<byte> remainingData, Span<byte> remainingEncryptedData, out int bytesWritten, Span<byte> tag, Span<byte> nonceOrIvUsed);
}

partial class AuthenticatedDecryptor 
{
    // throws if an operation is already in progress
    public abstract void Initialize(ReadOnlySpan<byte> tag, ReadOnlySpan<byte> nonceOrIv, ReadOnlySpan<byte> associatedData);
    // true on success, false on “destination too small”, exception on anything else.
    public abstract bool TryDecrypt(ReadOnlySpan<byte> data, Span<byte> decryptedData, out int bytesRead, out int bytesWritten);
    // throws on bad tag, but might leak the data anyways.
    // (remainingDecryptedData is required for CBC+HMAC, and so may as well add remainingData, I guess?)
    public abstract bool TryFinish(ReadOnlySpan<byte> remainingData, Span<byte> remainingDecryptedData, out int bytesWritten);
}

AssociatedData comes at Initialize, because algorithms that need it last can hold on to it, and algorithms that need it first can’t have it any other way.

Once a shape is decided for what streaming would look like (and whether people think CCM should internally buffer, or should throw, when in streaming encryption mode) then I'll go back to the board.
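For context, an encryption loop over this shape might look roughly like the following sketch. The `encryptor` instance (a hypothetical `AesGcmEncryptor : AuthenticatedEncryptor` whose key and nonce provider were passed to its constructor) and the span-based `ReadChunk`/`WriteChunk` IO helpers are all assumptions, not real API.

```csharp
// Sketch only: encryptor, ReadChunk, and WriteChunk are hypothetical.
Span<byte> output = new byte[4096];
Span<byte> tag = new byte[16];    // see TagSizeInBits
Span<byte> nonce = new byte[12];  // see NonceOrIVSizeInBits

encryptor.Initialize(associatedData);

int read;
while ((read = ReadChunk(inputBuffer)) > 0)
{
    if (!encryptor.TryEncrypt(inputBuffer.Slice(0, read), output, out int consumed, out int written))
        throw new InvalidOperationException("destination too small");
    WriteChunk(output.Slice(0, written));
}

if (!encryptor.TryFinish(ReadOnlySpan<byte>.Empty, output, out int finalWritten, tag, nonce))
    throw new InvalidOperationException("destination too small");
WriteChunk(output.Slice(0, finalWritten));
// The caller persists tag and nonce alongside the ciphertext.
```

Note how the tag and the nonce actually used both come out of `TryFinish`, which is what makes the SIV/provider model workable.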

SidShetye commented 7 years ago

@bartonjs I know what you mean about plucking and programming the tag from the end of the stream for symmetry across encrypt/decrypt. It's tricky, but worse if left to each user to solve. I have an implementation I can share under MIT; I will need to check internally with my team (not at my desk / on mobile).

A middle ground could be like OpenSSL or NT's bcrypt, where you need to plug in the tag right before the final decrypt call, since that's when the tag comparison happens. That is, a SetExpectedTag (before the final decrypt) and a GetTag (after the final encrypt) would work, but this offloads tag management to the user. Most will simply append the tag to the cipherstream, since that's the natural temporal order.

I do think expecting the tag in Initialize itself (on decrypt) breaks symmetry in space (byte flow) and time (tag check at the end, not the start), which limits its usefulness. But the Tag APIs above resolve that.

Also for encrypt, Initialize needs the IV before any crypto transforms.

Lastly, for both encrypt and decrypt, Initialize needs the AES encryption keys before any transforms. (Am I missing something obvious, or did you forget to type that bit?)

bartonjs commented 7 years ago

I do think expecting the tag in Initialize itself (in decrypt) breaks symmetry

In CBC+HMAC the usual recommendation is to verify the HMAC before starting any decryption, so it's a tag-first decryption algorithm. Similarly, there could be a "pure AE" algorithm which does destructive operations on the tag during computations and merely checks that the final answer was 0. So, like the associated data value, since there could be algorithms which need it first, it has to come first in a fully generalized API.

Floating them out into SetAssociatedData and SetTag has the problem that while the base class was algorithm-independent, the usage becomes algorithm-dependent. Changing AesGcm to AesCbcHmacSha256 or SomeTagDestructiveAlgorithm would now result in TryDecrypt throwing because the tag was not yet provided. To me that is worse than not being polymorphic at all, so allowing the flexibility suggests breaking the model apart to be fully isolated per algorithm. (Yes, it could be controlled by more algorithm-identification characteristic properties like NeedsTagFirst, but that really just makes it harder to use.)

Also for encrypt, Initialize needs the IV before any crypto transforms.

Lastly, for encrypt and decrypt, Initialize needs the AES encryption keys before any transforms.

The key was a class ctor parameter. The IV/nonce comes from the IV/nonce provider in the ctor parameter.

The provider model solves SIV, where no IV is given during encrypt; one is generated on behalf of the data. Otherwise SIV has the parameter and requires that an empty value be provided.

or you forgot to type that bit?

The streaming methods were being added to my existing proposal, which already had the key and IV/nonce provider as ctor parameters.

SidShetye commented 7 years ago

@bartonjs : Good point that some algorithms could want the tag first while others want it at the end, and thanks for the reminder that it's an addition to the original spec. I find that considering a use case makes this easier, so here is a cloud-first example:

We're going to perform analytics on one or more 10 GB AES-GCM encrypted files (i.e. tags after ciphertext) kept in storage. An analytics worker concurrently decrypts multiple inbound streams into separate machines/clusters and, after the last byte arrives and the tag checks out, starts off each analysis workload. All storage, worker, and analytics VMs are in Azure US-West.

Here, there is no way to fetch the tag at the end of every stream and provide it to AuthenticatedDecryptor's Initialize method. So even if a user volunteers to modify code for GCM usage, they can't even begin to use the API.

Come to think of it, the only way we could have an API that accommodates various AEADs AND requires no user code changes is if the crypto providers for the different AEAD algorithms auto-magically handle the tags. Java does this by putting the tag at the end of the ciphertext for GCM and plucking it back out during decryption without user intervention. Other than that, any time someone changes the algorithm significantly (e.g. CBC-HMAC => GCM) they will have to modify their code, because of the mutually exclusive nature of tag-first and tag-last processing.

IMHO, we should first decide if

Option 1) The algorithm providers internally handle tag management (like Java)

or

Option 2) Expose enough on the API for users to do it themselves (like WinNT bcrypt or openssl)

Option 1 would really simplify the overall experience for library consumers, because buffer management can get complex. Solve it well in the library, and each user won't have to solve it every time. Plus, all AEADs get the same interface (tag-first, tag-last, tag-less), and swapping out algorithms is simpler too.

My vote would be for option 1.
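A minimal sketch of option 1, shown here with the one-shot AesGcm type that later shipped in .NET Core 3.0 (the helper names below are made up): the library appends the 16-byte tag to the ciphertext on encrypt and slices it back off on decrypt, so callers never handle tags directly.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helpers illustrating tag-at-end framing (Java-style "option 1").
static byte[] EncryptWithTrailingTag(byte[] key, byte[] nonce, byte[] plaintext)
{
    byte[] result = new byte[plaintext.Length + 16]; // ciphertext || tag
    using var gcm = new AesGcm(key);
    gcm.Encrypt(nonce,
                plaintext,
                result.AsSpan(0, plaintext.Length),
                result.AsSpan(plaintext.Length, 16));
    return result;
}

static byte[] DecryptWithTrailingTag(byte[] key, byte[] nonce, byte[] ciphertextAndTag)
{
    byte[] plaintext = new byte[ciphertextAndTag.Length - 16];
    using var gcm = new AesGcm(key);
    gcm.Decrypt(nonce,
                ciphertextAndTag.AsSpan(0, plaintext.Length),  // ciphertext
                ciphertextAndTag.AsSpan(plaintext.Length, 16), // tag
                plaintext);                                     // throws on bad tag
    return plaintext;
}
```

The "ciphertext || tag" layout is a convention, not a standard; a swapped-in tag-first algorithm would need a different framing, which is exactly the interoperability tension discussed above.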

Finally, we were able to dig up our implementation allowing ICryptoTransform streaming operations over GCM to automatically pluck out the tag in-stream. This was a significant update to CLR Security's own wrapper, and despite the additional buffer copies it's still really fast (~4 GB/sec on our test MacBook Pro in Windows 10 Boot Camp). We basically wrapped CLR Security to create option 1 for ourselves so we don't need to do it everywhere else. A visual really helps explain what's going on within the TransformBlock and TransformFinalBlock of the ICryptoTransform interface.

GrabYourPitchforks commented 7 years ago

@sidshetye I'm not sure why your cloud-first example is blocked. If you're reading from storage you can download the last few tag bytes first and provide that to the decryptor ctor. If using the Azure Storage APIs this would be accomplished via CloudBlockBlob.DownloadRangeXxx.

SidShetye commented 7 years ago

@GrabYourPitchforks Not to get too sidetracked by that example, but that's a specific capability of Azure Blob Storage. In general, VM-based storage (IaaS) or non-Azure storage workloads typically get a network stream that's not seekable.

sdrapkin commented 7 years ago

I, personally, am very excited to see @GrabYourPitchforks - yay!

We're going to perform analytics on one or more 10GB AES-GCM encrypted files (i.e. tags after ciphertext) kept in storage. An analytics' worker concurrently decrypts multiple inbound streams into separate machine/clusters and after last byte + tag checks, starts off each analysis workload. All storage, worker, analytics VMs are in Azure US-West.

@sidshetye , you were so adamant about keeping dumb-n-dangerous primitives and smart-n-huggable protocols separate! I had a dream - and I believed it. And then you throw this at us. This is a protocol - a system design. Whoever designed the protocol you described messed up. There is no point crying over the inability to fit a square peg into a round hole now.

Whoever GCM-encrypted 10 GB files is not only living dangerously close to the primitive's edge (GCM is no good past ~64 GB per message), but there was also an implicit assertion that the whole ciphertext would need to be buffered.

Whoever GCM-encrypts 10 GB files is making a protocol mistake with overwhelming probability. The solution: chunked encryption. TLS has variable-length, 16 KB-limited chunking, and there are other, simpler, PKI-free flavors. The "cloud-first" sex appeal of this hypothetical example does not diminish the design mistakes.
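As a rough illustration of the chunked-encryption idea (again using the AesGcm one-shot type that later shipped in .NET Core 3.0), the framing below is one possible convention, not a standard: fixed 16 KB chunks, an 8-byte random nonce prefix plus a 4-byte per-chunk counter, and a final-chunk flag bound in as associated data so an attacker cannot truncate the stream.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Security.Cryptography;

// Sketch of chunked AEAD encryption; a real implementation must also handle
// short reads from Stream.Read and counter exhaustion.
static void EncryptChunked(byte[] key, Stream input, Stream output)
{
    const int ChunkSize = 16 * 1024;
    byte[] buffer = new byte[ChunkSize];
    byte[] ciphertext = new byte[ChunkSize];
    byte[] tag = new byte[16];
    byte[] nonce = new byte[12];
    RandomNumberGenerator.Fill(nonce.AsSpan(0, 8)); // random per-stream prefix

    using var gcm = new AesGcm(key);
    output.Write(nonce, 0, 8); // receiver rebuilds each nonce from prefix + counter

    for (uint counter = 0; ; counter++)
    {
        int read = input.Read(buffer, 0, ChunkSize);
        BinaryPrimitives.WriteUInt32BigEndian(nonce.AsSpan(8), counter);
        byte[] aad = new byte[] { (byte)(read < ChunkSize ? 1 : 0) }; // final-chunk flag
        gcm.Encrypt(nonce, buffer.AsSpan(0, read), ciphertext.AsSpan(0, read), tag, aad);
        output.Write(ciphertext, 0, read);
        output.Write(tag, 0, 16);
        if (read < ChunkSize) break; // short (possibly empty) chunk terminates the stream
    }
}
```

Each chunk authenticates independently, so a decryptor can verify and release 16 KB at a time, which is what makes overlapping IO and decryption possible without buffering the full 10 GB.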

GrabYourPitchforks commented 7 years ago

(I have a lot of catching up to do on this thread.)

@sdrapkin's raised a point about reusing the IAuthenticatedEncryptor interface from the Data Protection layer. To be honest I don't think that's the right abstraction for a primitive, as the Data Protection layer is quite opinionated in how it performs cryptography. For instance, it forbids self-selection of an IV or nonce, it mandates that a conforming implementation understand the concept of AAD, and it produces a result that's somewhat proprietary. In the case of AES-GCM, the return value from IAuthenticatedEncryptor.Encrypt is the concatenation of a weird almost-nonce-thing used for subkey derivation, the ciphertext resulting from running AES-GCM over the provided plaintext (but not the AAD!), and the AES-GCM tag. So while each step involved in generating the protected payload is secure, the payload itself doesn't follow any type of accepted convention, and you're not going to find anybody aside from the Data Protection library that can successfully decrypt the resulting ciphertext. That makes it a good candidate for an app developer-facing library but a horrible candidate for an interface to be implemented by primitives.

I should also say that I don't see considerable value in having a One True Interface(tm) IAuthenticatedEncryptionAlgorithm that all authenticated encryption algorithms are supposed to implement. These primitives are "complex", unlike simple block cipher primitives or hashing primitives. There are simply too many variables in these complex primitives. Is the primitive AE only, or is it AEAD? Does the algorithm accept an IV / nonce at all? (I've seen some that don't.) Are there any concerns with how the input IV / nonce or data must be structured? IMO the complex primitives should simply be standalone APIs, and higher-level libraries would bake in support for the specific complex primitives they care about. Then the higher-level library exposes whatever uniform API it believes is appropriate for its scenarios.

SidShetye commented 7 years ago

@sdrapkin We're going off topic again. I'll just say that a system is built using primitives. The crypto primitives here are bare and powerful. The system/protocol layer handled the buffering, and that at cluster level, certainly not in main system memory as the one-shot primitives would force. The 'chunking' boundary is X (X = 10 GB here) because it is < 64 GB, because the buffering capacity of the cluster was nearly limitless, and because nothing would/could start until the last byte was loaded into the cluster. This is exactly the separation of concerns, optimizing each layer for its strengths, that I've been talking about. And this can only happen if the underlying primitives don't handicap higher-layer designs/limitations (note that most real-world apps come with their own legacy handicaps).

sdrapkin commented 7 years ago

NIST 800-38d sec9.1 states:

In order to inhibit an unauthorized party from controlling or influencing the generation of IVs, GCM shall be implemented only within a cryptographic module that meets the requirements of FIPS Pub. 140-2. In particular, the cryptographic boundary of the module shall contain a “generation unit” that produces IVs according to one of the constructions in Sec. 8.2 above. The documentation of the module for its validation against the requirements of FIPS 140-2 shall describe how the module complies with the uniqueness requirement on IVs.

That implies to me that GCM IVs must be auto-generated internally (and not passed in externally).

SidShetye commented 7 years ago

@sdrapkin Good point, but if you read even closer you'll see that for IV lengths of 96 bits and above, section 8.2.2 allows generating an IV with a random bit generator (RBG) where at least 96 bits are random (you could just zero the other bits). I did mention this last month on this thread itself (here, under nonce).

TL;DR: INonce is a trap leading to non-compliance with NIST and FIPS guidelines.

Section 9.1 simply says that, for FIPS 140-2, the IV generation unit (fully random, i.e. sec 8.2.2, or deterministic, i.e. sec 8.2.1) must lie within the module boundary undergoing FIPS validation. Since ...

  1. RBGs are already FIPS validated,
  2. IV lengths >= 96 bits are recommended,
  3. designing an IV generation unit that persists across reboots and indefinite loss of power inside a crypto primitive layer is hard,
  4. getting 3 implemented within the crypto library AND getting it certified is hard and expensive ($50K for anything resulting in a non-bit-exact build image),
  5. no user code will ever implement 3 and get it certified, because of 4 (let's leave aside some exotic military/govt installations),

... most crypto libraries (see Oracle's Java, WinNT's bcryptprimitives, OpenSSL, etc.) undergoing FIPS certification use the RBG route for the IV and simply take a byte array as input. Note that having the INonce interface is actually a trap from NIST's and FIPS' perspective, because it implicitly suggests that a user should pass an implementation of that interface to the crypto function. But any user implementation of INonce is almost guaranteed NOT to have undergone the 9-month+, $50K+ NIST certification process. Yet, had they just sent a byte array using the RBG construct (already in the crypto library), they would be fully compliant with the guidelines.
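For reference, the RBG route boils down to something like this sketch: the caller fills a fully random 96-bit IV from the platform RNG and hands it to the library as a plain byte array (the final `Init` call is the hypothetical one from the proposal at the top of this issue).

```csharp
using System.Security.Cryptography;

// 96-bit (12-byte) IV, the size recommended for GCM, filled from the
// platform's RNG and passed in as a plain byte array.
byte[] iv = new byte[12];
using (var rng = RandomNumberGenerator.Create())
{
    rng.GetBytes(iv);
}
// cipher.Init(key, iv);  // hypothetical init call from the proposal above
```

On a FIPS-configured system the platform RNG is itself part of a validated module, which is the crux of the argument: the compliant generation unit already exists, so the API only needs to accept bytes.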

I've said it before - these existing crypto libraries have evolved their API surfaces and have been battle-tested across multiple scenarios, more than what we've touched upon in this long thread. My vote again is to leverage that knowledge and experience across all those libraries, all those validations, and all those installations rather than attempting to reinvent the wheel. Don't reinvent the wheel. Use it to invent the rocket :)

SidShetye commented 7 years ago

Hi folks,

Any updates on this? I haven't seen anything on @karelz 's crypto roadmap thread or on the AES-GCM thread.

Thanks Sid

Timovzl commented 6 years ago

So the last concrete proposal is from https://github.com/dotnet/corefx/issues/23629#issuecomment-334328439:

partial class AuthenticatedEncryptor
{
    // throws if an operation is already in progress
    public abstract void Initialize(ReadOnlySpan<byte> associatedData);
    // true on success, false on “destination too small”, exception on anything else.
    public abstract bool TryEncrypt(ReadOnlySpan<byte> data, Span<byte> encryptedData, out int bytesRead, out int bytesWritten);
    // false if remainingEncryptedData is too small, throws if other inputs are too small, see NonceOrIVSizeInBits and TagSizeInBits properties.
    // NonceOrIvUsed could move to Initialize, but then it might be interpreted as an input.
    public abstract bool TryFinish(ReadOnlySpan<byte> remainingData, Span<byte> remainingEncryptedData, out int bytesWritten, Span<byte> tag, Span<byte> nonceOrIvUsed);
}

partial class AuthenticatedDecryptor 
{
    // throws if an operation is already in progress
    public abstract void Initialize(ReadOnlySpan<byte> tag, ReadOnlySpan<byte> nonceOrIv, ReadOnlySpan<byte> associatedData);
    // true on success, false on “destination too small”, exception on anything else.
    public abstract bool TryDecrypt(ReadOnlySpan<byte> data, Span<byte> decryptedData, out int bytesRead, out int bytesWritten);
    // throws on bad tag, but might leak the data anyways.
    // (remainingDecryptedData is required for CBC+HMAC, and so may as well add remainingData, I guess?)
    public abstract bool TryFinish(ReadOnlySpan<byte> remainingData, Span<byte> remainingDecryptedData, out int bytesWritten);
}

Only a few potential issues have been raised since:

I'd like to suggest the following:

  1. The additional complexity of not requiring the tag upfront seems severe, the corresponding problem scenario seems uncommon, and the problem does indeed sound very much like a matter of protocol. Good design can accommodate much, but not everything. Personally I feel comfortable leaving this to the protocol. (Strong counterexamples welcome.)
  2. The discussion has consistently moved towards a flexible, low-level implementation that does not protect against misuse, with the exception of IV generation. Let's be consistent. The general consensus seems to be that a high-level API is an important next step, vital for proper use by the majority of developers - this is how we get away with not protecting against misuse in the low-level API. But it seems that an extra dose of fear has sustained the idea of misuse prevention in the area of IV generation. In the context of a low-level API, and to be consistent, I'd lean towards a byte[] equivalent. But implementation swapping is more seamless with the injected INonceProvider. Is @sidshetye's comment irrefutable, or could a simple INonceProvider implementation that merely calls the RNG still be considered compliant?
  3. The abstractions seem useful, and so much effort has been put into designing them, that by now I am convinced they will do more good than harm. Besides, high-level APIs can still choose to implement low-level APIs that do not conform to the low-level abstractions.
  4. IV is the general term, and a nonce is a specific kind of IV, correct? This begs for renames from INonceProvider to IIVProvider, and from nonceOrIv* to iv*. After all, we are always dealing with an IV, but not necessarily with a nonce.
Drawaes commented 6 years ago

The tag upfront is a non-starter for my scenario, so I will probably just keep my own implementation. Which is fine; I am not sure it's everyone's cup of tea to write high-perf code in this area.

The problem is that it will cause unneeded latency. You have to pre-buffer an entire message to get the tag at the end before you can start decoding the frame. This means you basically can't overlap IO and decrypting.

I am not sure why it's so hard to allow it at the end. But I am not going to put a road block in front of this API; it just won't be of any interest in my scenario.

bartonjs commented 6 years ago

IV is the general term, and a nonce is a specific kind of IV, correct?

No. A nonce is a number used once. An algorithm which specifies a nonce indicates that reuse violates the guarantees of the algorithm. In the case of GCM, using the same nonce with the same key and a different message can result in the compromise of the GHASH key, reducing GCM to CTR.

From http://nvlpubs.nist.gov/nistpubs/ir/2013/NIST.IR.7298r2.pdf:

Nonce: A value used in security protocols that is never repeated with the same key. For example, nonces used as challenges in challenge-response authentication protocols generally must not be repeated until authentication keys are changed. Otherwise, there is a possibility of a replay attack. Using a nonce as a challenge is a different requirement than a random challenge, because a nonce is not necessarily unpredictable.

An "IV" doesn't have the same stringent requirements. For example, repeating an IV with CBC only leaks whether the encrypted message is the same as, or different from, a previous one with the same IV. It does not weaken the algorithm.
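For completeness, the deterministic construction from NIST SP 800-38D sec 8.2.1 (the other way to satisfy the never-repeat requirement) can be sketched as a counter-based provider. The type below is illustrative only, not part of any proposal in this thread.

```csharp
using System;
using System.Buffers.Binary;

// Sketch of a deterministic (sec 8.2.1-style) nonce source: a fixed per-device
// field concatenated with an invocation counter guarantees "number used once"
// as long as the counter never repeats under the same key.
sealed class CounterNonceProvider
{
    private readonly uint _deviceId; // fixed field, unique per key holder
    private ulong _invocation;       // must never repeat under the same key

    public CounterNonceProvider(uint deviceId) => _deviceId = deviceId;

    public byte[] Next()
    {
        byte[] nonce = new byte[12]; // 96-bit GCM nonce
        BinaryPrimitives.WriteUInt32BigEndian(nonce, _deviceId);
        BinaryPrimitives.WriteUInt64BigEndian(nonce.AsSpan(4), checked(_invocation++));
        return nonce;
    }
}
```

The hard part, as noted earlier in the thread, is persisting `_invocation` across reboots and power loss; a random 96-bit IV sidesteps that entirely, which is why most libraries take the RBG route.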

Timovzl commented 6 years ago

A nonce is a number used once. An "IV" doesn't have the same stringent requirements.

@bartonjs Yes. I would reason that, since a nonce is used to initialize the crypto primitive, it is its initialization vector. It adheres perfectly to any definition of IV that I can find. It has more stringent requirements, yes, just as being a cow has more stringent requirements than being an animal. The current wording seems to ask for a "cowOrAnimal" parameter. The fact that different modes have varying requirements for the IV does not change the fact that they are all asking for some form of IV. If there's something I'm missing, by all means keep the current wording, but as far as I can tell, just "iv" or "IIVProvider" would be both simple and correct.