ajgb / crypt-nacl-sodium

Crypt::NaCl::Sodium - NaCl compatible modern, easy-to-use library for encryption, decryption, signatures, password hashing and more
https://metacpan.org/release/Crypt-NaCl-Sodium
9 stars 10 forks source link

leaving copy of a secret data in RAM after $crypto_sign->mac() #6

Closed ikruglov closed 7 years ago

ikruglov commented 7 years ago

Hi,

I have a question about internal workflows of Crypt::NaCl::Sodium::sign and Data::BytesLocker

Consider following code:

#!/bin/env perl

use strict;
use warnings;
use Crypt::NaCl::Sodium qw/:utils/;

my $secret = hex2bin('a25b84b660c62019b7db57fc612e52a0accf99bd17490f4d65d3c118c09970e90a20c8312f2bbeb4a377a151e25fff1ec7be049526716a64b771e26e44a56bcc');
my $bl = Data::BytesLocker->new($secret);

my $crypto_sign = Crypt::NaCl::Sodium->sign();
my $mac = $crypto_sign->mac('message to sign', $bl);
printf "mac: %s\n", $mac->to_hex;

base on my evaluation of $crypto_sign->mac() and Data::BytesLocker's stringification method I conclude that the workflow leave copy of secret in RAM as a side effect of computing mac. This is because mac() calls SvPV(seckey, skey_len) which calls stringification method which in return does newSVpvn() on secret data.

newSVpvn:

Creates a new SV and copies a string into it, which may contain NUL characters (\0 ) and other binary data. 

I double checked this workflow by amending above code to:

#!/bin/env perl

use strict;
use warnings;
use Crypt::NaCl::Sodium qw/:utils/;

my $secret = hex2bin('a25b84b660c62019b7db57fc612e52a0accf99bd17490f4d65d3c118c09970e90a20c8312f2bbeb4a377a151e25fff1ec7be049526716a64b771e26e44a56bcc');
my $bl = Data::BytesLocker->new($secret);
bless $bl, 'Data::BytesLocker::NoOverloading';

my $crypto_sign = Crypt::NaCl::Sodium->sign();
my $mac = $crypto_sign->mac('message to sign', $bl);
printf "mac: %s\n", $mac->to_hex;
exit 0;

package Data::BytesLocker::NoOverloading;
use parent '-norequire', 'Data::BytesLocker';
use overload '""' => sub { warn "stringification access"; my $coderef = Data::BytesLocker->can('(""'); $coderef->(@_) }, 'fallback' => 1;
1;

this outputs:

stringification access at test.pl line 18.
mac: b412a4b610f4fb1e855153ab4bb4d8a3d4ec2efadd10c4e4b670f82a01e3f979762d00626928b2415257990db091c01419114c64db2e55ee6ae22a25186e3003

Does this sound right?

If so I would expect $crypto_sign->mac() to access underlying mprotected Data::BytesLocker's buffer directly and not silently copying protected data to unprotected memory region.

ajgb commented 7 years ago

Hi Ivan,

You are right, the arguments that contain sensitive data are stringified upon use, but that should only occur when transfering data from C to Perl.

I've got already an upgrade to the latest version of libsodium pending and I'll include a change that would not copy data from BytesLocker buffer into Perl's SV.

Once it's done I'll update this issue to let you know.

Thanks, Alex

ikruglov commented 7 years ago

Hi Alex,

Thanks for you reply. Do you think that such transferring from "C to Perl" should even happen? My understanding of Data::BytesLocker is that it is for keeping sensitive data and this data should not leave protected area unless used for loading/unloading to persisted storage.

Let me elaborate a bit. I actually have another proposal for Data::BytesLocker which I would like to discuss and which is connected to this topic.

IMO Data::BytesLocker should not have "stringification" and "concatenation" accessors. The reason for this is such accessors provides easy ways to accidentally leak secret data in tracing, debugging, etc information which is very much possible in large codebases maintained by multiple devs. For instance it just take print "$data_bytes_locker" to leak secret in logs.

To prevent such case here is what I planned to used in my codebase:

# the intent of this package is to prevent accidental leakage in debuging,
# tracing, events and other types of information which is all around.
package Data::BytesLocker::NoOverloading;
use parent '-norequire', 'Data::BytesLocker';
use overload '""' => sub { 'SECRET DATA STRINGIFIED' },
             '.'  => sub { 'SECRET DATA CONCATENATED' },
             'x'  => sub { 'SECRET DATA REPETEAD' },
             'fallback' => 1;
sub new { my $class = shift; bless Data::BytesLocker->new(@_), $class; }
sub new_from_bytes_locker { bless $_[1], $_[0] }

This code re-overloads all methods which could lead to accidental leakage and return bogus data instead of actual secret. Please note that the key concept behind this idea is accidental, not intended leakage. If one wants to intentionally access secret data s/he can always use bytes() and to_hex() methods [1].

So, to summarize, I would like to propose: a) removing "stringification", "concatenation", "repetition" accessors b) all Crypt::NaCl::Sodium functions which need accessing secret to sign/mac/encrypt should access Data::BytesLocker buffer directly in C land

This changes may and likely will break compatibility with existing code. We can minimize them by, for instance, returning plain SV instead of Data::BytesLocker object in places where this is not strictly needs. For example result of $crypto_sign->mac() or $crypto_secretbox->encrypt() is not a sensitive information (correct me if I'm wrong). So, IMO, plain SV just works.

However, I bet that not all cases can be addressed in compatible mode. Perhaps new version of the package could be released.

Overall, what I would like to see out of Crypt::NaCl::Sodium is easy to use encryption library (which it's already is) but also a library which is hard to misuse in not secured way.

Please let me know what you think? If you're agree I'm willing to implement these changes in series of pull requests.

[1] which, as you mentioned, should not copy data but rather, IMO, provide temporary pointer to protected area.

ikruglov commented 7 years ago

@ajgb what do you think?

ikruglov commented 7 years ago

I prepared two commits demonstrating proposed changes: https://github.com/ikruglov/crypt-nacl-sodium/commit/ab2ad236e71ac0bd01464fc4162b6003997b1e5e https://github.com/ikruglov/crypt-nacl-sodium/commit/643e5e8d372d015d12652c00bd1d7ea101e4be19

ajgb commented 7 years ago

Hi Ivan,

I'm not so keen about removing the stringification (with friends) overloaded methods.

By providing those methods users can pass the BytesLocker objects around and unlock them only when needed, eg. when encrypting/decrypting information or when interfacing other parts of their system that expect to work on strings.

Also it looks like the other change would now return plain SV (which for some may be fine), but would break the rule that most (if not all) methods return BytesLocker objects.

The BytesLocker package already provides optional configuration that creates all its objects in locked state by default: $Data::BytesLocker::DEFAULT_LOCKED = 1;.

So combined with the change to avoid creation of standard PVs and having all objects locked by default the chances of unintentional leaks of the data are fairly small. Although still possible if you work with some asynchronous code where one of the threads constantly tries to access the locked objects - could you provide some other examples?

Thanks, Alex

ikruglov commented 7 years ago

Hi Alex,

I see your point in keeping BytesLocker locked most of the time. This solves some of the issues.

but would break the rule that most (if not all) methods return BytesLocker objects.

may I ask why all methods should return BytesLocker? What's the idea behind it?

My assumption is that BytesLocker is for protecting sensitive data i.e. keys, secrets and seeds. Result of mac(), encrypt() and other non-secret-managment functions is not a sensitive information. It should be fine to pass encrypted value through public media. Perhaps only decrypt() falls into this category of functions returning sensitive information.

Please correct me if I'm wrong. Keen to hear from you soon :-).

ajgb commented 7 years ago

Hi Ivan,

While mac() or nonce() methods in general do not produce sensitive data, they do however return binary data - sending binary data to the other party is most of the time difficult, you'd be most likely encoding it in some way, eg. using hex encoding.

So by returning BytesLocker objects that provide a number of helper functions (to_hex() being one of them) the end user of this library has one thing less to worry about :)

Nonces have the increment() method which would help to ensure that they are not reused etc.

So by locking everything (which is optional and rather for advanced users) you are in control when the data can be accessed and minimise the time frame in which the data is not protected anymore.

Having said that I'd not offer any guarantees that your application would be then secure from an attack by somebody capable of exploiting Perl implementation details.

Thanks, Alex

ikruglov commented 7 years ago

Thanks for your reply.

Let me put into perspective following information:

$ time perl -I blib/lib/ -I blib/arch/ -MTime::HiRes=time -MCrypt::NaCl::Sodium=:utils -wle 'my $crypto_auth = Crypt::NaCl::Sodium->auth(); my $key = $crypto_auth->keygen(); my @a; my
$t0 = time; push @a, $crypto_auth->mac("test", $key) for 1..1_000_000; printf "elapsed: %.3fs\n", time - $t0'

# output from current implementation
elapsed: 19.586s
real    0m29.404s
user    0m13.928s
sys     0m14.868s

# output after my patches
elapsed: 6.974s
real    0m7.111s
user    0m7.012s
sys     0m0.067s

as you can see usage of BytesLocker incurs significant performance overhead which, in case of mac()/encrypt()/sign()/etc, come for no good reason.

I mean, I see you point about convenient ->hex() and ->increment() methods. However, all of this can be addressed in different ways. I see following options: 1) returning SV + bin2hex()/nonce_increment()/etc functions (preferred IMO) 2) returning another blessed object with ->hex()/->increment()/etc methods (a bit more expensive but still acceptable)

So, what I'm trying to say is that it seems that current implementation of Crypt::NaCl::Sodium tries to use Data:BytesLocker for several different purposes which leads to inefficiency and reduced security. Namely: a) BytesLocker is secured storage for keys/secrets/etc b) BytesLocker is convenient tool to access data in stringified form c) BytesLocker is convenient tool to manipulate nonce d) maybe more

What I think BytesLocker should be doing is only (a). All others use cases can be solved by other less expensive means.

If we reserve this single use case for BytesLocker we automatically don't need "stringification" and friends accessors. Only limited set of functions actually needs to have such access which can and should happen in C land for security reasons IMO.

IMO this way we can achieve nice combination of set of secured workflows which are easy to use and hard to misuses and which are fast.

Please let me know what you think.

ikruglov commented 7 years ago

btw, note difference between elapsed and real timings in given output in current implementation in my previous post due to secured memory deallocation.

ajgb commented 7 years ago

Hi Ivan,

Benchmarking is hard, focusing on a single part of the application is never a good idea, unless of course your application is meant to calculate checksums on short strings...

I understand that you wish to speed up certain parts of the library and you are willing to sacrifice the usability.

However I disagree with this approach for the reasons already stated.

Thanks, Alex