Closed ikruglov closed 7 years ago
Hi Ivan,
You are right, the arguments that contain sensitive data are stringified upon use, but that should only occur when transfering data from C to Perl.
I've got already an upgrade to the latest version of libsodium pending and I'll include a change that would not copy data from BytesLocker buffer into Perl's SV.
Once it's done I'll update this issue to let you know.
Thanks, Alex
Hi Alex,
Thanks for you reply. Do you think that such transferring from "C to Perl" should even happen? My understanding of Data::BytesLocker is that it is for keeping sensitive data and this data should not leave protected area unless used for loading/unloading to persisted storage.
Let me elaborate a bit. I actually have another proposal for Data::BytesLocker which I would like to discuss and which is connected to this topic.
IMO Data::BytesLocker should not have "stringification" and "concatenation" accessors. The reason for this is such accessors provides easy ways to accidentally leak secret data in tracing, debugging, etc information which is very much possible in large codebases maintained by multiple devs. For instance it just take print "$data_bytes_locker" to leak secret in logs.
To prevent such case here is what I planned to used in my codebase:
# the intent of this package is to prevent accidental leakage in debuging,
# tracing, events and other types of information which is all around.
package Data::BytesLocker::NoOverloading;
use parent '-norequire', 'Data::BytesLocker';
use overload '""' => sub { 'SECRET DATA STRINGIFIED' },
'.' => sub { 'SECRET DATA CONCATENATED' },
'x' => sub { 'SECRET DATA REPETEAD' },
'fallback' => 1;
sub new { my $class = shift; bless Data::BytesLocker->new(@_), $class; }
sub new_from_bytes_locker { bless $_[1], $_[0] }
This code re-overloads all methods which could lead to accidental leakage and return bogus data instead of actual secret. Please note that the key concept behind this idea is accidental, not intended leakage. If one wants to intentionally access secret data s/he can always use bytes() and to_hex() methods [1].
So, to summarize, I would like to propose: a) removing "stringification", "concatenation", "repetition" accessors b) all Crypt::NaCl::Sodium functions which need accessing secret to sign/mac/encrypt should access Data::BytesLocker buffer directly in C land
This changes may and likely will break compatibility with existing code. We can minimize them by, for instance, returning plain SV instead of Data::BytesLocker object in places where this is not strictly needs. For example result of $crypto_sign->mac() or $crypto_secretbox->encrypt() is not a sensitive information (correct me if I'm wrong). So, IMO, plain SV just works.
However, I bet that not all cases can be addressed in compatible mode. Perhaps new version of the package could be released.
Overall, what I would like to see out of Crypt::NaCl::Sodium is easy to use encryption library (which it's already is) but also a library which is hard to misuse in not secured way.
Please let me know what you think? If you're agree I'm willing to implement these changes in series of pull requests.
[1] which, as you mentioned, should not copy data but rather, IMO, provide temporary pointer to protected area.
@ajgb what do you think?
I prepared two commits demonstrating proposed changes: https://github.com/ikruglov/crypt-nacl-sodium/commit/ab2ad236e71ac0bd01464fc4162b6003997b1e5e https://github.com/ikruglov/crypt-nacl-sodium/commit/643e5e8d372d015d12652c00bd1d7ea101e4be19
Hi Ivan,
I'm not so keen about removing the stringification (with friends) overloaded methods.
By providing those methods users can pass the BytesLocker objects around and unlock them only when needed, eg. when encrypting/decrypting information or when interfacing other parts of their system that expect to work on strings.
Also it looks like the other change would now return plain SV (which for some may be fine), but would break the rule that most (if not all) methods return BytesLocker objects.
The BytesLocker package already provides optional configuration that creates all its objects in locked state by default: $Data::BytesLocker::DEFAULT_LOCKED = 1;
.
So combined with the change to avoid creation of standard PVs and having all objects locked by default the chances of unintentional leaks of the data are fairly small. Although still possible if you work with some asynchronous code where one of the threads constantly tries to access the locked objects - could you provide some other examples?
Thanks, Alex
Hi Alex,
I see your point in keeping BytesLocker locked most of the time. This solves some of the issues.
but would break the rule that most (if not all) methods return BytesLocker objects.
may I ask why all methods should return BytesLocker? What's the idea behind it?
My assumption is that BytesLocker is for protecting sensitive data i.e. keys, secrets and seeds. Result of mac(), encrypt() and other non-secret-managment functions is not a sensitive information. It should be fine to pass encrypted value through public media. Perhaps only decrypt() falls into this category of functions returning sensitive information.
Please correct me if I'm wrong. Keen to hear from you soon :-).
Hi Ivan,
While mac()
or nonce()
methods in general do not produce sensitive data, they do however return binary data - sending binary data to the other party is most of the time difficult, you'd be most likely encoding it in some way, eg. using hex encoding.
So by returning BytesLocker objects that provide a number of helper functions (to_hex()
being one of them) the end user of this library has one thing less to worry about :)
Nonces have the increment()
method which would help to ensure that they are not reused etc.
So by locking everything (which is optional and rather for advanced users) you are in control when the data can be accessed and minimise the time frame in which the data is not protected anymore.
Having said that I'd not offer any guarantees that your application would be then secure from an attack by somebody capable of exploiting Perl implementation details.
Thanks, Alex
Thanks for your reply.
Let me put into perspective following information:
$ time perl -I blib/lib/ -I blib/arch/ -MTime::HiRes=time -MCrypt::NaCl::Sodium=:utils -wle 'my $crypto_auth = Crypt::NaCl::Sodium->auth(); my $key = $crypto_auth->keygen(); my @a; my
$t0 = time; push @a, $crypto_auth->mac("test", $key) for 1..1_000_000; printf "elapsed: %.3fs\n", time - $t0'
# output from current implementation
elapsed: 19.586s
real 0m29.404s
user 0m13.928s
sys 0m14.868s
# output after my patches
elapsed: 6.974s
real 0m7.111s
user 0m7.012s
sys 0m0.067s
as you can see usage of BytesLocker incurs significant performance overhead which, in case of mac()/encrypt()/sign()/etc, come for no good reason.
I mean, I see you point about convenient ->hex() and ->increment() methods. However, all of this can be addressed in different ways. I see following options: 1) returning SV + bin2hex()/nonce_increment()/etc functions (preferred IMO) 2) returning another blessed object with ->hex()/->increment()/etc methods (a bit more expensive but still acceptable)
So, what I'm trying to say is that it seems that current implementation of Crypt::NaCl::Sodium tries to use Data:BytesLocker for several different purposes which leads to inefficiency and reduced security. Namely: a) BytesLocker is secured storage for keys/secrets/etc b) BytesLocker is convenient tool to access data in stringified form c) BytesLocker is convenient tool to manipulate nonce d) maybe more
What I think BytesLocker should be doing is only (a). All others use cases can be solved by other less expensive means.
If we reserve this single use case for BytesLocker we automatically don't need "stringification" and friends accessors. Only limited set of functions actually needs to have such access which can and should happen in C land for security reasons IMO.
IMO this way we can achieve nice combination of set of secured workflows which are easy to use and hard to misuses and which are fast.
Please let me know what you think.
btw, note difference between elapsed and real timings in given output in current implementation in my previous post due to secured memory deallocation.
Hi Ivan,
Benchmarking is hard, focusing on a single part of the application is never a good idea, unless of course your application is meant to calculate checksums on short strings...
I understand that you wish to speed up certain parts of the library and you are willing to sacrifice the usability.
However I disagree with this approach for the reasons already stated.
Thanks, Alex
Hi,
I have a question about internal workflows of Crypt::NaCl::Sodium::sign and Data::BytesLocker
Consider following code:
base on my evaluation of $crypto_sign->mac() and Data::BytesLocker's stringification method I conclude that the workflow leave copy of secret in RAM as a side effect of computing mac. This is because mac() calls SvPV(seckey, skey_len) which calls stringification method which in return does newSVpvn() on secret data.
newSVpvn:
I double checked this workflow by amending above code to:
this outputs:
Does this sound right?
If so I would expect $crypto_sign->mac() to access underlying mprotected Data::BytesLocker's buffer directly and not silently copying protected data to unprotected memory region.