digitalbazaar / forge

A native implementation of TLS in JavaScript and tools to write crypto-based and network-heavy webapps
https://digitalbazaar.com/

Working with large buffer data (TypedArray/node Buffer support) #60

Open dcodeIO opened 11 years ago

dcodeIO commented 11 years ago

As of today forge seems pretty complete to me. However, when working with large data, the internal byte buffer built on top of UTF-16 strings seems suboptimal. Wouldn't it be better to do it with typed arrays? Maybe also with a wrapper like https://github.com/dcodeIO/ByteBuffer.js if automatic resizing is an issue.
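To illustrate the concern (a rough sketch, not forge's actual code): appending to a JavaScript string copies it because strings are immutable, while a pre-sized typed array is written in place. Engines may optimize concatenation internally, but the typed array gives predictable in-place writes.

```javascript
// Sketch: why a string-backed byte buffer scales badly compared to a
// typed array. Naively, each string append copies the whole string
// (O(n^2) bytes moved in total), while a pre-allocated Uint8Array is
// filled in place (O(n)).
function appendViaString(n) {
  var s = '';
  for (var i = 0; i < n; i++) {
    s += String.fromCharCode(i & 0xff); // each += conceptually copies the whole string
  }
  return s;
}

function appendViaTypedArray(n) {
  var a = new Uint8Array(n); // one allocation up front
  for (var i = 0; i < n; i++) {
    a[i] = i & 0xff; // in-place write, no copying
  }
  return a;
}
```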

Ayms commented 11 years ago

It's not very easy to switch to typed arrays given forge's structure, but since I did it I am now using typed arrays with forge, see the example here: https://github.com/Ayms/abstract-tls/#buffers-

juliangruber commented 11 years ago

https://github.com/chrisdickinson/bops could be helpful as it allows you to write code that uses typed arrays in browsers and buffers in node.

dlongley commented 11 years ago

Forge was initially written before TypedArrays were prevalent. We definitely want to switch to them (and to buffers in node as @juliangruber suggested), I just haven't had the time to implement a plan to switch over. The plan is to include switching over as part (perhaps in steps) of implementing the WebCrypto API so that forge can be used as a polyfill. We want to clean up and consolidate the test suite (and use mocha) before this work happens though, to ensure everything keeps running properly.

Ayms commented 11 years ago

@juliangruber @dlongley see https://github.com/Ayms/abstract-tls/blob/master/lib/abstract-tls.js#l125-657 or https://github.com/Ayms/node-typedarray (I should maybe update it one day and make a separate repo). You can use typed arrays in browsers and in node (or other platforms), mix both in node, or replace node's buffers with typed arrays, so you have a standard implementation that does not care about the specifics of the platforms' buffers. TextEncoder/TextDecoder have a useful streaming option.
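As a minimal sketch of the streaming option mentioned above (assuming a platform with TextDecoder, i.e. modern browsers or recent node): `{ stream: true }` lets the decoder hold back an incomplete multi-byte sequence until the next chunk arrives.

```javascript
// Sketch: TextDecoder's { stream: true } option decodes UTF-8 that
// arrives in chunks, even when a multi-byte character is split across
// a chunk boundary.
var decoder = new TextDecoder('utf-8');

// 'é' is the two-byte UTF-8 sequence 0xC3 0xA9; split it across chunks.
var chunk1 = new Uint8Array([0x63, 0x61, 0x66, 0xc3]); // "caf" + first byte of 'é'
var chunk2 = new Uint8Array([0xa9]);                   // second byte of 'é'

var out = '';
out += decoder.decode(chunk1, { stream: true }); // keeps the dangling 0xC3
out += decoder.decode(chunk2);                   // flushes the completed 'é'
// out is now "café"
```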

Implementing the WebCrypto API when available would allow forge's capabilities to be extended a lot and would reduce the code a lot too. Forge as a polyfill could become something like a substitute/complement to the WebCrypto WG's secondary features (http://www.w3.org/2011/11/webcryptography-charter.html#scope), which don't seem to interest everybody.

dcodeIO commented 11 years ago

On the node side I found that using node's crypto module's ciphers with RSA from https://npmjs.org/package/ursa is much more performant, because both are libssl bindings. Imho forge's primary focus should be on the browser; using it as a polyfill is also a great idea.

dlongley commented 11 years ago

@dcodeIO, yes forge's primary focus is the browser. I do recommend using ursa if you're using node.js (we use it as well), but it doesn't necessarily have all of the features you may need. Forge has some useful stuff that isn't available via any natively-compiled npm modules (that I'm aware of), but someone could always come along and create more openSSL wrappers to remedy that.

jduncanator commented 10 years ago

Can we bump this please? It's a pain having to decrypt and encrypt images by turning the result into a base64 string and then using that as the source of an image tag. If you had a toArrayBuffer function on the Forge Buffer, you could really easily create Blob URIs without the nasty overhead of converting to base64...
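A minimal sketch of what such a conversion could look like (the helper name is hypothetical, not a forge API): turn forge's binary-encoded string output into a Uint8Array, then build a Blob URI from it instead of a base64 data URI.

```javascript
// Sketch: convert a binary-encoded string (one char per byte, as
// forge's getBytes() returns) into a Uint8Array. The helper name is
// hypothetical; forge does not ship it.
function binaryStringToUint8Array(s) {
  var out = new Uint8Array(s.length);
  for (var i = 0; i < s.length; i++) {
    out[i] = s.charCodeAt(i) & 0xff; // each char code is one byte
  }
  return out;
}

// Browser usage (assumes `decrypted` is a forge buffer holding image bytes):
// var bytes = binaryStringToUint8Array(decrypted.getBytes());
// img.src = URL.createObjectURL(new Blob([bytes], { type: 'image/png' }));
```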

jduncanator commented 10 years ago

See #146 :smiley:

Ayms commented 10 years ago

As mentioned above, I wrote https://github.com/Ayms/abstract-tls/blob/master/lib/abstract-tls.js#l125-657 some time ago to turn forge's buffers into typed arrays.

The problem is that it was not really performing better than forge's buffers, so I reverted to forge's buffers.

Because to use typed arrays correctly, all the forge modules would have to be rewritten to eliminate the binary-string manipulations (is this the goal of #146?).

@jduncanator I don't know how you proceed, but you can just read your image as a Blob, pass it to a worker, read it as an ArrayBuffer, slice it and encrypt/decrypt the chunks, then return the partial results as ArrayBuffer or Blob, or the complete one.

That's what I am doing to encrypt/decrypt or hash Blobs in http://www.peersm.com

If I remember correctly, to do this you need to modify forge's crypto so that update processes all the bytes (which it should be doing) and remove final, which becomes useless.
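The chunked approach described above can be sketched like this (a sketch only: `processChunk` stands in for the forge cipher calls, and `Blob.prototype.arrayBuffer` is assumed available, i.e. modern browsers or node 18+):

```javascript
// Sketch: slice a Blob into fixed-size chunks, read each slice as an
// ArrayBuffer, and hand every chunk to a processing callback (e.g.
// cipher.update after converting the chunk to forge's binary format).
async function processBlobInChunks(blob, chunkSize, processChunk) {
  for (var offset = 0; offset < blob.size; offset += chunkSize) {
    var slice = blob.slice(offset, Math.min(offset + chunkSize, blob.size));
    var chunk = new Uint8Array(await slice.arrayBuffer());
    processChunk(chunk); // partial results can be returned per chunk
  }
}
```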

Ayms commented 10 years ago

And of course make a small modification to convert the chunks into binary when passed to crypto.

jduncanator commented 10 years ago

@Ayms I'm currently working on getting all the kinks out of the Typed Array backed Forge buffer so it should remove the need for any of that. Would also make life easier when implementing the HTML5 Crypto spec.

Ayms commented 10 years ago

OK, but as I wrote, you would then have to rewrite all of forge (except for modules that you could replace using the WebCrypto API, which for sure will not include SSL/TLS, certificate management, etc); is this your plan?

jduncanator commented 10 years ago

@Ayms Why rewrite it all? The idea, if you see #146, is to implement a transparent interface so that the rest of the Forge API doesn't know the difference. That basically removes the need to rewrite it all.

Regarding your idea, I currently have a really old (well not that old) version of Forge that I hacked ECB mode for AES into. It looks like it would be a single line change in the latest API but I also cut out a bunch of modules I don't need and I'd rather not go through the pain of that again!

Ayms commented 10 years ago

Unless I am misunderstanding #146, that's exactly what I did with the link provided above some time ago (did you look at it?): turn the forge buffers into a typed array interface which can be used by forge without modifying it.

The problem is that all forge modules manipulate data in the binary-string format, so at some point your typed array interface must convert to binary.

Even after some (a lot of) optimization efforts, I reached the conclusion that forge buffers were still more performant.

So, if you don't modify the underlying structure, there is no point in introducing typed arrays; I am afraid you will waste your time and the result will probably be less efficient.

The solution for you would be the one I have proposed, it's fast and working well.

It's not proven that a full typed array implementation of forge would be more efficient, even if I think it should move toward typed arrays; for example, there is to my knowledge no improved typed-array version of sjcl.

jduncanator commented 10 years ago

@Ayms I'm not sure where you are getting confused; none of the algorithms in forge directly access the "string" data, they all use read* and write* methods. On top of that, I have no idea where you got your performance numbers from, because TypedArrays are mapped directly onto a chunk of memory by the browser, while strings have to be marshalled between V8 objects and C strings and back again. Remember, strings are immutable, so whenever you modify a string you are in fact copying it!

Ayms commented 10 years ago

No, the code is manipulating strings, so if you want to switch efficiently to typed arrays you have to rewrite it; and as for the performance, believe me, I spent quite a lot of time on this.

To give a sample (and to update the repo version), the latest version of my typed array implementation is below (I can provide the complete version with the Buffer and Uint8Array prototype additions if necessary; the idea was for it to be compatible with node too):

forge.util.ByteBuffer = function() {};

forge.util.createBuffer = function(input, encoding) {
    var a=new forge.util.ByteBuffer();
    if (input) {
        a.data=new Buffer(input,encoding||'binary');
        a.length_=a.data.length;
    } else {
        a.data=new Buffer(buffer_size);
        a.length_=0;
    };
    a.read=0;
    return a;
};

forge.util.ByteBuffer.prototype.length = function() {
    return this.length_-this.read;
};

forge.util.ByteBuffer.prototype.isEmpty = function() {
    return (this.length_-this.read)===0;
};

forge.util.ByteBuffer.prototype.putByte = function(b) { //b charcode
    if (this.data.length>=this.length_+1) {
        this.data.writeUInt(b,this.length_,1);
    } else {
        this.data=this.length_?([this.data.slice(0,this.length_),new Uint8Array([b])].concatBuffers()):(new Uint8Array([b]));
    };
    this.length_ +=1;
};

forge.util.ByteBuffer.prototype.getByte = function() { //return charcode
    return this.data[this.read++];
};

forge.util.ByteBuffer.prototype.at=function(i) {
    return this.data[this.read+i];
};

forge.util.ByteBuffer.prototype.last = function() {
    return this.data[this.length_-1];
};

forge.util.ByteBuffer.prototype.fillWithByte=function(b,n) {//b charcode
    if (this.data.length>=this.length_+n) {
        var o=this.length_;
        for (var i=0;i<n;i++) {
            this.data[o+i]=b;
        };
    } else {
        var arr=[];
        for (var i=0;i<n;i++) {
            arr.push(b);
        };
        this.data=this.length_?([this.data.slice(0,this.length_),new Uint8Array(arr)].concatBuffers()):(new Uint8Array(arr));
    };
    this.length_ +=n;
};

forge.util.ByteBuffer.prototype.putBytes = function(bytes) { //bytes string or Buffer (from process)
    var a;
    if (typeof(bytes)==='string') {
        a=new Buffer(bytes,'binary');
    } else {
        a=bytes;
    };
    var l=a.length;
    if (this.data.length>=this.length_+l) {
        this.data.set(a,this.length_);
    } else {
        this.data=this.length_?([this.data.slice(0,this.length_),a].concatBuffers()):a;
    };
    this.length_+=l;
};

forge.util.ByteBuffer.prototype.getBytes = function(count) { //return string
    var rval;
    if(count) {
        count=Math.min(this.length(),count);
        rval=this.data.slice(this.read,this.read+count).toString('binary');
        this.read +=count;
    } else if(count===0) {
        rval = '';
    } else {
        rval=this.data.slice(this.read,this.length_).toString('binary');
        this.clear();
    };
    return rval;
};

forge.util.ByteBuffer.prototype.putBuffer = function(buffer) {
    if (this.data.length>=this.length_+buffer.length_) {
        if (buffer.length_) {
            this.data.set(buffer.data.slice(0,buffer.length_),this.length_);
        };
    } else {
        this.data=this.length_?([this.data.slice(0,this.length_),buffer.data.slice(0,buffer.length_)].concatBuffers()):(buffer.data.slice(0,buffer.length_));
    };
    this.length_+=buffer.length_;
    buffer.clear();
};

forge.util.ByteBuffer.prototype.bytes = function(count) {
    if (!count) {
        return (this.data.slice(this.read,this.length_)).toString('binary');
    } else {
        return this.data.slice(this.read,this.read+count).toString('binary');
    };
};

forge.util.ByteBuffer.prototype.putInt16 = function(i) {
    if (this.data.length>=this.length_+2) {
        this.data.writeUInt(i,this.length_,2);
    } else {
        this.data=this.length_?([this.data.slice(0,this.length_),(new Buffer(2)).writeUInt(i)].concatBuffers()):((new Buffer(2)).writeUInt(i));
    };
    this.length_ +=2;
};

forge.util.ByteBuffer.prototype.putInt24 = function(i) {
    if (this.data.length>=this.length_+3) {
        this.data.writeUInt(i,this.length_,3);
    } else {
        this.data=this.length_?([this.data.slice(0,this.length_),(new Buffer(3)).writeUInt(i)].concatBuffers()):((new Buffer(3)).writeUInt(i));
    };
    this.length_ +=3;
};

forge.util.ByteBuffer.prototype.putInt32 = function(i) {
    if (this.data.length>=this.length_+4) {
        this.data.writeUInt(i,this.length_,4);
    } else {
        this.data=this.length_?([this.data.slice(0,this.length_),(new Buffer(4)).writeUInt(i)].concatBuffers()):((new Buffer(4)).writeUInt(i));
    };
    this.length_ +=4;
};

forge.util.ByteBuffer.prototype.putInt32Le = function(i) {
    if (this.data.length>=this.length_+4) {
        this.data.writeUIntLE(i,this.length_,4);
    } else {
        this.data=this.length_?([this.data.slice(0,this.length_),(new Buffer(4)).writeUIntLE(i)].concatBuffers()):((new Buffer(4)).writeUIntLE(i));
    };
    this.length_ +=4;
};

forge.util.ByteBuffer.prototype.putInt = function(i,n) {
    n=n/8;
    if (this.data.length>=this.length_+n) {
        this.data.writeUInt(i,this.length_,n);
    } else {
        this.data=this.length_?([this.data.slice(0,this.length_),(new Buffer(n)).writeUInt(i)].concatBuffers()):((new Buffer(n)).writeUInt(i));
    };
    this.length_ +=n;
};

forge.util.ByteBuffer.prototype.getInt16 = function() {
    var a=this.data.readUInt(this.read,2);
    this.read += 2;
    return a;
};

forge.util.ByteBuffer.prototype.getInt24 = function() {
    var a=this.data.readUInt(this.read,3);
    this.read += 3;
    return a;
};

forge.util.ByteBuffer.prototype.getInt32 = function() {
    var a=this.data.readUInt(this.read,4);
    this.read += 4;
    return a;
};

forge.util.ByteBuffer.prototype.getInt32Le = function() {
    var a=this.data.readUIntLE(this.read,4);
    this.read += 4;
    return a;
};

forge.util.ByteBuffer.prototype.getInt = function(n) {
    n=n/8;
    var a=this.data.readUInt(this.read,n);
    this.read +=n;
    return a;
};

forge.util.ByteBuffer.prototype.compact = function() {
    if (this.length()) {
            var a=this.data.slice(this.read,this.length_);
            this.data=new Buffer(Math.max(buffer_size,a.length));
            this.data.set(a);
            this.length_=a.length;
            this.read=0;
    } else {
        this.clear();
    };
};

forge.util.ByteBuffer.prototype.clear = function() {
    this.data=new Buffer(buffer_size);
    this.length_=0;
    this.read=0;
};

forge.util.ByteBuffer.prototype.truncate = function(count) {
    var len=Math.max(0,this.length()-count);
    var a=this.data.slice(this.read,len);
    this.data=new Buffer(buffer_size);
    if (this.data.length>a.length) {
        this.data.set(a);
    } else {
        this.data=a;
    };
    this.length_=a.length;
    this.read=0;
    //console.log(this.data.toString('hex'));
};

forge.util.ByteBuffer.prototype.toHex = function() {
    return this.data.slice(0,this.length_).toString('hex');
};

forge.util.ByteBuffer.prototype.toString = function() {
    return (new Buffer(this.data.slice(0,this.length_).toString('binary'),'utf8')).toString('utf8');
};

jduncanator commented 10 years ago

Not here to argue. The speed improvements of a typed array implementation are not negligible. However, if most of forge is indeed working directly on strings, it's now time to change. Using typed arrays in browsers that support them offers so many goodies. For example, you can write a shader in WebGL that computes hashes ON THE GPU over many hundreds of cores. A prototype I had working could hash a 1GB file in less than 5 seconds, compared to a native JS implementation in forge that took almost 40 seconds. I'm not saying this is a good idea, but it goes to show the possibilities. With WebCL around the corner (hopefully it should pass into a candidate stage within the next week) things will get interesting.

Ayms commented 10 years ago

I am not saying it will not improve; I am saying that if it has to be done it must be thought through in detail, taking into account potential future WebCrypto integration. Changing the Buffer interface is not enough.

Now, if you are a candidate to do this, then you are most welcome.

jduncanator commented 10 years ago

@Ayms Well, I'm not saying I am the most qualified, but I certainly know what I'm doing. It would be more a matter of discussion with the Forge community about which direction they want to take, and planning it out to make sure it's "future proofed".

Ayms commented 10 years ago

Yes, I don't think it's trivial work; for example, it must take into account the promise style of WebCrypto.

But I think it is worth it, because forge includes what WebCrypto doesn't (or what is part of WebCrypto's legendary secondary features): SSL/TLS, certificate management, progressive operations (encryption/hashing; WebCrypto is supposed to handle this when Streams are out, but when, we don't know).

If I can help I will.

dlongley commented 10 years ago

@jduncanator,

I'm currently working on getting all the kinks out of the Typed Array backed Forge buffer so it should remove the need for any of that. Would also make life easier when implementing the HTML5 Crypto spec.

Yeah, as I mentioned in #146, we want to have the main forge "Buffer" have methods for accepting/converting to/from ArrayBuffers/TypedArrays. There was some work started on this but it's part of a larger redesign/upgrade, again, as mentioned in #146.

dlongley commented 10 years ago

@Ayms,

OK, but as I wrote, you would then have to rewrite all of forge (except for modules that you could replace using the WebCrypto API, which for sure will not include SSL/TLS, certificate management, etc); is this your plan?

As @jduncanator mentioned, a full rewrite isn't necessary, but some of the APIs need to be able to accept a common interface (SomeNamedBufferThing) and return the same interface. Most of the internal implementations don't need to change very much, if at all. Experimental work has been started on this as well as work on using ArrayBuffer/DataViews in Buffers. I also provided more details on the plan over at #146. It will be similar (but not exactly) to what you did some time ago when you were trying out a TypedArray implementation.

dlongley commented 10 years ago

@jduncanator,

Not here to argue. The speed improvements of a typed array implementation are not negligible. However, if most of forge is indeed working directly on strings, it's now time to change. Using typed arrays in browsers that support them offers so many goodies. For example, you can write a shader in WebGL that computes hashes ON THE GPU over many hundreds of cores. A prototype I had working could hash a 1GB file in less than 5 seconds, compared to a native JS implementation in forge that took almost 40 seconds. I'm not saying this is a good idea, but it goes to show the possibilities. With WebCL around the corner (hopefully it should pass into a candidate stage within the next week) things will get interesting.

To be clear, this is definitely something we want to do and have wanted to do for a while, it's just a matter of trying to get it right -- with minimal breakage (at least before a 1.0 release of forge, where I'm OK with shedding some really ancient stuff).

I think you get the general idea based on your comments in #146. I'm fairly optimistic that this can be done in a sane way.

dlongley commented 10 years ago

@Ayms,

Yes, I don't think it's trivial work; for example, it must take into account the promise style of WebCrypto.

I don't think this will be too much of an issue. I've written a lot of promise code -- and most of the stuff that we need to convert towards a Promise-based API shouldn't be an issue. Reworking the TLS engine to use Promises would be more difficult -- but I don't expect that to ever be part of WebCrypto. The core pieces of forge could be put into a Promise-based API wrapper without too much difficulty.
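As a rough sketch of such a wrapper (a generic adapter, not forge's actual API; the forge call in the usage comment is illustrative only):

```javascript
// Sketch: wrap a synchronous forge-style operation in a Promise so it
// can sit behind a WebCrypto-like, promise-based facade.
function promisify(syncFn) {
  return function () {
    var args = arguments;
    var self = this;
    return new Promise(function (resolve, reject) {
      try {
        resolve(syncFn.apply(self, args)); // route thrown errors to reject
      } catch (err) {
        reject(err);
      }
    });
  };
}

// Hypothetical usage against forge:
// var sha256 = promisify(function (data) {
//   return forge.md.sha256.create().update(data).digest().toHex();
// });
// sha256('abc').then(function (hex) { console.log(hex); });
```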

But I think it is worth it, because forge includes what WebCrypto doesn't (or what is part of WebCrypto's legendary secondary features): SSL/TLS, certificate management, progressive operations (encryption/hashing; WebCrypto is supposed to handle this when Streams are out, but when, we don't know).

Yes, I definitely think the first cut of WebCrypto is quite limited w/o stream support, but I understand their desire to push something out there, even if it means having to make some pretty significant changes/bloat the API a bit later to better handle progressive operations. It's disappointing, but that's the way these things go sometimes.

If I can help I will.

Awesome!

Ayms commented 10 years ago

@dlongley

As @jduncanator mentioned, a full rewrite isn't necessary, but some of the APIs need to be able to accept a common interface (SomeNamedBufferThing) and return the same interface. Most of the internal implementations don't need to change very much, if at all. Experimental work has been started on this as well as work on using ArrayBuffer/DataViews in Buffers. I also provided more details on the plan over at #146. It will be similar (but not exactly) to what you did some time ago when you were trying out a TypedArray implementation.

When I did it, I was not under this impression: all submodules of forge, like the crypto ones, handle strings. At that time I was trying to eliminate strings but reached the conclusion that it was too much work; that's why my typed array implementation is still using strings while it should not.

But apparently you feel comfortable to handle it, so that's cool.

I don't think this will be too much of an issue. I've written a lot of promise code -- and most of the stuff that we need to convert towards a Promise-based API shouldn't be an issue. Reworking the TLS engine to use Promises would be more difficult -- but I don't expect that to ever be part of WebCrypto.

I will comment on #146 if this can help. I don't think the SSL/TLS module has to follow any promise style, just the functions it has in common with WebCrypto, so we can choose whether we use WebCrypto or forge.

And indeed we cannot really expect, for now, that WebCrypto will one day handle SSL/TLS, or certificate management.

skudryk commented 9 years ago

@jduncanator you said "A prototype I had working could hash a 1GB file in less than 5 seconds compared to a native JS implementation in forge that took almost 40 seconds." ..

Personally I get very poor performance when encrypting (AES-CBC-256) large files; the speed is about 3 seconds per 1 MB. I even tried to read the file in chunks to be able to encrypt files up to 1 GB.

My code:

function encryptFile(file) {
      var encoded_chunks = [];
      var chunk_size = 32048;
      var offset = 0;
      var total = file.size;
      var count = 0;
      var last_chunk = false;
      var iv = forge.util.decode64(crypt_iv_base64);
      var key = forge.util.decode64(crypt_key_base64);
      var cipher = forge.cipher.createCipher('AES-CBC', key);
      cipher.start({iv: iv });

      var p = new Promise(function(resolve, reject) {
        console.log('reading file ' + file.name + ' of size ' + file.size + ' bytes to encrypt');
        var reader = new FileReader();

        // read file by chunks
        var len = (offset + chunk_size < total) ? chunk_size : total ;
        reader.readAsBinaryString(file.slice(0, len));

        reader.onload = function(ev) {
          cipher.update(forge.util.createBuffer(ev.target.result, 'binary')); 
          if (last_chunk) { cipher.finish(); }
          encoded_chunks[count] = cipher.output.getBytes();
          count += 1;
          offset += chunk_size;
          // read next chunk after encrypting current one
          if (offset + chunk_size < total) {
            reader.readAsBinaryString(file.slice(offset, offset + chunk_size));
          } else {
            var len = total - offset;
            if (len < 1) {
              console.log('finished, length of encoded_chunks: ', encoded_chunks.length);
               resolve (str2ab_4(encoded_chunks));
            } else {
                console.log('reading last fragment, count, size: ',count,len);
                last_chunk = true;
                reader.readAsBinaryString(file.slice(offset, offset + len));
            }
          }
        }
      });
      return p;
    }

function str2ab_4(string_array) {
    var length = string_array.length;
    var buff_len = ((length - 1) * string_array[0].length) + (string_array[length - 1].length);
    var buff = new ArrayBuffer(buff_len);
    var buffView = new Uint8Array(buff);
    var offset = 0;
    for(var i = 0; i < length; i += 1) {
      // set and map are native functions; write each chunk at its offset
      // (without the offset argument every chunk would overwrite position 0)
      buffView.set(string_array[i].split('').map(function(el) {return el.charCodeAt(0);}), offset);
      offset += string_array[i].length;
    }
    return buff;
}

After encryption the chunks are concatenated into a typed array and posted to the server, but I can't decrypt the data back. What am I doing wrong?

Thanks

Ayms commented 9 years ago

I don't know if this can help for this or #200 #146, but I am working with Blobs/ArrayBuffers, slicing the chunks and passing them to forge's crypto, which then "binarizes" the chunks using:

Uint8Array.prototype.toString=function(enc) {
    var l=this.length;
    var r=[];
    if (enc==='utf8') {
        return (new TextDecoder('utf-8')).decode(this);
    };
    if (enc==='hex') {
        for (var i=0;i<l;i++) {
            var tmp=this[i].toString(16);
            r.push(tmp.length===1?('0'+tmp):tmp);
        };
    };
    if (enc==='binary') {
        //if (!chrome) {
            return String.fromCharCode.apply(null,this);
        /*} else { //bug max call http://code.google.com/p/chromium/issues/detail?id=252492
            //console.log('chrome workaround');
            var cut=16*1024;
            var part='';
            var tmp=this;
            while (tmp.length) {
                var k=Math.min(tmp.length,cut);
                part +=String.fromCharCode.apply(null,tmp.subarray(0,k));
                tmp=tmp.subarray(k);
            };
            return part;
        };*/
    };
    return r.join('');
};

Using the apply method I saw a huge increase in performance. The commented-out part above is useful if you run this in a Worker with Chrome; see the related Chrome bug.
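The same chunked workaround can be written as a standalone helper, without patching Uint8Array.prototype (a sketch; 16 KiB is a conservative chunk size to stay under engines' argument-count limits for apply):

```javascript
// Sketch: convert a Uint8Array to a binary string in chunks so that
// String.fromCharCode.apply never receives too many arguments (the
// "max call" limit mentioned above).
function toBinaryString(u8) {
  var CHUNK = 16 * 1024;
  var parts = [];
  for (var i = 0; i < u8.length; i += CHUNK) {
    parts.push(String.fromCharCode.apply(null, u8.subarray(i, i + CHUNK)));
  }
  return parts.join('');
}
```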