Open sh-dv opened 2 years ago
XOR only needs a simple implementation that also happens to run quite fast (not that performance is relevant to Covert here):
function xor(a, b) {
const a32 = new Uint32Array(a), b32 = new Uint32Array(b) // These are ArrayBuffer views, no copying
for (let i = 0; i < b32.length; ++i) a32[i] ^= b32[i]
}
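A short usage sketch for the function above (the sample values are made up for illustration). It also shows one caveat worth knowing: the Uint32Array views require the buffer length to be a multiple of 4 bytes.

```javascript
// Same XOR as above, repeated here so the example runs on its own.
function xor(a, b) {
  const a32 = new Uint32Array(a), b32 = new Uint32Array(b) // views, no copying
  for (let i = 0; i < b32.length; ++i) a32[i] ^= b32[i]
}

const xorA = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8])
const xorB = new Uint8Array([255, 0, 255, 0, 255, 0, 255, 0])
xor(xorA.buffer, xorB.buffer)          // modifies xorA in place
const xored = Array.from(xorA)         // [254, 2, 252, 4, 250, 6, 248, 8]

// A 6-byte buffer cannot be viewed as Uint32Array, so this throws RangeError
let xorThrew = false
try {
  xor(new ArrayBuffer(6), new ArrayBuffer(6))
} catch (e) {
  xorThrew = e instanceof RangeError
}
```

If inputs of arbitrary length are needed, the same loop over Uint8Array views works, only somewhat slower.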
Decryption of Minisign keys needs scrypt and blake2b and SSH needs bcrypt and aes, none of which are used anywhere else in Covert. AES is part of WebCrypto/Subtle and thus available in browsers, others need JS/Wasm implementation but the only one that needs to be fast is scrypt. Other operations take almost no time even with very inefficient implementation. SSH uses extremely weak settings with bcrypt. Also, SSH keys without password need none of the crypto, but Minisign is always encrypted even with an empty passphrase, taking nearly one second to calculate on native implementation.
For an initial implementation, support for these encrypted keys can easily be omitted.
Armor encode/decode can use standard Base64 functions already available in browsers. Python Covert adds some additional processing that is not crucial to implement, and that can be ported quite easily to Javascript if so desired.
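A minimal sketch of Base64 conversion with the built-in btoa/atob functions. The helper names b64encode and b64decode are hypothetical, and the additional processing that Python Covert does for armor is not covered here.

```javascript
// Uint8Array -> Base64 string. The spread is fine for small inputs such as
// keys and short messages; very large arrays should be converted in chunks.
const b64encode = bytes => btoa(String.fromCharCode(...bytes))

// Base64 string -> Uint8Array
const b64decode = text => Uint8Array.from(atob(text), c => c.charCodeAt(0))

const armored = b64encode(new TextEncoder().encode('hello'))   // "aGVsbG8="
const roundtrip = new TextDecoder().decode(b64decode(armored)) // "hello"
```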
Another issue mentioned that a similar project was previously done in JS. Is there a GitHub link or a gist for it? Also, the Python code contains some cffi lines; is that something that should be taken into account in JS? The bytes() function is also kind of tricky in JS. These are some things I noticed when I took a look at the Covert Python code.
@sh-dv chacha.py reimplements some Python code of pynacl, which itself does not properly expose the C functions it wraps. We do not compile any C code in Covert, just using the existing raw C bindings provided by pynacl (and this is really only for maximal speed, avoiding some buffer copying). Javascript ports of libsodium probably don't need any such changes.
You can simply use new ArrayBuffer(n) to accomplish the same as bytes(n). Javascript ArrayBuffer has the benefit of being read-write, while Python bytes are read-only (thus calling for use of read-write bytearray and some conversions in various places where bytes are used, and additionally some conversions between int/bytes where numbers are needed in cryptography and for blockstream nextlen fields). For memoryview there is a similar JS view construct, new DataView(...), that avoids copying of data on slicing (whereas buf.slice(...) always makes a copy, which can be expensive if the slice is large).
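The difference between views and copies can be seen in a small sketch. The subarray call is an addition not mentioned above; it is the typed-array counterpart of a DataView and also avoids copying.

```javascript
const buf = new ArrayBuffer(8)          // like bytes(8): eight zero bytes, but writable
const allBytes = new Uint8Array(buf)    // view over the whole buffer, no copy

const view = new DataView(buf, 4, 4)    // view of bytes 4..7, no copy
const sub = allBytes.subarray(4, 8)     // typed-array view of the same bytes, no copy
const copy = buf.slice(4, 8)            // independent copy of bytes 4..7

view.setUint32(0, 0xdeadbeef, true)     // little-endian write through the view

// The write is visible through every view of buf:
// allBytes is now [0, 0, 0, 0, 0xef, 0xbe, 0xad, 0xde]
const seenBySub = Array.from(sub)       // [0xef, 0xbe, 0xad, 0xde]
// The copy was taken before the write and is unaffected
const seenByCopy = Array.from(new Uint8Array(copy)) // [0, 0, 0, 0]
```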
I could have a look at writing most of the worker code in JS myself, if you are interested on handling the other parts. And if you do plan a bigger rewrite of hat.sh at some point, you are perfectly welcome to base it on the Covert format.
That sounds perfect. It will be much easier if I take a deep look at pynacl and compare it to the C or JS ones; then I will have a better overall understanding.
Do you know if it is possible to run multiple workers that run in separate threads using more than one CPU core? I know that a single worker is always single-threaded but it would be awesome if we could do some things in parallel for extra speed.
Depends on what type of workers. Is it a web worker or a service worker?
In web workers you can run multiple workers without any issues. Multithreading and full utilization of CPU cores can be achieved. However, web workers can't handle fetch events and responses, so you are limited to small file sizes (RAM usage). Otherwise, wait for the FileSystemWritableFileStream feature to be released and fully supported in all major browsers, which in my opinion behaves the same way the hat.sh service worker does.
In service workers, I think it is the same case regarding multithreading and CPU usage, because both kinds of workers are features provided by browsers and can be used through JS. The service worker life cycle has to be taken into account. In my experience while developing the v2 beta of hat.sh, I tried to add some concurrency to make things go faster and that went horribly. Not only that, running the same code (encryption) in two windows (tabs) of the same browser went badly and files got mixed up, because it is a service worker (fetch event).
What you are trying to accomplish here could be done in a web worker without any issues (afaik). However, there is the file size limit because the file will be read into memory as a whole, unless you can find another way to stream the download. I spent a lot of time trying to figure that out, failed, and chose service workers.
That is good to hear. I believe that most work should indeed be done in web workers. Data can be passed around in SharedArrayBuffer so the file size should not be an obstacle, and for password hashing only small messages need to be sent. However, I realise that such an implementation is much more difficult and thus would first go for single-threaded (threads are the reason for most of the complexity you see in blockstream.py).
Just considering the future options here, and possibly FileSystemWritableFileStream will also be supported soon enough :)
Updated TODO on first post. Not everything needs to be implemented at first, but I believe that feature-wise it is quite complete now. This is supposed to be implemented roughly from top to bottom, both on the main components and on the subtasks on each point.
I would start by getting decryption working for password-encrypted files that only contain one block (i.e. are 1024 bytes or smaller). Once that is working, it is good to build up from there by making blockstream work, and once that runs OK for both encryption and decryption, look at implementing the archive. And only finally going for public keys, given how many points there are to be solved for them. And signatures should be left pretty much last, as not even the Python side specification on those is finished.
Password normalise/UTF-8 solved:
// String to Uint8Array (UTF-8 bytes) conversion with Unicode normalisation
const encode = str => new TextEncoder().encode(str.normalize('NFKC'))
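A quick check of why the normalisation matters, as a sketch with a made-up example string: the same visible password typed in two different Unicode forms ends up as identical bytes.

```javascript
const encode = str => new TextEncoder().encode(str.normalize('NFKC'))

const composed = encode('caf\u00e9')    // "é" as a single code point
const decomposed = encode('cafe\u0301') // "e" followed by a combining acute accent

// Both inputs normalise to the same five UTF-8 bytes
const sameBytes = composed.length === decomposed.length &&
  composed.every((byte, i) => byte === decomposed[i])
```

Without the normalize call the two inputs would encode to 5 and 6 bytes respectively and hash to different keys.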
I think the best that can be done for github keys is to open the link in new tab and ask the user to copy&paste. They block everything else via CORS and frame policy.
Cool. Do you plan on opening a new repo named covert-web or a new folder in the main repo?
We can fork one from your project and give you permissions to it, or you can make a branch on yours, whichever you prefer. A separate repo of Python version in any case, as no code is shared between the two.
I suggest a new repo "covert-web" on your acc. First, a JS demo should be established once necessary tasks above are completed. Then that demo(prototype) will be converted in a modern react ui.
Would you mind setting up a NodeJS project here now?
Sure
Thanks. Leave me some folder/file where to drop the JS worker code snippets as I progress through that TODO. Because I am really not that familiar with NodeJS nor React (despite having used both). It also helps if we can have a working build with as many of the third party modules included as possible.
ES6 with no semicolons, with imports & otherwise modern JS is preferred.
Added some code but will need WebPack, more modules and possibly something to run tests with etc. I gather there already is password generation in hat.sh, so perhaps that code can be copied here directly (until then I unchecked that box). And then nearly all of password auth is already done (although will certainly need more work).
Done zxcvbn pwd hints. (needs second look).
I created a tests folder and did some manual tests. We can use Jest, or if tests like the one I just pushed are enough, I don't have a problem with that ATM.
EDIT:
Oops, looks like I wrote the hints function you already did. Please delete them if they are not needed.
Consider splitting the code into multiple files?
I believe your code is better (I see you ported it directly from Python), so we keep that. Might need a timer or a worker for updates though. If updated on every change event, it slows down the text input.
I won't work more on this today, so feel free to restructure my code or whatever, there won't be conflicts.
One more snippet
const randpad = prefsize => Math.round(-prefsize * Math.log(1 - Math.random()))
I'm thinking about getting rid of the passphrase autocomplete. It doesn't feel right because this won't be a CLI environment. One click for passphrase generation is enough. It feels weird if someone wants to input a phrase/word that they have in mind and then gets hit with these suggestions. What do you think? Also, some users prefer passwords over passphrases, so we could add password generation too, with a switch that lets you choose whether the generate button returns a password or a passphrase. Either must have high entropy.
It is useful in GUI and Web too, but perhaps not by tab and rather as popup (although then it can reveal parts of the password). Needs further thought but the autocomplete makes it much easier to remember and input the passphrase.
I understand. Please consider the use of sodium.randombytes_random() instead of Math.random() because it provides a larger security margin.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random
Honestly... Math.random is perfectly fine for randomising file size (but not so good for creating cryptographic keys). I would like to use secure cryptographic random here only because it makes some people happier but given the added complexity this is by no means a priority.
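For reference, swapping the random source would not take much code. This is only a sketch of the snippet posted earlier with a cryptographic source substituted; the names secureRandom and randpadSecure are hypothetical, and the require fallback is only there for older Node versions without a global crypto object.

```javascript
// WebCrypto in browsers and recent Node; fallback for older Node (CommonJS)
const webcrypto = globalThis.crypto ?? require('node:crypto').webcrypto

// Uniform float in [0, 1) built from 32 random bits
const secureRandom = () => webcrypto.getRandomValues(new Uint32Array(1))[0] / 2 ** 32

// Same formula as the randpad snippet; only the random source differs.
// 1 - secureRandom() lies in (0, 1], so the logarithm is always finite.
const randpadSecure = prefsize => Math.round(-prefsize * Math.log(1 - secureRandom()))

const samplePads = Array.from({ length: 1000 }, () => randpadSecure(100))
```

The result is a non-negative integer drawn from an exponential distribution with mean close to prefsize.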
As for traditional complex passwords, just make sure that the browsers' built-in function for "suggest strong password" works as expected, we don't need to implement anything for it. This does of course store the password in the browser's keystore, which can be an advantage or a disadvantage depending on the type of security you need.
If anything, we should have a slider for controlling how many words the generated passphrase has. Three words might be enough for some casual use, four are usually optimal (convenience allowed by strong hashing) and five are already highly secure against any imaginable attack. Five words are still easier to memorise and type than 10 random letters and numbers, which offer the same entropy.
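The entropy comparison can be checked with a few lines. This sketch assumes a wordlist of 1024 words (10 bits per word), which is inferred from the 3-digit lock comparison in this thread rather than taken from the actual list, and a 36-character alphabet for "letters and numbers".

```javascript
// Entropy in bits of `count` independent uniform picks from `choices` options
const entropyBits = (choices, count) => count * Math.log2(choices)

const WORDLIST_SIZE = 1024  // assumption, see above
const ALPHABET_SIZE = 36    // a-z plus 0-9, case-insensitive

const threeWords = entropyBits(WORDLIST_SIZE, 3)  // 30 bits
const fourWords = entropyBits(WORDLIST_SIZE, 4)   // 40 bits
const fiveWords = entropyBits(WORDLIST_SIZE, 5)   // 50 bits
const tenChars = entropyBits(ALPHABET_SIZE, 10)   // about 51.7 bits
```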
One possible development is to have separate wordlists for different languages, but we'll need a better list generation tool and some people speaking each language before that can be implemented. The current list was made from existing bad lists with simple tools and quite a bit of manual intervention, and is itself also still under development.
Your note on autocompletion made me think about predictive keyboards (on smartphones) and how they can also fix typos. Automatic typo correction for the dictionary words would also be a cool development, as long as implemented such that it still allows entering a password that resembles words in the dictionary but is intentionally different (for which one possible implementation is to try hashing both options in parallel to see which one fits - this does not actually weaken security).
A little bug: password generation returns undefined if zxcvbn doesn't validate the password. Needs a loop.
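One way the loop could look, as a sketch only. The names generateUntilValid, generate and isAcceptable are hypothetical and not taken from the repository; the validator would be whatever zxcvbn-based check the code already uses.

```javascript
// Keep generating until the validator accepts a candidate, so the caller
// never receives undefined. The retry cap guards against a validator that
// rejects everything.
function generateUntilValid(generate, isAcceptable, maxTries = 100) {
  for (let i = 0; i < maxTries; ++i) {
    const candidate = generate()
    if (isAcceptable(candidate)) return candidate
  }
  throw new Error('No acceptable passphrase generated')
}

// Stand-in generator and validator for demonstration: the first two
// candidates are rejected, the third is accepted.
const demoCandidates = ['aaa', 'bbb', 'correct horse battery']
let demoIndex = 0
const demoResult = generateUntilValid(
  () => demoCandidates[demoIndex++],
  pw => pw.length > 10,
)
```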
Also a minor style thing. I would suggest using const instead of let whenever possible (on all variables that are not later assigned to, even if their content is modified). Even for-each loop variables can be const because on each iteration it is a new variable.
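A tiny illustration of both points, with made-up data:

```javascript
const counts = {}                       // const binding, but the content is modified below
for (const word of ['apple', 'pear', 'apple']) {
  // `word` is a fresh const binding on every iteration
  counts[word] = (counts[word] ?? 0) + 1
}
// counts is now { apple: 2, pear: 1 }

// A classic counting loop still needs let, because i is reassigned
let total = 0
for (let i = 0; i < 3; ++i) total += i  // total is 3
```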
"suggest strong password" works as expected,
Sure.
we should have a slider for controlling how many words the generated passphrase has.
of course that's good ux .
Regarding your note about word autocompletion: I suggest rewriting the function to include a spell check before returning the autocompleted predicted word. Then you would achieve both at the same time. But I don't think this leaves room for a complex word autocompletion function where the input is "peaangle", for example. At least I can't achieve that, maybe you can.
People lock their valuables in gym lockers, or their bikes, with 3-digit combination locks. A single word password from Covert wordlist has the same amount of security, but trying combinations is slower on Covert than on the gym lock (well, if Covert even allowed passwords that short). The UI should probably restrict that to more reasonable limits (like 3-6 words, which also ensures that the passwords are long enough to be accepted) but I loosened that restriction on the low level function.
Don't think too much about autocorrection or perhaps even completion now. If you port the Python function directly, it can skip previous correctly written words and complete the one under cursor, while ignoring anything right of cursor (and even cursor positioning doesn't really need to be implemented). Certainly makes sense to perform autocorrection when the tab key is pressed, too.
How can I install this and run tests? I hear yarn is better than npm, does it matter any?
EDIT: apparently this, once implemented in package.json:
yarn install
yarn test
Now that passwords are done, I am going to work on the header/auth part probably tomorrow.
This is also good practice for the upcoming C library, because the helper functions that we are doing right now are going to be essentially all that the C version will have, and reimplementing them here allows rethinking the API a bit, since the Python version was originally written as a proof of concept application, not as a modular library, and due to that is still a bit more convoluted than I'd like.
Should probably also move the js files to some subdirectory/ies instead of repository root.
Great work so far, you have been very helpful! :)
We can go with yarn.
Tests for now are just simple to check if everything is going smoothly. Will use proper testing with Jest/Cypress in react.
You can cd tests and run node, for example node passphrase.test.js. I guess this will show an error on the generate function because you used crypto.getRandomValues() in the recent update, which is made for browsers.
In browser the WebCrypto should be preferred as likely faster and safer than the compiled libsodium, for those few algorithms that it supports. Although WebCrypto only works in secure context (https) and thus might not be usable if we did a single file build that can run without web server from local disk (not sure if that is quite possible with workers and all, in any case).
Is there any benefit of making it also compatible with Node? Looks like some crypto stuff will need two parallel implementations then (at least if performance matters). I was thinking that Node, like all other languages, would eventually be covered by bindings to the C library of Covert.
Probably should have our own wrappers for common crypto stuff in any case, allowing platforms to use different implementations and to simplify the APIs in other code that may need to call those functions often. I prefer our own encrypt(data, nonce, key, aad) instead of sodium.crypto_aead_chacha20_blahblah_ietf_encrypt(too, many, parameters) (and WebCrypto is far worse than that).
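Such a wrapper could look roughly like this. It is a sketch, not the project's API: makeAead is a hypothetical name, and the sodium argument is assumed to be an initialised libsodium-wrappers instance (passed in so that a platform could supply a different implementation with the same two methods).

```javascript
// Thin wrappers around the ChaCha20-Poly1305 IETF AEAD of libsodium.js.
// The unused secret-nonce parameter of the libsodium API is always null.
const makeAead = sodium => ({
  // Returns ciphertext with the authentication tag appended
  encrypt: (data, nonce, key, aad = null) =>
    sodium.crypto_aead_chacha20poly1305_ietf_encrypt(data, aad, null, nonce, key),
  // Returns plaintext; libsodium throws if authentication fails
  decrypt: (ciphertext, nonce, key, aad = null) =>
    sodium.crypto_aead_chacha20poly1305_ietf_decrypt(null, ciphertext, aad, nonce, key),
})

// Usage, assuming libsodium-wrappers is installed and ready:
//   const { encrypt, decrypt } = makeAead(sodium)
//   const ct = encrypt(data, nonce, key)   // nonce: 12 bytes, key: 32 bytes
//   const pt = decrypt(ct, nonce, key)
```

Passing the library in as a parameter also makes the wrappers easy to test with a stub.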
Speaking of testing, perhaps instead of jest there is some way to run automatic tests in a real browser (like selenium)? If that is not too difficult to setup and still provides useful debug info on problems, it would be one option to look at.
Probably should have our own wrappers for common crypto stuff in any case,
Agree
(like selenium)?
Selenium runs tests from outside the browser. I will write tests with Cypress, which can also mimic user behavior, once the UI is completed in React in the future.
The details of randomisation are quite subtle. Might need to consider backporting the JS implementation to Python in case it offers better numbers in extreme cases. EDIT: did so, now Python version is better too (at least theoretically!)
Added Monocypher but not yet using it for anything. It also has chacha20-poly1305 and other functions currently used from sodium, that need to be benchmarked as they could be a little faster (but no SIMD, so I am not expecting miracles). But most notably it has some functions that no other crypto libraries provide, for Elligator hashing and dirty key generation.
@sh-dv Would you like to get back to hacking this? I'll be busy preparing the 0.6.0 release of the main app in the coming days, but can then devote some time to this again.
Sure, I will try to find some time to do some tasks from the issue.
You've been on fire! I am trying to get pubkey parts going today but just found one unfortunate setback. The Bech32 implementation you found doesn't work with Age keys. I got the same problem with some Python modules but eventually found one that worked correctly (the other code seemed too difficult to fix, bit twiddling overflows or something).
Options:
My code quality could be improved and it needs more tests and proper error handling, but got many checkboxes done. Bech32 I won't look at but there are tests with Age keys in case you can find another implementation that works. The next steps for me: finish the few remaining pieces for a fully functioning pubkey auth, then blockstream stuff.
Passphrase and pubkey auth both working and tested. Multiple recipients not properly tested but in principle should be working.
@sh-dv The blockstream code should probably become next but I have no idea of how the streaming would work. Perhaps you can help with that part? For files less than gigabytes of size of course we can load the entire ciphertext in one Uint8Array and then extract BLOBs out of it for all the files attached but ideally this should all use streaming. It won't be one contiguous stream as individual blocks of 1-16 MB will need to be decrypted in sequence, and some headers removed to obtain the plaintext data.
A related UI concern (not right now but let's say later this Spring) is to have attached image/video/audio files displayed inline with the message, instead of having to "download" them first. Videos probably need some player magic as the HTML5 video element itself seems quite restricted in format support. I'm sure there are existing implementations (NodeJS packages) that can be used.
In any case we should get this running in browser quite soon, as the limits of what can be just written and tested conveniently on Node alone are already closing in. Not entirely sure whether React is the way to go but if you think it is the most viable option then we can proceed with that. I never quite liked the state management or that the entire DOM has to be managed by it and doing that requires quite a bit of trickery, although I do like the JSX syntax.
This is the to-do list for the first JS prototype of Covert. This prototype should demonstrate fast and efficient encryption of files/messages using chunked encryption (with chacha20poly1305 and ARGON2 browser SIMD) and service worker stream based file download (FetchEvent with ReadableStream based Response to download). And public key encrypt & sign with SSH, Age and Minisign keys.
This should be interesting. If something is missing feel free to edit. Or if it looks messy/unorganized lol.