MetaMask / metamask-extension

:globe_with_meridians: :electric_plug: The MetaMask browser extension enables browsing Ethereum blockchain enabled websites
https://metamask.io
Other
11.64k stars 4.77k forks source link

Migrate to one bip39 package. We use multiple right now, ethers and bip39, both of which have their own wordlists #14929

Open hilvmason opened 2 years ago

rekmarks commented 2 years ago

The Snaps team recommends using @scure/bip39, as that's what we use in @metamask/key-tree.

danjm commented 1 year ago
danjm commented 1 year ago

@rekmarks MetaMask currently uses https://github.com/MetaMask/bip39, a fork of https://github.com/bitcoinjs/bip39. I believe this fork was needed to support one of the security fixes to https://medium.com/metamask/security-notice-extension-disk-encryption-issue-d437d4250863

Should we all use that one?

cc @Gudahtt

danjm commented 1 year ago

2 options:

  1. research alternatives to bitcoinjs/bip39 that can meet the needs of our fork but are better maintained. Whatever alternative we use, we might need to get them to add a new method that accepts the mnemonic as raw bytes (e.g. an array buffer or ethers.js' "BytesLike") and returns raw bytes. This would meet the needs met by our @metamask/bip39 package
  2. abandon the effort to keep the seed phrase as a string out of memory
paulmillr commented 1 year ago

@scure/bip39 used in snaps is smaller, supports tree shaking, and works on byte arrays.

It has also been audited and is used in EF's repo ethereum-cryptography.

Gudahtt commented 1 year ago

It looks like that library still accepts the mnemonic as a string, and it still returns the mnemonic as a string (e.g. here and here). Using this library would mean abandoning our efforts to keep the "mnemonic as a string" out of memory. For context, we're avoiding using the mnemonic as a string in memory as a way to prevent low-effort local attack vectors that involve dumping browser memory and searching for a mnemonic-like string.

paulmillr commented 1 year ago

low-effort local attack vectors that involve dumping browser memory

low-effort

dumping browser memory

Seems like a joke to me.

paulmillr commented 1 year ago

As a side note: making @scure/bip39 not use mnemonics takes 2 lines of code. But i'm sure you're very wrong here if you are doing this and I can describe why.

rekmarks commented 1 year ago

@paulmillr all of the concerns Mark are raising are due to actual vulnerabilities that were disclosed to us and patched. Mark is not attacking your work or the package, everything he's saying is purely out of concern for the safety of our users. The benefits of your package are clear, and I'm sure that we can make any modifications necessary. Let's keep a respectful tone here.

paulmillr commented 1 year ago

Okay, that makes sense. The reason I had this tone is that "dumping browser memory" meant extremely complex attacks which cannot be easily tested against, also garbage collector provides no guarantees as for when any secrets are removed from the memory — so protecting against something like this is premature optimization.

This one however is more like "browser session restore attacks" and seems pretty common. Sorry for that!

adonesky1 commented 1 year ago

making @scure/bip39 not use mnemonics takes 2 lines of code

@paulmillr could you elaborate a bit what you mean by this?

I'm attempting to patch a fork of @scure/bip39 to meet our requirements that the mnemonic never be in memory as a plain string. Doesn't seem like we can use the Coder interface for mnemonicToEntropy() or entropyToMnemonic(), since it returns the array of strings for encode() and only expects an array of strings for decode()...

paulmillr commented 1 year ago

@adonesky1 what is your flow? What happens when a user enters phrase into the input? In what format you store it?

adonesky1 commented 1 year ago

So currently our flow is, from the user inputing the seedphrase:

paulmillr commented 1 year ago

It should work like this:

// words is an array of words here
export function mnemonicToEntropy(mnemonic: string, wordlist: string[]): Uint8Array {
  const { words } = normalize(mnemonic);
  const entropy = getCoder(wordlist).decode(words);
  assertEntropy(entropy);
  return entropy;
}

// you can remove alphabet and expect array of numbers/indexes which represent wordlist[index]
function getCoder(wordlist: string[]) {
  if (!Array.isArray(wordlist) || wordlist.length !== 2 ** 11 || typeof wordlist[0] !== 'string')
    throw new Error('Worlist: expected array of 2048 strings');
  wordlist.forEach((i) => {
    if (typeof i !== 'string') throw new Error(`Wordlist: non-string element: ${i}`);
  });
  return baseUtils.chain(
    baseUtils.checksum(1, calcChecksum),
    baseUtils.radix2(11, true),
    baseUtils.alphabet(wordlist)
  );
}

// you can receive array of words here
function normalize(str: string) {
  const norm = nfkd(str);
  const words = norm.split(' ');
  if (![12, 15, 18, 21, 24].includes(words.length)) throw new Error('Invalid mnemonic');
  return { nfkd: norm, words };
}
adonesky1 commented 1 year ago

@paulmillr I think we want to be able to pass the mnemonic to Coder.decode() in mnemonicToEntropy in buffer form too. And then keep the mnemonic in buffer form during the implementation of entropyToMnemonic

paulmillr commented 1 year ago

Everything is doable. Just requires more work. The code i've mentioned is the simplest one that can be done with just a few changes — where you will pass wordlist indexes, which is just 12/24 uint16 words. See:

const phrase = 'length divorce zoo anger act cave alter art device emerge caution vicious'
const indexes = phrase.split(' ').map(word => wordlistEn.indexOf(word));
// [ 1024, 512, 2047, 69, 19, 294, 59, 102, 485, 581, 293, 1948 ];
// Can be trivially converted to uint8array and back
const u8a = new Uint8Array(new Uint16Array(indexes).buffer);
// Uint8Array(24) [
//     0, 4,  0, 2, 255, 7,  69, 0,
//    19, 0, 38, 1,  59, 0, 102, 0,
//   229, 1, 69, 2,  37, 1, 156, 7
// ]

// Other stuff
// HEX for json
const hex = bytesToHex(u8a);
// '00040002ff074500130026013b006600e501450225019c07'
const base_64 = base64.encode(hex);
// 'AAQAAv8HRQATACYBOwBmAOUBRQIlAZwH'
// AES encryption, will add 28 bytes to the output.
const encrypted = await aes.encrypt(some_static_metamask_storage_key, u8a);

// Convert back
const recoveredIndexes = Array.from(new Uint16Array(u8a.buffer));
const recoveredPhrase = recoveredIndexes.map(i => wordlistEn[i]).join(' ');

It would take much less space to store it than the current approach. And it can also be easily changed into mnemonic phrase.

Mnemonics could be stored as json/integer array, a hex string, or base64 string. You can also encrypt it.

adonesky1 commented 1 year ago

@paulmillr would you mind taking a quick look at this first-pass PR I've made to our fork of @scure/bip39 to allow us to pass Buffers and Uint8Arrays to the BIP39 functions we need. I've tested it and it seems to be working for our needs, but it would be awesome if you could gut check it for me.

cc @kumavis @rekmarks

adonesky1 commented 1 year ago

Final pieces here: