ebellocchia / bip_utils

Generation of mnemonics, seeds, private/public keys and addresses for different types of cryptocurrencies
MIT License

Undesired access to file system by bip_utils #141

Open Anynomouss opened 1 month ago

Anynomouss commented 1 month ago

I have a very strange issue with bip_utils that I do not remember having in the past. When I run bip_utils on Windows Subsystem for Linux (Ubuntu 20.04 and Ubuntu 22.04, both system-wide and in clean virtual Python environments), it triggers "Unlock drive" pop-ups from BitLocker, meaning that bip_utils or some component thereof is trying to access the file system. This is strange behavior, and because there is a slight chance it is an exploit, I am reporting it.

It is also very irritating, since I run Python scripts that use bip_utils in parallel, which triggers a huge number of pop-ups and notifications from BitLocker, so many that I run out of RAM. Below is some example code; I narrowed it down to the lines that trigger this behavior.


import binascii # for conversion between Hexa and bytes
from bip_utils import (P2PKHAddrEncoder, Bip32Slip10Secp256k1, Bip44, Bip49, Bip84, Bip86, Bip44Coins,Bip49Coins, Bip84Coins, Bip86Coins, Bip44Changes, Bip38Decrypter, Bip38Encrypter, CoinsConf,
                       ElectrumV1WordsNum, ElectrumV1MnemonicGenerator, ElectrumV1SeedGenerator, ElectrumV1, ## Electrum V1 dependencies only
                       ElectrumV2WordsNum, ElectrumV2MnemonicTypes, ElectrumV2MnemonicGenerator, ElectrumV2SeedGenerator, ElectrumV2Standard, ## Electrum V2 dependencies only
                       IPrivateKey, WifPubKeyModes, WifEncoder,WifDecoder,Bip32KeyData,Bip32KeyDeserializer)

from pybip39 import Mnemonic, Seed 
import csv
import os
import sys

mnemonics = sys.stdin.readlines() 

csvwriter = csv.writer(sys.stdout, delimiter=' ',lineterminator='\n') #os.linesep
mnemonic = Mnemonic() # This is slow, so do it only once
for words in mnemonics:
    words = words.strip()
    try:
        #seed_bytes = mnemo.to_seed(words)
        mnemonic.validate(words)
        seed = Seed(mnemonic.from_phrase(words), "")
        seed_bytes = bytes(seed)
    except:
        continue
    #csvwriter.writerow([words])

    ## Any of the lines below triggers these pop-ups, meaning there is an attempt to access the file system
    bip32_ctx_m = Bip32Slip10Secp256k1.FromSeedAndPath(seed_bytes, 'm') # Derive at master level
    bip49_mst_ctx = Bip49.FromSeed(seed_bytes, Bip49Coins.BITCOIN)
    bip86_mst_ctx = Bip86.FromSeed(seed_bytes, Bip86Coins.BITCOIN)

The input of the test script is many lines, each containing a single mnemonic with its words separated by spaces. Save the script above as test_parallel_error.py and run it in any Windows Subsystem for Linux command shell to reproduce this behavior:

printf "abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about" {1..800000} | parallel --pipe -j 8 --blocksize 10000 --spreadstdin python test_parallel_error.py

It should trigger these pop-ups as long as you have at least one connected drive that is locked and encrypted with BitLocker. The BitLocker pop-ups are only the symptom, however; the real question is why any part of bip_utils is trying to access the file system in the first place.

ebellocchia commented 1 month ago

Hi, the library accesses the file system only to read the mnemonic word lists (BIP39, Electrum, etc.). A mnemonic file is loaded only once, i.e. the first time it is needed (e.g. when creating a mnemonic class), and is then kept in memory. The mnemonic files are deployed together with the library. You can find the paths in the source code, for example:

class Bip39MnemonicConst:
    ...

    # Language files
    LANGUAGE_FILES: Dict[MnemonicLanguages, str] = {
        Bip39Languages.ENGLISH: "wordlist/english.txt",
        Bip39Languages.ITALIAN: "wordlist/italian.txt",
        Bip39Languages.FRENCH: "wordlist/french.txt",
        Bip39Languages.SPANISH: "wordlist/spanish.txt",
        Bip39Languages.PORTUGUESE: "wordlist/portuguese.txt",
        Bip39Languages.CZECH: "wordlist/czech.txt",
        Bip39Languages.CHINESE_SIMPLIFIED: "wordlist/chinese_simplified.txt",
        Bip39Languages.CHINESE_TRADITIONAL: "wordlist/chinese_traditional.txt",
        Bip39Languages.KOREAN: "wordlist/korean.txt",
    }

This is the only function that accesses files, as you can verify from the source code:

class MnemonicWordsListFileReader:
    @staticmethod
    def LoadFile(file_path: str,
                 words_num: int) -> MnemonicWordsList:
        # Read file
        with open(file_path, "r", encoding="utf-8") as fin:
            words_list = [word.strip()
                          for word in fin.readlines()
                          if word.strip() != "" and not word.startswith("#")]

        # Check words list count
        if len(words_list) != words_num:
            raise ValueError(f"Number of loaded words list ({len(words_list)}) is not valid")

        return MnemonicWordsList(words_list)

But it has been like this since the very beginning; this is nothing new.
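To illustrate the "load once, then keep in memory" behavior described above, here is a minimal sketch. It is not the library's actual code: the helper name `load_words_cached` and the temporary `words.txt` file are hypothetical, and `functools.lru_cache` is just one simple way to get this effect.

```python
import functools
import os
import tempfile
from typing import Tuple


@functools.lru_cache(maxsize=None)
def load_words_cached(file_path: str) -> Tuple[str, ...]:
    """Read a word list from disk; later calls with the same path reuse the cached result."""
    with open(file_path, "r", encoding="utf-8") as fin:
        # Same filtering as the snippet above: skip blank lines and '#' comments
        return tuple(
            line.strip()
            for line in fin
            if line.strip() != "" and not line.startswith("#")
        )


def demo() -> Tuple[int, int]:
    """Load a tiny hypothetical word list twice and return (cache hits, cache misses)."""
    load_words_cached.cache_clear()
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "words.txt")
        with open(path, "w", encoding="utf-8") as fout:
            fout.write("# comment\nabandon\nability\n\nable\n")
        first = load_words_cached(path)   # reads the file
        second = load_words_cached(path)  # served from memory, no file access
        assert first is second
    info = load_words_cached.cache_info()
    return info.hits, info.misses
```

With a scheme like this, the file is touched only on the first call for a given path, so code that never asks for a word list never opens one.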

ebellocchia commented 1 month ago

I added a print in the function that loads the mnemonic files and tried your code snippet. As I expected, there is no file access (since you are not using any class for generating mnemonics or seeds). It could also be that some dependency is accessing the file system for some reason; maybe you can try downgrading versions to check whether that is the problem.
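A more general version of the "add a print" check is to log every file the interpreter opens, whichever package does it. The sketch below assumes Python 3.8 or newer and uses the standard `sys.addaudithook` with the built-in `open` audit event; the names `opened`, `_log_open`, and `demo` are only illustrative. Note that an audit hook cannot be removed once installed, and it sees only opens that go through Python's `open()` / `os.open()`, not files opened directly by native code inside an extension module.

```python
import os
import sys
import tempfile
from typing import List, Tuple

# Collected (path, mode) pairs for every "open" audit event seen so far
opened: List[Tuple[str, str]] = []


def _log_open(event: str, args: tuple) -> None:
    """Audit hook: record the path and mode of each file-open attempt."""
    if event == "open":
        path, mode = args[0], args[1]
        opened.append((str(path), str(mode)))


# Install as early as possible, before importing the packages under suspicion
sys.addaudithook(_log_open)


def demo() -> List[str]:
    """Open a throwaway file and return the paths the hook recorded for that read."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "probe.txt")
        with open(path, "w", encoding="utf-8") as fout:
            fout.write("x")
        opened.clear()
        with open(path, "r", encoding="utf-8") as fin:
            fin.read()
        return [p for p, _ in opened]
```

For the reproduction script, the hook could be installed at the very top of the file, before the `bip_utils` and `pybip39` imports, with the recorded paths written to `sys.stderr` at the end. If nothing outside the package directories shows up, the access may come from native code or from outside the Python process altogether.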

Anynomouss commented 1 month ago

I will do some more testing when I have time and will let you know if I can trace the root cause.

ebellocchia commented 6 days ago

Hi, did you find something?

Anynomouss commented 6 days ago

Sorry, I got distracted. I had planned to test it in some new, clean virtual Python environments to track down the cause. It is also worth mentioning that I use the code in combination with GNU Parallel from bash: https://www.gnu.org/software/parallel/ I will see if I can do some tests this week to determine whether the issue can be closed or requires further investigation.