beta6 / PassGAN

Generative Adversarial Network Password Generator. Updated, improved and working version
MIT License

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 2076: invalid continuation byte #3

Closed: traines3812 closed this issue 1 year ago

traines3812 commented 1 year ago

Hello, I am getting this error both outside and inside the virtual environment. I have also tried different Python versions. Any other advice on what I should try?

      File "/home//passganenv/PassGAN/utils.py", line 102, in load_dataset
        lines=list(map(lambda x: x.strip("\n"), open(path, 'r').readlines()))
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "", line 322, in decode
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 2076: invalid continuation byte
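To see which entries trigger the error, the wordlist can be read in binary mode and each line decoded separately. The following is a minimal diagnostic sketch, not part of PassGAN; the function name `find_invalid_utf8_lines` is hypothetical.

```python
def find_invalid_utf8_lines(path):
    """Return (line_number, offending_byte_as_hex) for every line that is not valid UTF-8."""
    bad = []
    # Binary mode: nothing is decoded while reading, so reading itself cannot fail.
    with open(path, "rb") as fh:
        for line_no, raw in enumerate(fh, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                # exc.start is the index of the first byte that could not be decoded.
                bad.append((line_no, hex(raw[exc.start])))
    return bad
```

Calling `find_invalid_utf8_lines("data/train.txt")` lists the line numbers to inspect or remove before training.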

beta6 commented 1 year ago

Follow the instructions from the previous posts. I wrote two articles with instructions at https://www.tuxrincon.com

traines3812 commented 1 year ago

I did the following, using Python 3.9.2:

    apt install python3-virtualenv
    virtualenv passganenv
    cd passganenv
    source bin/activate
    git clone https://github.com/beta6/PassGAN
    pip install --upgrade pip
    pip install -r PassGAN/requirements.txt
    cd PassGAN

beta6 commented 1 year ago

I understand that the standard instructions didn't work. In that case, try cleaning the dictionary a bit with sed or a similar tool, leaving only characters from the standard ASCII charset. It probably fails because the file contains bytes that are not valid UTF-8. The cleaning command should be something like:

    sed -i '/[^\x00-\x7F]/d' your_file.txt

This is intended to delete every line that contains a non-ASCII character. Make a copy of the dictionary first, because sed -i edits the file in place. The program worked for me with UTF-8 dictionaries, but the step above rules out the encoding as the cause. Let me know whether that worked so I can help or fix it.
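The same ASCII-only filtering can be sketched in Python. This is an illustration only, not the repository's own tooling; the function name `keep_ascii_lines` is hypothetical. It works on raw bytes, so it never decodes the file, and it writes to a separate output file so the original dictionary stays untouched.

```python
def keep_ascii_lines(src, dst):
    """Copy src to dst, dropping every line that contains a byte above 0x7F.

    Returns a (kept, dropped) tuple of line counts.
    """
    kept = dropped = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for raw in fin:
            if raw.isascii():  # bytes.isascii() requires Python 3.7+
                fout.write(raw)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Dropping whole lines loses any password that contains accented characters, which is acceptable only if an ASCII-only charset is what you want to train on.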

beta6 commented 1 year ago

Can you paste the full command line that gives the error and the full trace, please?

traines3812 commented 1 year ago

python train.py --output-dir output --training-data data/train.txt

    Traceback (most recent call last):
      File "/home/#####/PassGAN/train.py", line 116, in
        lines, charmap, inv_charmap = utils.load_dataset(
                                      ^^^^^^^^^^^^^^^^^^^
      File "/home/#####/PassGAN/utils.py", line 102, in load_dataset
        lines=list(map(lambda x: x.strip("\n"), open(path, 'r').readlines()))
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "", line 322, in decode
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 0: invalid continuation byte
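The trace shows that `open(path, 'r')` is called without an `encoding` argument, so Python decodes the file with the default encoding (UTF-8 here, per the error message) and stops at the first invalid byte. As an alternative to cleaning the file, the read could name an encoding and an error policy explicitly. The sketch below is a hypothetical stand-alone loader, not the code in `utils.py`. Treating the wordlist as Latin-1 is an assumption: bytes 0xf1 and 0xe0 are "ñ" and "à" in Latin-1, but the real encoding of the file is not stated in the thread.

```python
def load_lines(path, encoding="utf-8", errors="replace"):
    """Read a wordlist into a list of strings without raising on undecodable bytes.

    errors="replace" turns each invalid byte into U+FFFD;
    errors="ignore" would silently drop it instead.
    """
    with open(path, "r", encoding=encoding, errors=errors) as fh:
        return [line.strip("\n") for line in fh]


# Assumption: the file is Latin-1, in which every byte value maps to a character.
# lines = load_lines("data/train.txt", encoding="latin-1")
```

Either choice changes the character set the model sees, so cleaning the dictionary beforehand remains the more predictable option.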

beta6 commented 1 year ago

Check that both the output directory and the data/train.txt file exist. Remember that you don't need to train on the rockyou, crackstation or darkc0de+openwall+xato-net-10M dictionaries: they are already trained and included in the git version, and they can be used freely. If you use another dictionary, try cleaning it from the command line with the sed command I mentioned before.

traines3812 commented 1 year ago

OK, I'll try with a custom dictionary and let you know here shortly.

Thank you

beta6 commented 1 year ago

I will upload to the repo a utility (data/clean.py) that I used to get rid of this issue before training. Sorry for the inconvenience.

beta6 commented 1 year ago

It's uploaded. After using clean.py, train.py should run without errors. Let me know if you find more issues. Thanks for your feedback.

beta6 commented 1 year ago

Tested and working with: NVIDIA Driver Version 535.54.03, CUDA Driver Version 12.2, Debian 11 / Ubuntu.

traines3812 commented 1 year ago

Training is working. Thank you for your help, and thanks for clean.py. Sorry if I added any extra work.