dimitrismistriotis / alt-profanity-check

A fast, robust library to check for offensive language in strings, a drop-in replacement for "profanity-check".
https://pypi.org/project/alt-profanity-check/
MIT License

Update library to 1dot3dot1 #27

menkotoglou closed this 1 year ago

menkotoglou commented 1 year ago

Ran the training script and received the following:

Traceback (most recent call last):
  File ".../profanity_check/data/train_model.py", line 40, in <module>
    hash_sha256 = sha256sum(data_file)
  File ".../profanity_check/data/train_model.py", line 22, in sha256sum
    return hashlib.file_digest(f, "sha512").hexdigest()
AttributeError: module 'hashlib' has no attribute 'file_digest'

Found an alternative implementation online for the sha256sum function, which got me to this:

Everything is Ok     

Size:       66714639
Compressed: 17720371
ff90058eaf3ae484e4e9ec7b979bbc4d271dac15098399cca50de27b29d6f5a8ddad4cfdcb021185a737e673126432ae8f3c181050e496daac42c7c904a3ef7e

SHA256 hash of clean_data.csv: 31c287aa47b4b62615f49d6ec47a1d212d4da5ec4b4bb0c6e92593c850288bae
Stored hash to check against: ff90058eaf3ae484e4e9ec7b979bbc4d271dac15098399cca50de27b29d6f5a8ddad4cfdcb021185a737e673126432ae8f3c181050e496daac42c7c904a3ef7e

Traceback (most recent call last):
  File "/home/menelaos/Desktop/personal/alt-profanity-check/profanity_check/data/train_model.py", line 50, in <module>
    assert hash_sha256 == stored_hash, (
AssertionError: Hash of clean_data.csv does not match stored hash. Please download the data again.

There's something wrong on our side and we need to identify what.
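
Two details in the output above are worth noting. hashlib.file_digest() was only added in Python 3.11, which explains the AttributeError on older interpreters. Also, the helper is named sha256sum but asks hashlib for "sha512": the stored hash is 128 hex characters (SHA-512), while the replacement found online produced a 64-character SHA-256 digest, so the comparison can never succeed. A minimal sketch of a version-tolerant helper (renamed sha512sum here to match what it actually computes; the chunk size is an arbitrary choice):

import hashlib
import sys


def sha512sum(path: str) -> str:
    """Hex SHA-512 digest of a file, on Python 3.11+ and older versions alike."""
    with open(path, "rb") as f:
        if sys.version_info >= (3, 11):
            # hashlib.file_digest() exists only on Python 3.11+.
            return hashlib.file_digest(f, "sha512").hexdigest()
        # Fallback for older interpreters: hash in 1 MiB chunks
        # to keep memory use bounded on large files.
        digest = hashlib.sha512()
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
        return digest.hexdigest()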

dimitrismistriotis commented 1 year ago

Reran it and got the following :ok: result:

(.venv) 🚀 ~/p/a/p/data on update-library-to-1dot3dot1 ◦ (.venv) python train_model.py 

Training model

Could not find clean_data.csv, will try to extract.

Decompress clean_data.csv

Listing compressed file contents:

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs AMD Ryzen 7 1700 Eight-Core Processor           (800F11),ASM,AES-NI)

Scanning the drive for archives:
1 file, 17720371 bytes (17 MiB)

Listing archive: clean_data.csv.7z

--
Path = clean_data.csv.7z
Type = 7z
Physical Size = 17720371
Headers Size = 138
Method = LZMA2:24
Solid = -
Blocks = 1

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2023-06-23 18:46:54 ....A     66714639     17720233  clean_data.csv
------------------- ----- ------------ ------------  ------------------------
2023-06-23 18:46:54           66714639     17720233  1 files

Extract clean_data.csv from clean_data.csv.7z:

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs AMD Ryzen 7 1700 Eight-Core Processor           (800F11),ASM,AES-NI)

Scanning the drive for archives:
1 file, 17720371 bytes (17 MiB)

Extracting archive: clean_data.csv.7z
--
Path = clean_data.csv.7z
Type = 7z
Physical Size = 17720371
Headers Size = 138
Method = LZMA2:24
Solid = -
Blocks = 1

Everything is Ok     

Size:       66714639
Compressed: 17720371
ff90058eaf3ae484e4e9ec7b979bbc4d271dac15098399cca50de27b29d6f5a8ddad4cfdcb021185a737e673126432ae8f3c181050e496daac42c7c904a3ef7e

SHA256 hash of clean_data.csv: ff90058eaf3ae484e4e9ec7b979bbc4d271dac15098399cca50de27b29d6f5a8ddad4cfdcb021185a737e673126432ae8f3c181050e496daac42c7c904a3ef7e
Stored hash to check against: ff90058eaf3ae484e4e9ec7b979bbc4d271dac15098399cca50de27b29d6f5a8ddad4cfdcb021185a737e673126432ae8f3c181050e496daac42c7c904a3ef7e

No idea why it failed in your setup.

dimitrismistriotis commented 1 year ago

@menkotoglou For 3.8: ERROR: No matching distribution found for numpy==1.26.0. Is it possible to downgrade numpy for Python 3.8, or is 3.8 no longer supported by scikit-learn?

dimitrismistriotis commented 1 year ago

From https://pypi.org/project/scikit-learn/

scikit-learn requires: Python (>= 3.8)

Maybe managing the numpy version per Python version is a good solution.
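
numpy 1.26.0 requires Python >= 3.9, and the 1.24 series is the last one that still supports 3.8, so one option is to pin numpy per interpreter with PEP 508 environment markers. A sketch of what that could look like in setup.py; the exact pins below are illustrative, not the project's actual constraints:

# setup.py (sketch): pin numpy per Python version with environment markers.
from setuptools import setup

setup(
    name="alt-profanity-check",
    install_requires=[
        # numpy 1.25+ dropped Python 3.8, so keep 3.8 on the 1.24 series.
        'numpy==1.24.4; python_version < "3.9"',
        'numpy>=1.26.0; python_version >= "3.9"',
        "scikit-learn",
    ],
)

pip evaluates the markers at install time, so a single release can serve both Python 3.8 and newer interpreters.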