Sometimes, the scan fails due to a tokeniser error raised by the PasswordModel
For example (scanning repo https://github.com/wuest-amiconsult/BTP-Day2-Bookshop-Exercise)
Exception in thread credentialdigger@https://github.com/wuest-amiconsult/BTP-Day2-Bookshop-Exercise:
Traceback (most recent call last):
File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/client.py", line 793, in scan
return self._scan(
File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/client.py", line 1142, in _scan
self._analyze_discoveries(mm, password_discoveries, debug)
File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/client.py", line 1225, in _analyze_discoveries
model_manager.launch_model_batch(discoveries)
File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/models/model_manager.py", line 66, in launch_model_batch
return self.model.analyze_batch(discoveries)
File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/models/password_model.py", line 50, in analyze_batch
data = self._pre_process([d['snippet'] for d in new_discoveries])
File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/models/password_model.py", line 105, in _pre_process
encodings = self.tokenizer(snippet,
File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2404, in __call__
return self.batch_encode_plus(
File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2589, in batch_encode_plus
return self._batch_encode_plus(
File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 720, in _batch_encode_plus
batch_outputs = self._batch_prepare_for_model(
File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 792, in _batch_prepare_for_model
batch_outputs = self.pad(
File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2714, in pad
raise ValueError(
ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided []
Sometimes, the scan fails due to a tokeniser error raised by the PasswordModel
For example (scanning repo
https://github.com/wuest-amiconsult/BTP-Day2-Bookshop-Exercise
)