Closed HardChalice closed 10 months ago
Moving past that above, I'm now running into a new issue where submitted jobs fail. The original error is the following:
crackq | ERROR app.py:1891 log_exception 2022-03-17 17:17:43,195 Exception on /api/queuing/all [GET]
crackq | Traceback (most recent call last):
...
crackq | File "/usr/local/lib/python3.8/dist-packages/rq/job.py", line 517, in restore
crackq | self.meta = self.serializer.loads(obj.get('meta')) if obj.get('meta') else {}
crackq | _pickle.UnpicklingError: invalid load key, '{'.
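That unpickling error is what pickle raises when handed JSON bytes, which suggests the job's meta field was written by one serializer and read back by another (e.g. after an rq version change). A minimal sketch of the mismatch, purely illustrative:
import json
import pickle

# Store the meta field as JSON (what one rq/serializer combination writes)...
meta = json.dumps({'key': 'value'}).encode()

# ...then read it back with pickle (what the other side expects):
pickle.loads(meta)  # _pickle.UnpicklingError: invalid load key, '{'.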
Updated requirements.txt:
configparser==5.0.1
Flask==1.1.4
redis==3.5.3
rq==1.10.1
marshmallow==3.9.1
pytest==6.1.2
pytest-cov==2.10.1
flake8==3.8.4
python-ldap==3.3.1
Flask-Sessionstore==0.4.5
SQLAlchemy==1.3.24
Flask-SQLAlchemy==2.5.1
SQLAlchemy-Utils==0.38.2
flask-talisman==0.7.0
pysaml2==6.5.1
flask-Login==0.5.0
Flask-Cors==3.0.9
Flask-SeaSurf==0.2.2
Flask-Migrate==3.0.1
bcrypt==3.2.0
Flask-Bcrypt==0.7.1
pathvalidate==2.3.1
markupsafe==2.0.1
Updating rq to version 1.10.1 didn't throw the same error as above; however, I am now receiving a new error:
crackq | DEBUG crackqueue.py:170 error_parser 2022-03-21 15:43:43,875 Parsing error message: Traceback (most recent call last):
crackq | File "/usr/local/lib/python3.8/dist-packages/rq/worker.py", line 1061, in perform_job
crackq | rv = job.perform()
crackq | File "/usr/local/lib/python3.8/dist-packages/rq/job.py", line 821, in perform
crackq | self._result = self._execute()
crackq | File "/usr/local/lib/python3.8/dist-packages/rq/job.py", line 844, in _execute
crackq | result = self.func(*self.args, **self.kwargs)
crackq | File "/opt/crackq/build/crackq/run_hashcat.py", line 688, in hc_worker
crackq | hcat = runner(hash_file=hash_file, mask=mask,
crackq | File "/opt/crackq/build/crackq/run_hashcat.py", line 192, in runner
crackq | raise ValueError('Aborted, speed check failed: {}'.format(err_msg))
crackq | ValueError: Aborted, speed check failed: Work-horse was terminated unexpectedly (waitpid returned 139)
Thanks for the update. Have a look in /utils; there are a couple of scripts in there that will help you get more info on the error message for the speed_check queue, as it's a hidden job queue:
python3 rq_queryqueue.py speed_check
^ this will get you the list of jobs; copy the ID of the job in question from there
python3 rq_queryjob.py speed_check <job_id>
^ this will get a more detailed error message for that job
You may need to modify the scripts as it looks like the name resolution has changed in docker networking recently:
redis_con = Redis('redis', 6379)
to
redis_con = Redis('127.0.0.1', 6379)
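For reference, a minimal sketch of what those scripts do, assuming rq 1.x with Redis reachable on 127.0.0.1:6379 (the queue name speed_check comes from this thread; everything else is illustrative):
from redis import Redis
from rq import Queue
from rq.job import Job
from rq.registry import FailedJobRegistry

# Connect to the Redis instance backing the rq workers
redis_con = Redis('127.0.0.1', 6379)
queue = Queue('speed_check', connection=redis_con)

# List job IDs waiting in the hidden speed_check queue and in its failed registry
print('queued:', queue.get_job_ids())
failed = FailedJobRegistry(queue=queue)
print('failed:', failed.get_job_ids())

# Fetch one failed job and dump the error details rq stored for it
# (assumes at least one failed job is present)
job = Job.fetch(failed.get_job_ids()[0], connection=redis_con)
print('Description:', job.description)
print('Status:', job.get_status())
print('Execution info:', job.exc_info)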
Running rq_queryjob.py outputs the following for a failed speed check:
Description: crackq.run_hashcat.show_speed(attack_mode=3, brain=True, hash_file='/var/crackq/logs/1c5ce07dd02e41b89cf52e2b025f4593.hashes', hash_mode=1000, mask='?a?a?a?a?a?a', name='Test', pot_path='/var/crackq/logs/crackq.pot', session='1c5ce07dd02e41b89cf52e2b025f4593', speed_session='1c5ce07dd02e41b89cf52e2b025f4593_speed', username=True, wordlist2=None, wordlist=None)
Result: None
Status: failed
Execution info: Work-horse was terminated unexpectedly (waitpid returned 139)
Meta {}
OK. If you tick 'disable brain', does it run the job or give more detail in the error?
Disabling brain runs the job, from what I've noticed. I still don't have a decent pool of test hash files to use since it finished that test_customer_domain.hashes.
Hello, I cannot get past the issue where Pypal failed to import. How did you solve it? Please describe the solution.
Try disabling the brain and it might show a more detailed error message.
I'm getting the same invalid load key, '{' exception. It seems like the job submission is confused. It's also odd that even when I disable brain in the job, I see brain=True.
DEBUG cq_api.py:206 get_jobdetails 2023-07-22 13:17:55,939 Parsing job details:
crackq.run_hashcat.hc_worker(attack_mode=0, brain=True, hash_file='/var/crackq/logs/51f4a7faca84400296c3c0beae784d62.hashes', hash_mode=1000, increment=False, increment_max=None, increment_min=None, mask=None, mask_file=False, name='test with disable brain', outfile='/var/crackq/logs/51f4a7faca84400296c3c0beae784d62.cracked', pot_path='/var/crackq/logs/crackq.pot', potcheck=False, restore=0, rules=['/var/crackq/files/rules/OneRuleToRuleThemAll.rule'], session='51f4a7faca84400296c3c0beae784d62', username=False, wordlist2=None, wordlist='/var/crackq/files/wordlists/rockyou.txt')
DEBUG crackqueue.py:170 error_parser 2023-07-22 13:17:55,940 Parsing error message: Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/rq/worker.py", line 1013, in perform_job
rv = job.perform()
File "/usr/local/lib/python3.6/site-packages/rq/job.py", line 709, in perform
self._result = self._execute()
File "/usr/local/lib/python3.6/site-packages/rq/job.py", line 732, in _execute
result = self.func(*self.args, **self.kwargs)
File "/opt/crackq/build/crackq/run_hashcat.py", line 694, in hc_worker
benchmark=benchmark, benchmark_all=benchmark_all)
File "/opt/crackq/build/crackq/run_hashcat.py", line 192, in runner
raise ValueError('Aborted, speed check failed: {}'.format(err_msg))
ValueError: Aborted, speed check failed: invalid load key, '{'.
DEBUG crackqueue.py:176 error_parser 2023-07-22 13:17:55,940 Parsed error: invalid load key, '{'.
Regarding python3 rq_queryqueue.py speed_check: like @adnahmed, I had to install an older version of rq for the script to run:
pip3 install "rq==1.13.0"
The error was:
Traceback (most recent call last):
File "rq_queryqueue.py", line 5, in <module>
from rq import use_connection, Queue
ImportError: cannot import name 'use_connection'
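As an aside, use_connection was deprecated and eventually removed from rq, so instead of pinning an older release, the script can be updated to pass the connection explicitly. A minimal sketch (assuming the script only needs to list the queue's job IDs):
from redis import Redis
from rq import Queue

# use_connection() no longer exists in newer rq; pass the connection directly
redis_con = Redis('127.0.0.1', 6379)
queue = Queue('speed_check', connection=redis_con)
print(queue.get_job_ids())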
Once I did get the script to run, I got the same load key error that I see in the UI.
python3 rq_queryqueue.py speed_check
Traceback (most recent call last):
File "rq_queryqueue.py", line 27, in <module>
cur_list = started.get_job_ids()
File "/usr/local/lib/python3.6/site-packages/rq/registry.py", line 143, in get_job_ids
self.cleanup()
File "/usr/local/lib/python3.6/site-packages/rq/registry.py", line 225, in cleanup
job = self.job_class.fetch(job_id, connection=self.connection, serializer=self.serializer)
File "/usr/local/lib/python3.6/site-packages/rq/job.py", line 521, in fetch
job.refresh()
File "/usr/local/lib/python3.6/site-packages/rq/job.py", line 899, in refresh
self.restore(data)
File "/usr/local/lib/python3.6/site-packages/rq/job.py", line 875, in restore
self.meta = self.serializer.loads(obj.get('meta')) if obj.get('meta') else {}
_pickle.UnpicklingError: invalid load key, '{'.
Don't bother debugging this; I've got updates to push imminently, with the Docker container and all Python libs updated. They should be available later today; I'm just cleaning it up.
Check out the dev branch, this should all be fixed there now.
This should be fixed in master, let me know if it's still not working.
I am running the master branch with Python 3.8 on Ubuntu. Jobs run fine for a while and then start failing; message below. A docker compose down + up gets jobs running again.
INFO conf.py:18 hc_conf 2023-08-08 20:42:12,445 Reading from config file /var/crackq/files/crackq.conf
INFO run_hashcat.py:116 runner 2023-08-08 20:42:12,463 Running hashcat
ERROR run_hashcat.py:188 runner 2023-08-08 20:42:12,491 Speed check failed: RuntimeError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/rq/worker.py", line 1418, in perform_job
rv = job.perform()
File "/usr/local/lib/python3.8/dist-packages/rq/job.py", line 1222, in perform
self._result = self._execute()
File "/usr/local/lib/python3.8/dist-packages/rq/job.py", line 1259, in _execute
result = self.func(*self.args, **self.kwargs)
File "/opt/crackq/build/crackq/run_hashcat.py", line 988, in show_speed
hcat = runner(hash_file=hash_file, mask=mask,
File "/opt/crackq/build/crackq/run_hashcat.py", line 159, in runner
hc.hashcat_session_execute()
SystemError: <method 'hashcat_session_execute' of 'pyhashcat.hashcat' objects> returned a result with an error set
Try adding this to the docker-compose file:
#runtime: nvidia
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
        - capabilities: [gpu]
Note 'runtime: nvidia' is removed. I'll do some further testing when I get the chance, probably on the weekend.
With this docker-compose:
crackq:
  build:
    context: ./build
    dockerfile: Dockerfile
  image: "nvidia-ubuntu"
  ports:
    - "127.0.0.1:8080:8080"
  depends_on:
    - redis
  networks:
    - crackq_net
  container_name: "crackq"
  hostname: "crackq"
  volumes:
    - /var/crackq/:/var/crackq
    - ./crackq:/opt/crackq/build/crackq/
  stdin_open: true
  # runtime: nvidia
  # Add "deploy" per https://github.com/f0cker/crackq/issues/33
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
          - capabilities: [gpu]
  user: crackq
  tty: true
  environment:
    PYTHONPATH: "/opt/crackq/build/"
    MAIL_USERNAME: ${MAIL_USERNAME}
    MAIL_PASSWORD: ${MAIL_PASSWORD}
I get this error when starting:
[+] Running 3/4
✔ Network crackq_crackq_net Created 0.1s
✔ Container redis Started 0.4s
⠿ Container crackq Starting 0.6s
✔ Container nginx Created 0.0s
Error response from daemon: could not select device driver "nvidia" with capabilities: [[]]
is this with the nvidia devel image (nvidia/cuda:12.2.0-devel-ubuntu20.04)?
This happened with both devel and runtime images
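As an aside, the capabilities: [[]] in that error message is consistent with the snippet above: in a Compose file, each entry under devices is a single mapping, so splitting driver and capabilities into two list items produces one device request with an empty capabilities list. A corrected sketch of the reservation (count: all is an assumption; adjust to your GPUs):
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]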
It took over a week of running jobs through the system, but the <method 'hashcat_session_execute' of 'pyhashcat.hashcat' objects> returned a result with an error set exceptions came back. As before, a down + up of the containers brought it back online.
Failed again today with the same <method 'hashcat_session_execute' of 'pyhashcat.hashcat' objects> returned a result with an error set error. This time, I ran a small test before I restarted the containers:
# works in host - GPUs recognized
nvidia-smi
# fails in docker
sudo docker exec -it crackq nvidia-smi
Failed to initialize NVML: Unknown Error
After restarting the containers the GPU is visible again.
sudo docker exec -it crackq nvidia-smi
Wed Aug 23 18:41:03 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 70C P0 32W / 70W | 2MiB / 15360MiB | 0% Default |
|
Obviously this isn't a CrackQ issue, but is there any way in the API we can monitor GPU "health"? That would be cleaner than waiting for jobs to fail.
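One low-tech option in the meantime, since this failure mode shows up as nvidia-smi failing inside the container: run a watchdog on the host. A hypothetical sketch (the container name crackq comes from this thread; the reaction to an unhealthy GPU is a placeholder):
import subprocess
import sys

def gpu_healthy(container='crackq'):
    # nvidia-smi prints 'Failed to initialize NVML' and exits non-zero
    # when the container loses sight of the GPU (as seen above)
    result = subprocess.run(['docker', 'exec', container, 'nvidia-smi'],
                            capture_output=True, text=True)
    return result.returncode == 0 and 'Failed to initialize NVML' not in result.stdout

if __name__ == '__main__':
    if not gpu_healthy():
        # Placeholder: alert, or restart the stack before jobs start failing
        sys.exit(1)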
Regarding the loss of GPU visibility in the container ('Failed to initialize NVML: Unknown Error'), I found these issues:
https://github.com/NVIDIA/nvidia-docker/issues/1730 https://github.com/NVIDIA/nvidia-docker/issues/1671
I will stop posting in this issue as it seems totally unrelated to CrackQ. Apologies for the distraction.
Check out the v0.1.2 branch; I believe this should be fixed now. I had no issues testing on some EC2 test boxes. I'll close this off when I merge into master if I don't hear anything, but feel free to reopen.
Closing as above
Prerequisites
Enable debugging: sudo docker exec -it crackq /bin/sed -i 's/INFO/DEBUG/g' /opt/crackq/build/crackq/log_config.ini
Unable to do so, as the crackq container fails to start.
Prior to reaching this point I also had to tweak the Nvidia + Ubuntu Dockerfile, as Python3.7 and Python3.7-Dev are not available on Ubuntu 20.04 and would throw an error when running ./install.sh /docker/nvidia/ubuntu. I changed these to Python3.8 and Python3.8-Dev, which fixed the issue. I needed to uncomment ENV DEBIAN_FRONTEND noninteractive, otherwise install.sh would hang on setting up a timezone for tzdata. I also needed to change FROM nvidia/cuda:runtime-ubuntu20.04 to specify a version number. Using the most recent, I went with FROM nvidia/cuda:11.6.0-runtime-ubuntu20.04.
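Condensed, the tweaks described above look something like this (a sketch against the Nvidia/Ubuntu Dockerfile; the surrounding build steps are omitted):
FROM nvidia/cuda:11.6.0-runtime-ubuntu20.04
# Avoid install.sh hanging on tzdata's interactive timezone prompt
ENV DEBIAN_FRONTEND noninteractive
# Python 3.7 is not available on Ubuntu 20.04; use 3.8 instead
RUN apt-get update && apt-get install -y python3.8 python3.8-dev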
Describe the bug
Running the sudo docker-compose -f docker-compose.nvidia.yml up --build command, CrackQ is unable to start. The error thrown, after a long traceback through Python imports, is the following:
ImportError: cannot import name 'soft_unicode' from 'markupsafe' (/usr/local/lib/python3.8/dist-packages/markupsafe/__init__.py)
See picture below for the entire traceback.
To Reproduce
Steps to reproduce the behavior:
1. Pull the Crackq Repo
2. Following the readme, install the latest Docker and Docker-compose
3. Download the latest Nvidia server drivers.
This may vary depending on GPUs. I'm using 7 ZOTAC 1080 Tis, and I installed the recommended drivers shown by running the following.
4. Install Nvidia Docker
This part took some trial and error, as the Crackq readme says the following:
However, per the Nvidia-Docker docs, they recommend installing nvidia-docker2, which resulted in some problems for me. Instead I followed the Crackq readme and did the following after installing nvidia-container-runtime per Nvidia-Container-Runtime.
5. Run Install.sh
6. Configuration
I ran through the configuration portion and did everything documented in the Crackq Configuration.
I skipped any type of authentication setup, modified the custom nginx config, and added my own certs to the proper directory.
Expected behavior
After all this I should be able to run the application with either:
sudo docker-compose -f docker-compose.nvidia.yml up --build
Or
sudo docker-compose -f docker-compose.nvidia.yml up -d
Debug output
This is the output I receive from the docker logs as the containers were starting:
Additional context
To add, I was able to temporarily work around this by modifying the Nvidia/Ubuntu Dockerfile again and including this command:
RUN pip3 install markupsafe==2.0.1
per this issue forum. (This works because markupsafe 2.1.0 removed the soft_unicode alias that older Jinja2/Flask releases still import; pinning 2.0.1 restores it.) However, this then led to an issue where Pypal also failed to import. Unfortunately I don't have the output from the docker logs, but they threw a similar traceback to the one above, stating that it was unable to locate pypal.
If I missed anything please let me know.