Closed bgruening closed 2 years ago
I don't see any sign of a loop. If this is py-spy top, that's just showing you cumulative time spent per line; you can't read this as a stack.
It's py-spy, and it's staying in these proteomics lines for minutes/hours.
I guess my first advice would be to set the datatype, then you don't need to run through the sniffer. It's not immediately clear why that particular sniffer would use that much CPU time. Can you dump the thread with py-spy dump --pid <pid>?
Stacktrace is below:
> py-spy dump -p 415441
Process 415441: python /opt/galaxy/server/tools/data_source/upload.py /opt/galaxy/server /data/dnb03/galaxy_db/job_working_directory/019/128/19128078/registry.xml /data/dnb03/galaxy_db/job_working_directory/019/128/19128078/upload_params.json 39297763:/data/dnb03/galaxy_db/job_working_directory/019/128/19128078/working/dataset_0d29d838-cf1d-492a-bd5c-e69dc2c82a82_files:/data/dnb03/galaxy_db/job_working_directory/019/128/19128078/outputs/galaxy_dataset_0d29d838-cf1d-492a-bd5c-e69dc2c82a82.dat
Python v3.6.8 (/opt/galaxy/venv/bin/python)
Thread 415441 (idle): "MainThread"
sniff (proteomics.py:369)
run_sniffers_raw (galaxy/datatypes/sniff.py:523)
guess_ext (galaxy/datatypes/sniff.py:463)
handle_uploaded_dataset_file_internal (galaxy/datatypes/sniff.py:761)
handle_upload (galaxy/datatypes/upload_util.py:54)
add_file (upload.py:143)
__main__ (upload.py:332)
<module> (upload.py:339)
Possibly related? https://github.com/galaxyproject/galaxy/issues/11335
No, that's different. The dump here is from the actual upload job, downstream of what is going on in https://github.com/galaxyproject/galaxy/issues/11335
We have more than 150 upload jobs stuck in this way!
root@vgcnbwc-worker-c120m215-1868:~$ /usr/local/bin/py-spy dump --pid 3832036
Process 3832036: python /opt/galaxy/server/tools/data_source/upload.py /opt/galaxy/server /data/jwd/main/034/326/34326382/registry.xml /data/jwd/main/034/326/34326382/upload_params.json 64101192:/data/jwd/main/034/326/34326382/working/dataset_b2441e82-cd1f-49a6-8824-63af197f6f80_files:/data/jwd/main/034/326/34326382/outputs/galaxy_dataset_b2441e82-cd1f-49a6-8824-63af197f6f80.dat
Python v3.6.8 (/opt/galaxy/venv/bin/python)
Thread 0x14DA9BD84B80 (active): "MainThread"
sniff (proteomics.py:369)
run_sniffers_raw (galaxy/datatypes/sniff.py:529)
guess_ext (galaxy/datatypes/sniff.py:469)
handle_uploaded_dataset_file_internal (galaxy/datatypes/sniff.py:767)
handle_upload (galaxy/datatypes/upload_util.py:54)
add_file (tools/data_source/upload.py:144)
__main__ (tools/data_source/upload.py:334)
<module> (tools/data_source/upload.py:341)
root@vgcnbwc-worker-c120m215-1868:~$ strace -p 3832036
...
read(3, "", 8192) = 0
read(3, "", 8192) = 0
read(3, "", 8192) = 0
read(3, "", 8192) = 0
read(3, "", 8192) = 0
read(3, "", 8192) = 0
read(3, "", 8192) = 0
read(3, "", 8192) = 0
read(3, "", 8192) = 0
read(3, "", 8192) = 0
read(3, "", 8192) = 0
read(3, "", 8192) = 0
read(3, "", 8192) = 0
...
/opt/galaxy/server/lib/galaxy/tools/data_fetch.py or /opt/galaxy/server/tools/data_source/upload.py: same behavior.
We have these errors in the Galaxy log:
journalctl -u galaxy-zergling@1 --since '3 hours ago' --no-pager | grep proteomics
Nov 18 07:17:09 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:09,906 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype class for 'galaxy.datatypes.infernal:Stockholm_1_0'
Nov 18 07:17:10 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:10,109 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.snpeff
Nov 18 07:17:10 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:10,110 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.snpeff
Nov 18 07:17:10 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:10,208 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.snpsift_dbnsfp
Nov 18 07:17:10 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:10,399 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.infernal
Nov 18 07:17:10 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:10,400 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype class for 'galaxy.datatypes.infernal:Infernal_CM_1_1'
Nov 18 07:17:18 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:18,571 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.infernal
Nov 18 07:17:18 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:18,572 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype class for 'galaxy.datatypes.infernal:Infernal_CM_1_1'
Nov 18 07:17:42 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:42,658 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.gafa_datatypes
and in the registry.xml in the working directory they are present, e.g.
<sniffer type="galaxy.datatypes.infernal:Infernal_CM_1_1"/>
@mvdbeek could this be an issue with importing datatypes?
Can you run py-spy dump with -l?
root@vgcnbwc-worker-c120m215-1868:~$ /usr/local/bin/py-spy dump -l --pid 3832036
Process 3832036: python /opt/galaxy/server/tools/data_source/upload.py /opt/galaxy/server /data/jwd/main/034/326/34326382/registry.xml /data/jwd/main/034/326/34326382/upload_params.json 64101192:/data/jwd/main/034/326/34326382/working/dataset_b2441e82-cd1f-49a6-8824-63af197f6f80_files:/data/jwd/main/034/326/34326382/outputs/galaxy_dataset_b2441e82-cd1f-49a6-8824-63af197f6f80.dat
Python v3.6.8 (/opt/galaxy/venv/bin/python)
Thread 0x14DA9BD84B80 (active): "MainThread"
sniff (proteomics.py:369)
Arguments:
self: <Ms2 at 0x14da734d4b00>
filename: "/data/jwd/nginx_upload/main/uploads/0000088938"
Locals:
contents: <_io.TextIOWrapper at 0x14da73488048>
header_lines: []
line: ""
run_sniffers_raw (galaxy/datatypes/sniff.py:529)
Arguments:
filename_or_file_prefix: <FilePrefix at 0x14da739372e8>
sniff_order: [<SDF at 0x14da737a6208>, <PDB at 0x14da7374b860>, <MOL2 at 0x14da7374b8d0>, <InChI at 0x14da7374b940>, ...]
is_binary: False
Locals:
fname: "/data/jwd/nginx_upload/main/uploads/0000088938"
file_prefix: <FilePrefix at 0x14da739372e8>
file_ext: None
datatype: <Ms2 at 0x14da734d4b00>
datatype_compressed: False
guess_ext (galaxy/datatypes/sniff.py:469)
Arguments:
fname: "/data/jwd/nginx_upload/main/uploads/0000088938"
sniff_order: [<SDF at 0x14da737a6208>, <PDB at 0x14da7374b860>, <MOL2 at 0x14da7374b8d0>, <InChI at 0x14da7374b940>, ...]
is_binary: False
Locals:
file_prefix: <FilePrefix at 0x14da739372e8>
handle_uploaded_dataset_file_internal (galaxy/datatypes/sniff.py:767)
Arguments:
filename: "/data/jwd/nginx_upload/main/uploads/0000088938"
datatypes_registry: <Registry at 0x14da9bcc1978>
ext: "auto"
tmp_prefix: "data_id_64101192_upload_"
tmp_dir: "/data/jwd/main/034/326/34326382/outputs"
in_place: True
check_content: True
is_binary: False
auto_decompress: True
uploaded_file_ext: "fastq"
convert_to_posix_lines: True
convert_spaces_to_tabs: False
Locals:
is_valid: True
converted_path: "/data/jwd/nginx_upload/main/uploads/0000088938"
compressed_type: None
guessed_ext: "auto"
handle_upload (galaxy/datatypes/upload_util.py:54)
Arguments:
registry: <Registry at 0x14da9bcc1978>
path: "/data/jwd/nginx_upload/main/uploads/0000088938"
requested_ext: "auto"
name: "fastq_runid_3ddc21791e264782a02e7d00de162c7802aaaa01_48_0.fastq"
tmp_prefix: "data_id_64101192_upload_"
tmp_dir: "/data/jwd/main/034/326/34326382/outputs"
check_content: True
link_data_only: False
in_place: True
auto_decompress: True
convert_to_posix_lines: True
convert_spaces_to_tabs: False
Locals:
stdout: None
converted_path: None
multi_file_zip: False
is_binary: False
add_file (tools/data_source/upload.py:144)
Arguments:
dataset: <Bunch at 0x14da7374b6a0>
registry: <Registry at 0x14da9bcc1978>
output_path: "/data/jwd/main/034/326/34326382/outputs/galaxy_dataset_b2441e82-cd1f-49a6-8824-63af197f6f80.dat"
Locals:
ext: None
compression_type: None
line_count: None
link_data_only_str: "copy_files"
link_data_only: False
run_as_real_user: False
purge_source: True
in_place: True
check_content: True
auto_decompress: True
__main__ (tools/data_source/upload.py:334)
Locals:
output_paths: {64101192: ("/data/jwd/main/034/326/34326382/outputs/galaxy_dataset_b2441e82-cd1f-49a6-8824-63af197f6f80.dat", ...)}
registry: <Registry at 0x14da9bcc1978>
datasets: [{"file_type": "auto", "ext": "txt", "name": "fastq_runid_3ddc21791e264782a02e7d00de162c7802aaaa01_48_0.fastq", ...}]
metadata: []
dataset: <Bunch at 0x14da7374b6a0>
output_path: "/data/jwd/main/034/326/34326382/outputs/galaxy_dataset_b2441e82-cd1f-49a6-8824-63af197f6f80.dat"
<module> (tools/data_source/upload.py:341)
~wonderful, that while True: is at fault https://github.com/mvdbeek/galaxy/blob/61be2b5daf67943d701811a0e52f6e2daa1a9a71/lib/galaxy/datatypes/proteomics.py#L932. I'm looking at a fix.~
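For readers following along: the locals in the dump (line: "", header_lines: []) together with the endless zero-byte read() calls in the strace output are consistent with a line-reading loop that never checks for end of file. The following is a minimal sketch of that failure mode and one way to avoid it. It is not the actual proteomics.py code; the function names, the `wanted` parameter, and the `max_spins` guard are hypothetical and exist only for illustration.

```python
import io


def collect_header_buggy(contents, wanted=3, max_spins=10_000):
    """Hypothetical sniffer loop: skips blank lines but never checks for EOF."""
    header_lines = []
    spins = 0
    while True:
        line = contents.readline()
        if not line.strip():
            # At EOF readline() keeps returning "", so this branch repeats
            # forever. The counter is a demo-only guard so the sketch
            # terminates; a real loop of this shape would simply hang.
            spins += 1
            if spins >= max_spins:
                raise RuntimeError("loop is spinning at end of file")
            continue
        header_lines.append(line.rstrip("\n"))
        if len(header_lines) == wanted:
            return header_lines


def collect_header_fixed(contents, wanted=3):
    """Same idea, but iterating over the file object stops at EOF."""
    header_lines = []
    for line in contents:
        if not line.strip():
            continue
        header_lines.append(line.rstrip("\n"))
        if len(header_lines) == wanted:
            break
    return header_lines


if __name__ == "__main__":
    # An empty input makes the difference visible.
    print(collect_header_fixed(io.StringIO("")))
    try:
        collect_header_buggy(io.StringIO(""), max_spins=100)
    except RuntimeError as exc:
        print("buggy variant:", exc)
```

Iterating over the file object (or breaking when readline() returns an empty string) is enough to make such a loop terminate on short or empty inputs.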
Thank you!
Can you also check if you have another proteomics.py datatype file anywhere? This could possibly come from tool shed repositories. Somehow the line numbers are quite off.
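One way to run that check is a small script that walks the candidate directories and lists every file with that name. This is only a sketch: /opt/galaxy/server comes from the dumps above, while the second root is an assumed location for tool shed installs and should be replaced with whatever your instance actually uses. The function name `find_modules` is hypothetical.

```python
from pathlib import Path


def find_modules(roots, filename="proteomics.py"):
    """Return sorted paths of every file called `filename` under the given roots."""
    hits = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            # Skip roots that do not exist on this machine.
            continue
        hits.extend(p for p in root.rglob(filename) if p.is_file())
    return sorted(hits)


if __name__ == "__main__":
    # The second path is an assumption; adjust it to your shed tools directory.
    for path in find_modules(["/opt/galaxy/server", "/opt/galaxy/shed_tools"]):
        print(path)
```

More than one hit would suggest that a second copy of the module may be shadowing the one shipped with Galaxy.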
Ah, this is coming from proteomics_datatypes. If you uninstall those from disk that should fix it. We also have some logic that should prevent loading TS datatypes if a Galaxy datatype exists, but obviously that's not working quite right.
@bgruening removed all TS datatypes
Suggestion: add a config option to disable TS datatypes completely.
We're going with https://github.com/galaxyproject/galaxy/pull/13250
Once in a few thousand uploads (fastq.gz, covid), the sniffing of datatypes seems to end up in an endless loop. I haven't found an explanation for why yet.
How reliable are those line numbers @mvdbeek?