galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Datatype classes from TS take precedence (was uploads hanging (one in a few 1000)) #11915

Closed: bgruening closed this issue 2 years ago

bgruening commented 3 years ago

Roughly once in every few thousand uploads (fastq.gz, COVID data), the datatype sniffing seems to end up in an endless loop. I haven't found an explanation for why yet.

[Screenshot of py-spy output]

How reliable are those line numbers @mvdbeek?

mvdbeek commented 3 years ago

I don't see any sign of a loop. If this is py-spy top, it's just showing you the cumulative time spent per line; you can't read it as a stack.

bgruening commented 3 years ago

It's py-spy, and it stays in these proteomics lines for minutes or hours.

mvdbeek commented 3 years ago

I guess my first advice would be to set the datatype explicitly; then you don't need to run through the sniffers at all. It's not immediately clear why that particular sniffer would use that much CPU time. Can you dump the thread with py-spy dump --pid <pid>?
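
Why setting the datatype helps can be sketched as follows, assuming (as the advice above implies) that the upload path only runs the sniffers when the requested extension is "auto", the value that also appears in the py-spy locals further down. `resolve_ext` and the `sniff` callback are hypothetical names for illustration, not Galaxy's API.

```python
from typing import Callable, Optional


def resolve_ext(
    requested_ext: str,
    sniff: Callable[[str], Optional[str]],
    path: str,
) -> Optional[str]:
    """Pick the datatype extension for an uploaded file (hypothetical sketch).

    Sniffers only run when the user left the type on "auto"; an explicitly
    requested extension is used as-is, so no sniffer ever opens the file.
    """
    if requested_ext != "auto":
        return requested_ext
    return sniff(path)


def noisy_sniffer(path: str) -> Optional[str]:
    """Stand-in for the whole sniffer chain; prints so you can see when it runs."""
    print(f"sniffing {path}")
    return "fastqsanger"


print(resolve_ext("fastqsanger.gz", noisy_sniffer, "upload.fastq.gz"))
# prints only: fastqsanger.gz
print(resolve_ext("auto", noisy_sniffer, "upload.fastq.gz"))
# prints: sniffing upload.fastq.gz, then fastqsanger
```

In this sketch an explicit type such as fastqsanger.gz means no sniffer, buggy or not, is ever called.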

bgruening commented 3 years ago

Stacktrace is below:

> py-spy dump -p 415441
Process 415441: python /opt/galaxy/server/tools/data_source/upload.py /opt/galaxy/server /data/dnb03/galaxy_db/job_working_directory/019/128/19128078/registry.xml /data/dnb03/galaxy_db/job_working_directory/019/128/19128078/upload_params.json 39297763:/data/dnb03/galaxy_db/job_working_directory/019/128/19128078/working/dataset_0d29d838-cf1d-492a-bd5c-e69dc2c82a82_files:/data/dnb03/galaxy_db/job_working_directory/019/128/19128078/outputs/galaxy_dataset_0d29d838-cf1d-492a-bd5c-e69dc2c82a82.dat 
Python v3.6.8 (/opt/galaxy/venv/bin/python)

Thread 415441 (idle): "MainThread"
    sniff (proteomics.py:369)
    run_sniffers_raw (galaxy/datatypes/sniff.py:523)
    guess_ext (galaxy/datatypes/sniff.py:463)
    handle_uploaded_dataset_file_internal (galaxy/datatypes/sniff.py:761)
    handle_upload (galaxy/datatypes/upload_util.py:54)
    add_file (upload.py:143)
    __main__ (upload.py:332)
    <module> (upload.py:339)

innovate-invent commented 3 years ago

Possibly related? https://github.com/galaxyproject/galaxy/issues/11335

mvdbeek commented 3 years ago

No, that's different. The dump here is from the actual upload job, downstream of what is going on in https://github.com/galaxyproject/galaxy/issues/11335

gmauro commented 2 years ago

We have more than 150 upload jobs stuck in this way!

root@vgcnbwc-worker-c120m215-1868:~$ /usr/local/bin/py-spy dump --pid 3832036
Process 3832036: python /opt/galaxy/server/tools/data_source/upload.py /opt/galaxy/server /data/jwd/main/034/326/34326382/registry.xml /data/jwd/main/034/326/34326382/upload_params.json 64101192:/data/jwd/main/034/326/34326382/working/dataset_b2441e82-cd1f-49a6-8824-63af197f6f80_files:/data/jwd/main/034/326/34326382/outputs/galaxy_dataset_b2441e82-cd1f-49a6-8824-63af197f6f80.dat
Python v3.6.8 (/opt/galaxy/venv/bin/python)

Thread 0x14DA9BD84B80 (active): "MainThread"
    sniff (proteomics.py:369)
    run_sniffers_raw (galaxy/datatypes/sniff.py:529)
    guess_ext (galaxy/datatypes/sniff.py:469)
    handle_uploaded_dataset_file_internal (galaxy/datatypes/sniff.py:767)
    handle_upload (galaxy/datatypes/upload_util.py:54)
    add_file (tools/data_source/upload.py:144)
    __main__ (tools/data_source/upload.py:334)
    <module> (tools/data_source/upload.py:341)
root@vgcnbwc-worker-c120m215-1868:~$ strace -p 3832036
...
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
read(3, "", 8192)                       = 0
...

Same behavior with both /opt/galaxy/server/lib/galaxy/tools/data_fetch.py and /opt/galaxy/server/tools/data_source/upload.py.
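
The strace output above shows the process calling read() on file descriptor 3 over and over, each call returning 0 bytes, i.e. reading at end-of-file. A minimal sketch of why that can go unnoticed in Python: text-mode readline() signals EOF by returning an empty string instead of raising, so a loop that never tests for "" keeps calling it. `readline_after_eof` is a hypothetical helper used only for illustration.

```python
import os
import tempfile
from typing import List


def readline_after_eof(path: str, attempts: int = 3) -> List[str]:
    """Call readline() repeatedly after a file's content has been consumed.

    Text-mode readline() reports EOF by returning "" instead of raising, and
    it returns "" again on every later call. On a regular file each call
    typically asks the OS for more data, which strace reports as read()
    returning 0.
    """
    with open(path) as contents:
        contents.read()  # consume whatever content the file has
        return [contents.readline() for _ in range(attempts)]


# Demo with an empty temporary file standing in for an empty upload.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
print(readline_after_eof(demo_path))  # ['', '', '']
os.remove(demo_path)
```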

gmauro commented 2 years ago

We have these errors in the Galaxy log:

journalctl -u galaxy-zergling@1 --since '3 hours ago' --no-pager | grep proteomics

Nov 18 07:17:09 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:09,906 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype class for 'galaxy.datatypes.infernal:Stockholm_1_0'
Nov 18 07:17:10 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:10,109 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.snpeff
Nov 18 07:17:10 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:10,110 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.snpeff
Nov 18 07:17:10 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:10,208 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.snpsift_dbnsfp
Nov 18 07:17:10 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:10,399 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.infernal
Nov 18 07:17:10 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:10,400 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype class for 'galaxy.datatypes.infernal:Infernal_CM_1_1'
Nov 18 07:17:18 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:18,571 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.infernal
Nov 18 07:17:18 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:18,572 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype class for 'galaxy.datatypes.infernal:Infernal_CM_1_1'
Nov 18 07:17:42 sn06.galaxyproject.eu uwsgi[3150762]: galaxy.datatypes.registry ERROR 2021-11-18 07:17:42,658 [pN:main,p:3150762,w:0,m:0,tN:MainThread] Error importing datatype module galaxy.datatypes.gafa_datatypes

The registry.xml in the job working directory still lists these datatypes, e.g. <sniffer type="galaxy.datatypes.infernal:Infernal_CM_1_1"/>
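
To see at a glance which datatypes failed to import, a small script could deduplicate the log messages above. This is a hypothetical convenience helper, not part of Galaxy, and its regular expression assumes only the two message shapes shown in this excerpt.

```python
import re
from typing import Dict, Iterable, List

# Matches "Error importing datatype module X" and
# "Error importing datatype class for 'X'" as they appear in the log above.
_IMPORT_ERROR = re.compile(
    r"Error importing datatype "
    r"(?:module (?P<module>[\w.]+)|class for '(?P<cls>[\w.:]+)')"
)


def summarize_import_errors(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect unique failing modules and classes in first-seen order."""
    summary: Dict[str, List[str]] = {"modules": [], "classes": []}
    for line in lines:
        match = _IMPORT_ERROR.search(line)
        if not match:
            continue  # unrelated log line
        if match["module"]:
            key, value = "modules", match["module"]
        else:
            key, value = "classes", match["cls"]
        if value not in summary[key]:
            summary[key].append(value)
    return summary
```

Piping the journalctl output through this would reduce the repeated snpeff and infernal entries to one line each.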

gmauro commented 2 years ago

@mvdbeek, could this be an issue with importing datatypes?

mvdbeek commented 2 years ago

Can you run py-spy dump with -l (--locals)?

gmauro commented 2 years ago

root@vgcnbwc-worker-c120m215-1868:~$ /usr/local/bin/py-spy dump -l --pid 3832036
Process 3832036: python /opt/galaxy/server/tools/data_source/upload.py /opt/galaxy/server /data/jwd/main/034/326/34326382/registry.xml /data/jwd/main/034/326/34326382/upload_params.json 64101192:/data/jwd/main/034/326/34326382/working/dataset_b2441e82-cd1f-49a6-8824-63af197f6f80_files:/data/jwd/main/034/326/34326382/outputs/galaxy_dataset_b2441e82-cd1f-49a6-8824-63af197f6f80.dat
Python v3.6.8 (/opt/galaxy/venv/bin/python)

Thread 0x14DA9BD84B80 (active): "MainThread"
    sniff (proteomics.py:369)
        Arguments:
            self: <Ms2 at 0x14da734d4b00>
            filename: "/data/jwd/nginx_upload/main/uploads/0000088938"
        Locals:
            contents: <_io.TextIOWrapper at 0x14da73488048>
            header_lines: []
            line: ""
    run_sniffers_raw (galaxy/datatypes/sniff.py:529)
        Arguments:
            filename_or_file_prefix: <FilePrefix at 0x14da739372e8>
            sniff_order: [<SDF at 0x14da737a6208>, <PDB at 0x14da7374b860>, <MOL2 at 0x14da7374b8d0>, <InChI at 0x14da7374b940>, ...]
            is_binary: False
        Locals:
            fname: "/data/jwd/nginx_upload/main/uploads/0000088938"
            file_prefix: <FilePrefix at 0x14da739372e8>
            file_ext: None
            datatype: <Ms2 at 0x14da734d4b00>
            datatype_compressed: False
    guess_ext (galaxy/datatypes/sniff.py:469)
        Arguments:
            fname: "/data/jwd/nginx_upload/main/uploads/0000088938"
            sniff_order: [<SDF at 0x14da737a6208>, <PDB at 0x14da7374b860>, <MOL2 at 0x14da7374b8d0>, <InChI at 0x14da7374b940>, ...]
            is_binary: False
        Locals:
            file_prefix: <FilePrefix at 0x14da739372e8>
    handle_uploaded_dataset_file_internal (galaxy/datatypes/sniff.py:767)
        Arguments:
            filename: "/data/jwd/nginx_upload/main/uploads/0000088938"
            datatypes_registry: <Registry at 0x14da9bcc1978>
            ext: "auto"
            tmp_prefix: "data_id_64101192_upload_"
            tmp_dir: "/data/jwd/main/034/326/34326382/outputs"
            in_place: True
            check_content: True
            is_binary: False
            auto_decompress: True
            uploaded_file_ext: "fastq"
            convert_to_posix_lines: True
            convert_spaces_to_tabs: False
        Locals:
            is_valid: True
            converted_path: "/data/jwd/nginx_upload/main/uploads/0000088938"
            compressed_type: None
            guessed_ext: "auto"
    handle_upload (galaxy/datatypes/upload_util.py:54)
        Arguments:
            registry: <Registry at 0x14da9bcc1978>
            path: "/data/jwd/nginx_upload/main/uploads/0000088938"
            requested_ext: "auto"
            name: "fastq_runid_3ddc21791e264782a02e7d00de162c7802aaaa01_48_0.fastq"
            tmp_prefix: "data_id_64101192_upload_"
            tmp_dir: "/data/jwd/main/034/326/34326382/outputs"
            check_content: True
            link_data_only: False
            in_place: True
            auto_decompress: True
            convert_to_posix_lines: True
            convert_spaces_to_tabs: False
        Locals:
            stdout: None
            converted_path: None
            multi_file_zip: False
            is_binary: False
    add_file (tools/data_source/upload.py:144)
        Arguments:
            dataset: <Bunch at 0x14da7374b6a0>
            registry: <Registry at 0x14da9bcc1978>
            output_path: "/data/jwd/main/034/326/34326382/outputs/galaxy_dataset_b2441e82-cd1f-49a6-8824-63af197f6f80.dat"
        Locals:
            ext: None
            compression_type: None
            line_count: None
            link_data_only_str: "copy_files"
            link_data_only: False
            run_as_real_user: False
            purge_source: True
            in_place: True
            check_content: True
            auto_decompress: True
    __main__ (tools/data_source/upload.py:334)
        Locals:
            output_paths: {64101192: ("/data/jwd/main/034/326/34326382/outputs/galaxy_dataset_b2441e82-cd1f-49a6-8824-63af197f6f80.dat", ...)}
            registry: <Registry at 0x14da9bcc1978>
            datasets: [{"file_type": "auto", "ext": "txt", "name": "fastq_runid_3ddc21791e264782a02e7d00de162c7802aaaa01_48_0.fastq", ...}]
            metadata: []
            dataset: <Bunch at 0x14da7374b6a0>
            output_path: "/data/jwd/main/034/326/34326382/outputs/galaxy_dataset_b2441e82-cd1f-49a6-8824-63af197f6f80.dat"
    <module> (tools/data_source/upload.py:341)
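
The locals above (line: "", header_lines: []) are consistent with the Ms2 sniffer sitting at end-of-file before it had collected any header line, which would happen if the file it opened was empty at that point (an assumption; the thread does not confirm it). A minimal sketch of an MS2-style header check that cannot spin, assuming the sniffer collects leading "H<TAB>" lines in a while True: loop. `sniff_ms2_header` and its required field are illustrative, not the code from proteomics_datatypes or Galaxy, and it takes an open text stream instead of a filename so it is easy to try out.

```python
import io
from typing import List, Sequence, TextIO


def sniff_ms2_header(
    contents: TextIO,
    required_fields: Sequence[str] = ("CreationDate",),
) -> bool:
    """Return True if the stream starts with MS2-style H<TAB> header lines
    that mention every field in required_fields (illustrative default)."""
    header_lines: List[str] = []
    while True:
        line = contents.readline()
        if line == "":
            break  # EOF: stop here instead of calling readline() forever
        if line.startswith("H\t"):
            header_lines.append(line)
        else:
            break  # the first non-header line ends the header block
    if not header_lines:
        return False  # empty file, or no header at all (e.g. FASTQ)
    return all(
        any(h.startswith("H\t" + field) for h in header_lines)
        for field in required_fields
    )


print(sniff_ms2_header(io.StringIO("")))  # False, and returns immediately
print(sniff_ms2_header(io.StringIO("H\tCreationDate\t2021\nS\t1\t1\t500.0\n")))  # True
```

The key design choice is that an empty readline() result leads to break rather than another iteration.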

mvdbeek commented 2 years ago

~wonderful, that while True: is at fault https://github.com/mvdbeek/galaxy/blob/61be2b5daf67943d701811a0e52f6e2daa1a9a71/lib/galaxy/datatypes/proteomics.py#L932. I'm looking at a fix.~

gmauro commented 2 years ago

Thank you!

mvdbeek commented 2 years ago

Can you also check whether you have another proteomics.py datatype file anywhere? It could come from Tool Shed repositories. Somehow the line numbers are quite off.
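
One way to answer this, sketched under the assumption that the relevant copies live somewhere under the Galaxy server directory and the directory of installed Tool Shed repositories (both paths are site-specific): list every file named proteomics.py. `find_named_files` is a hypothetical helper; `find <dir> -name proteomics.py` does the same from a shell.

```python
import sys
from pathlib import Path
from typing import List


def find_named_files(root: str, name: str = "proteomics.py") -> List[Path]:
    """Return every regular file called `name` below `root`, sorted.

    More than one hit across the Galaxy tree and the Tool Shed install
    directory means a second definition of the same datatype module is on disk.
    """
    return sorted(p for p in Path(root).rglob(name) if p.is_file())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is site-specific): python find_named_files.py /opt/galaxy
    for hit in find_named_files(sys.argv[1]):
        print(hit)
```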

mvdbeek commented 2 years ago

Ah, this is coming from proteomics_datatypes. If you uninstall those from disk, that should fix it. We also have some logic that should prevent loading Tool Shed (TS) datatypes if a Galaxy datatype exists, but obviously that's not working quite right.
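
The intended precedence rule can be sketched as follows, assuming Galaxy's built-in datatypes are registered before any TS ones. `Data` and `register_datatype` are hypothetical stand-ins, not Galaxy's registry code.

```python
from typing import Dict, Type


class Data:
    """Hypothetical stand-in for a datatype base class."""


def register_datatype(
    registry: Dict[str, Type[Data]],
    ext: str,
    datatype_class: Type[Data],
    from_tool_shed: bool,
) -> bool:
    """Register datatype_class under ext and return True if it was added.

    Assumes built-in datatypes are registered first, so a Tool Shed class
    never replaces an extension Galaxy already provides.
    """
    if from_tool_shed and ext in registry:
        return False  # keep the built-in class (and its sniffer)
    registry[ext] = datatype_class
    return True
```

Under this rule, an outdated ms2 class shipped by a TS repository would be skipped and the built-in class would stay in place.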

gmauro commented 2 years ago

https://toolshed.g2.bx.psu.edu/repository?repository_id=6fba172fc57ea523&changeset_revision=300fc3aa6954

gmauro commented 2 years ago

@bgruening removed all TS datatypes

mvdbeek commented 2 years ago

Suggestion: add a config option to disable TS datatypes completely.
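
Such a switch could gate loading roughly as sketched below. The option name load_tool_shed_datatypes and the `DatatypeSource` type are hypothetical; the thread does not show what the eventual change is called or how it works.

```python
from typing import Iterable, List, NamedTuple


class DatatypeSource(NamedTuple):
    """Where a datatype definition comes from (illustrative only)."""

    ext: str
    from_tool_shed: bool


def datatypes_to_load(
    sources: Iterable[DatatypeSource],
    load_tool_shed_datatypes: bool,
) -> List[str]:
    """Return the extensions to load; when the hypothetical
    load_tool_shed_datatypes option is False, every TS datatype is skipped."""
    return [
        s.ext
        for s in sources
        if load_tool_shed_datatypes or not s.from_tool_shed
    ]
```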

mvdbeek commented 2 years ago

We're going with https://github.com/galaxyproject/galaxy/pull/13250