LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers
https://looselab.github.io/readfish/
GNU General Public License v3.0

Multiplex bulk fast5 simulation on Promethion #247

Closed: lkwhite closed this issue 1 year ago

lkwhite commented 1 year ago

Thanks for providing all this documentation on running simulated sequencing runs using bulk fast5 files. I'm currently trying to run simulations on a PromethION 24, which can generate and basecall many runs in parallel. Is there a tested way to specify multiple runs for simulation in a .toml file? Have you experimented with this at all? And do you know whether it's safe to start additional sequencing runs (as long as they use different config files) while a simulation is running at another position?

mattloose commented 1 year ago

You can create multiple toml files with different fast5 files and choose them when you start. You can also run the same simulation toml across many different flowcells.

You can run real sequencing alongside playback if you wish as well. But don't get confused with what is running on which position!
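As a rough sketch of the "multiple TOML files" approach: the helper below sets the playback path in the text of a sequencing TOML, so one template can be turned into several variants that each point at a different bulk file. It assumes the key is written as simulation = "<path>" inside a [custom_settings] table, as in the readfish playback instructions; the function name is hypothetical, and it edits the file line by line rather than fully parsing TOML.

```python
import re


def set_simulation_path(toml_text: str, bulk_path: str) -> str:
    """Return toml_text with simulation = "<bulk_path>" set under [custom_settings].

    Works on plain text: an existing simulation line in [custom_settings] is
    replaced, otherwise one is appended to that table (creating the table at
    the end of the file if it is missing).
    """
    # Assumption: the playback key lives in [custom_settings].
    # Assumption: bulk_path needs no TOML escaping (a plain Linux path).
    new_line = f'simulation = "{bulk_path}"'
    out: list[str] = []
    in_custom = False
    done = False
    for raw in toml_text.splitlines():
        stripped = raw.strip()
        if stripped.startswith("["):
            # Leaving [custom_settings] without a simulation key: add it here.
            if in_custom and not done:
                out.append(new_line)
                done = True
            in_custom = stripped == "[custom_settings]"
        elif in_custom and re.match(r"simulation\s*=", stripped):
            # Replace an existing entry (and drop any duplicates) instead of adding another.
            if not done:
                out.append(new_line)
                done = True
            continue
        out.append(raw)
    if not done:
        if not in_custom:
            out.append("[custom_settings]")
        out.append(new_line)
    return "\n".join(out) + "\n"
```

Calling it once per bulk file and writing each result to its own .toml gives one variant per run. This sketch does not check whether MinKNOW will list a renamed copy as a separate protocol.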

DanteV19 commented 1 month ago

Thanks to your clear documentation, I was able to run playback with your provided MinION R10 bulk file. I am working within a Docker container, so I run playback from the command line instead of the MinKNOW UI, using the scripts from simION to run playback without sudo or a GUI.

I would also like to run playback with your PromethION bulk file (PC24B243_20220512_1516_PAK21362_3H_sequencing_run_NA12878_sheared20kb_3d5147fc.fast5).

To do that, I performed the following steps:

  1. I referenced the PromethION bulk file in the sequencing protocol TOML (/opt/ont/minknow/conf/package/sequencing/sequencing_MIN114_DNA_e8_2_400K.toml):

simulation=[path/to/bulk]


  2. I created a simulated PromethION device instead of a MinION device:

/opt/ont/minknow/ont-python/bin/python -m minknow_api.examples.manage_simulated_devices --prom --add MS00000

  3. I activated a conda env for playback from a YAML file:
name: simion
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python=3.11 #3.10
  - pip
  - pip:
    - readfish[all]==2024.2.0
    - Bottleneck~=1.3.7
    - scipy~=1.12.0
    - numba~=0.59.0
  4. Finally, I tried playback with the following bash script:
pkill mk_manager
pkill minknow
sleep 5
pkill dorado
sleep 5

dorado_basecall_server -x cuda:all -c /opt/ont/dorado/data/dna_r10.4.1_e8.2_400bps_5khz_fast.cfg -p ipc:///tmp/.guppy/5555 -l minknow_run/logs/dorado --verbose_logs &
sleep 5 && echo "Initiated basecall server"
/opt/ont/minknow/bin/mk_manager_svc -c /opt/ont/minknow/conf --simulated-minion-devices=1 -d &
sleep 10 && echo "Initiated minknow server"

python3 code/start_protocol_mod.py --position MS00000 --kit SQK-LSK114 --pod5 --fastq --basecalling --fastq-reads-per-file 1000 --pod5-reads-per-file 1000

sleep 2
python3 code/query_minknow_run.py
python3 code/read_length_hist.py

I run into the following error:

2024-09-26 00:33:37.951929    INFO: common_process_exit_requested (host)
    name: grpcwebproxy
2024-09-26 00:33:37.954226    INFO: common_process_exit_requested (host)
    name: basecall_manager:grpcwebproxy
2024-09-26 00:33:37.954274    INFO: common_process_exit_requested (host)
    name: basecall_manager
2024-09-26 00:33:37.955468    INFO: common_process_stopped (host)
    exit_code: 15
    name: basecall_manager:grpcwebproxy
2024-09-26 00:33:37.955620    INFO: common_process_stopped (host)
    exit_code: 15
    name: grpcwebproxy
2024-09-26 00:33:37.970432    INFO: common_process_stopped (host)
    exit_code: 0
    name: basecall_manager
2024-09-26 00:33:38.973481    INFO: instance_stopped (host)
    instance: MS00000
    std_ec: (0:std::system): [0x0x7f09672b1268]: Success
2024-09-26 00:33:38.984456    INFO: mk_manager_shut_down (mk_manager)
ONT Dorado basecall server software version 7.4.13+54ebb08e0, client-server API version 20.0.0, minimap2 version 2.27-r1193
log path:            minknow_run/logs/dorado
max queued reads:    33000
num socket threads:  2
max returned events: 256000
gpu device:          cuda:0,cuda:1
Use of this software is permitted solely under the terms of the end user license agreement (EULA).
By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in /opt/ont/dorado/bin

Config loaded:
config file:               /opt/ont/dorado/data/dna_r10.4.1_e8.2_400bps_5khz_fast.cfg
model file:                /opt/ont/dorado/data/dna_r10.4.1_e8.2_400bps_fast@v4.3.0
Starting server on port: ipc:///tmp/.guppy/5555
Initiated basecall server
2024-09-26 00:33:53.026223    INFO: mk_manager_starting (mk_manager)
    hostname: wn-ga-01.spider.surfsara.nl
    system: ubuntu 20.04
            Distribution:           24.06.14 (STABLE)
            MinKNOW Core:           6.0.11
            Bream:                  8.0.12
            Protocol configuration: 6.0.15
            Dorado (build):          7.4.3+0dd05554a
            Dorado (connected):      7.4.13+54ebb08e0

2024-09-26 00:33:53.085707    INFO: network_config_description (mk_manager)
    guest_mode: local_only
    remote_connections_enabled: false
2024-09-26 00:33:53.086430    INFO: ping_flusher_network_up (mk_manager)
2024-09-26 00:33:53.088163 WARNING: failed_to_change_thread_priority (util)
    error_code: (13:std::system): [0x0x7f9a4426d268]: Permission denied
    thread_id: 140300467033856
2024-09-26 00:33:53.089486    INFO: auth_guest_mode (rpc)
    value: local_only
2024-09-26 00:33:53.107507    INFO: using_running_basecaller_server (host)
    dorado_version: 7.4.13+54ebb08e0
2024-09-26 00:33:53.110920    INFO: local_auth_token (host)
    path: /tmp/minknow-auth-token.json
2024-09-26 00:33:53.115775    INFO: common_process_started (host)
    name: grpcwebproxy
2024-09-26 00:33:53.116718    INFO: machine_description (mk_manager)
    cpu_has_avx: true
    cpu_has_sse42: true
    cpu_logical_core_count: 14
    cpu_model: AMD EPYC 7F32 8-Core Processor
    cpu_physical_core_count: 14
    memory_physical_bytes: 242433060864
2024-09-26 00:33:53.121346    INFO: common_process_started (host)
    name: basecall_manager
2024-09-26 00:33:53.128315    INFO: gpu_information (mk_manager)
    driver_version: 535.104.12
    gpu_type: NVIDIA A100-PCIE-40GB, NVIDIA A100-PCIE-40GB
2024-09-26 00:33:53.128378    INFO: sending_telemetry_message (ping)
    data: {"machine":{"arch":"x64","cpu":{"data":"100000004175746863414d44656e7469100f83000008000a0332f8fffffb8b0701000000000000004d0000007d302c0000000000000...
2024-09-26 00:33:53.128489    INFO: mk_manager_initialised (mk_manager)
    pid: 3251422
2024-09-26 00:33:53.128652    INFO: hardware_state_changed (host)
    name: MS00000
    os_identifier: 
    state: ready: firmware updating No 
2024-09-26 00:33:53.137145 WARNING: systemd_notify_socket_not_set (mk_manager)
2024-09-26 00:33:53.189637    INFO: rpc_delegate_is_listening (host)
    executable: basecall_manager
    port: 9504
    security: tls
2024-09-26 00:33:53.191846    INFO: common_process_started (host)
    name: basecall_manager:grpcwebproxy
2024-09-26 00:33:53.200771    INFO: rpc_delegate_proxy_is_listening (host)
    executable: basecall_manager
    tls_port: 9505
2024-09-26 00:33:53.306765    INFO: instance_started (host)
    grpc_secure_port: 8000
    grpcweb_tls_port: 8001
    instance: MS00000
2024-09-26 00:33:53.892511    INFO: updated_jwks_cache (host)
    filename: /tmp/minknow-jwks-cache.json
Initiated minknow server
2024-09-26 00:34:11.627002    INFO: basecaller_availability (host)
    state: AVAILABLE
sequencing/sequencing_MIN114_DNA_e8_2_400K:FLO-MIN114:SQK-LSK114:400
Starting protocol on 1 positions
Started protocol:
    run_id=3a3536c1-452c-41e0-819f-255e86b99c09
    position=MS00000
    flow_cell_id=
    user_specified_flow_cell_id=
['--base_calling=on', '--fast5=off', '--pod5=on', '--pod5_reads_per_file=1000', '--fastq=on', '--fastq_data', 'compress', '--fastq_reads_per_file=1000', '--bam=off', '--active_channel_selection=on', '--mux_scan_period=1.5']

3a3536c1-452c-41e0-819f-255e86b99c09
sequencing/sequencing_MIN114_DNA_e8_2_400K:FLO-MIN114:SQK-LSK114:400
/project/clonevo/Share/dante/minknow_run/./no_group/no_sample_id/20240926_0034_MS00000__3a3536c1
20240926_0034_MS00000__3a3536c1

PHASE_UNKNOWN
PHASE_INITIALISING
PHASE_INITIALISING
Traceback (most recent call last):
  File "/simION/code/query_minknow_run.py", line 32, in <module>
    current_info = c.protocol.get_current_protocol_run()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/simion/lib/python3.11/site-packages/minknow_api/protocol_service.py", line 603, in get_current_protocol_run
    return run_with_retry(self._stub.get_current_protocol_run,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/simion/lib/python3.11/site-packages/minknow_api/protocol_service.py", line 120, in run_with_retry
    result = MessageWrapper(method(message, timeout=timeout), unwraps=unwraps)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/simion/lib/python3.11/site-packages/grpc/_channel.py", line 1181, in __call__
    return _end_unary_response_blocking(state, call, False, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/simion/lib/python3.11/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.FAILED_PRECONDITION
    details = "No protocol running"
    debug_error_string = "UNKNOWN:Error received from peer ipv4:127.0.0.1:8000 {grpc_message:"No protocol running", grpc_status:9, created_time:"2024-09-26T00:34:45.883784464+02:00"}"
>

If you need more information or clarification to figure out what went wrong, please let me know. Thanks in advance.
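The launch script above relies on fixed sleep calls while the services start. A minimal alternative, assuming a plain TCP check is good enough, is to poll a port until it accepts connections. The helper name is hypothetical; port 8000 is the grpc_secure_port the log above reports for MS00000. An open port does not guarantee the position is ready to start a protocol, and this check does not cover dorado's ipc:// socket.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection; success means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: wait a little and try again.
            time.sleep(interval)
    return False
```

For example, wait_for_port("127.0.0.1", 8000, timeout=120) could stand in for the sleep after starting mk_manager_svc, exiting with an error if it returns False.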

mattloose commented 1 month ago

You are attempting to play back a PromethION bulk file with 3000 channels on a device that only supports 512 channels (MinION).

I suspect that to make this work you need to edit the TOML file for a PromethION device (not the MinION one). In addition, you need to add a simulated PromethION flow cell position rather than the MinION position you have used here.

Hope that helps.
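One way to catch this kind of mismatch before starting playback is to count the channels in the bulk file and compare against the simulated device. The sketch below assumes bulk FAST5 files store per-channel data under Raw/Channel_<n> and uses h5py to read them; the capacities are the figures quoted in this thread (512 for MinION, 3000 for PromethION), and all helper names are hypothetical.

```python
import re

# Channel capacities as quoted in this thread; other flow cell types omitted.
DEVICE_CHANNELS = {"minion": 512, "promethion": 3000}
_CHANNEL_RE = re.compile(r"^Channel_(\d+)$")


def channel_count(group_names) -> int:
    """Highest channel number among group names such as 'Channel_123'."""
    numbers = [int(m.group(1)) for name in group_names if (m := _CHANNEL_RE.match(name))]
    return max(numbers, default=0)


def bulk_channel_count(path: str) -> int:
    """Count channels in a bulk FAST5, assuming a Raw/Channel_<n> group layout."""
    import h5py  # imported here so the pure helpers work without h5py installed

    with h5py.File(path, "r") as f:
        return channel_count(f["Raw"].keys())


def check_fits(n_channels: int, device: str) -> None:
    """Raise if a bulk file has more channels than the simulated device offers."""
    capacity = DEVICE_CHANNELS[device]
    if n_channels > capacity:
        raise ValueError(
            f"bulk file has {n_channels} channels, but a {device} position supports {capacity}"
        )
```

Under those assumptions, check_fits(bulk_channel_count("/path/to/bulk.fast5"), "minion") would raise for a 3000-channel PromethION bulk file.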

Adoni5 commented 1 month ago

@DanteV19 - what version of MinKNOW are you running?

If you are on a later version, you can choose to play back the bulk file in the MinKNOW UI (under advanced sequencing options), which is a fair bit simpler. As Matt said, the sequencing TOML you have edited is for a MinION flow cell (MIN114), so it is unlikely to be correctly configured to play back a PromethION bulk file. There is one for PRO114 if you still want to edit a sequencing TOML. You will still need to add a simulated PromethION flow cell in MinKNOW, as you have done; however, as good practice, I recommend naming it something like PS00000.

Rory
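To see which PRO114 sequencing TOMLs are installed, a small sketch like the one below lists them from the directory quoted earlier in this thread. The MIN114 to PRO114 filename swap is an assumption based on the two file names mentioned here (sequencing_MIN114_DNA_e8_2_400K.toml and sequencing_PRO114_DNA_e8_2_400K.toml), and both helpers are hypothetical.

```python
from pathlib import Path

# Directory quoted earlier in this thread; may differ between MinKNOW installs.
SEQUENCING_TOML_DIR = Path("/opt/ont/minknow/conf/package/sequencing")


def candidate_tomls(family: str, directory: Path = SEQUENCING_TOML_DIR) -> list[Path]:
    """List sequencing TOMLs for a flow cell family, e.g. "PRO114" or "MIN114"."""
    return sorted(directory.glob(f"sequencing_{family}_*.toml"))


def promethion_counterpart(minion_toml: Path) -> Path:
    """Map a MIN114 sequencing TOML name to its PRO114 equivalent (naming assumption)."""
    if "MIN114" not in minion_toml.name:
        raise ValueError(f"not a MIN114 sequencing TOML: {minion_toml.name}")
    return minion_toml.with_name(minion_toml.name.replace("MIN114", "PRO114"))
```

Checking promethion_counterpart(...).exists() before editing is a cheap way to confirm the PromethION file is actually present on the install.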

DanteV19 commented 1 month ago

Thanks for your quick responses, @mattloose @Adoni5.

For reproducibility, I am using a Singularity container instead of the MinKNOW UI, and the only MinKNOW package I have installed for playback is ont-standalone-minknow-gpu-release (24.06.14~focal).

I will try one of the available sequencing_PRO114* TOML files together with a simulated PromethION flow cell position, created with the command: /opt/ont/minknow/ont-python/bin/python -m minknow_api.examples.manage_simulated_devices --prom --add PS00000

However, before I start, I have a possibly naive question about the TOML file: how do I make sure that the right TOML file for PromethION is used, or that it is edited correctly for PromethION? I am not very familiar with all the settings inside these TOMLs.

Alternatively, how should I refer to the TOML file that I want to use for playback (i.e. which command or configuration option)? I cannot find anything on how to specify a TOML file (such as sequencing_PRO114_DNA_e8_2_400K.toml) on the command line for playback, and so far it seems to use sequencing_MIN114_DNA_e8_2_400K.toml by default.

Thanks in advance

DanteV19 commented 1 month ago

It turns out I had overlooked that the MinION sequencing protocol was hard-coded into the scripts I use to launch playback from the terminal. With help from the author of the scripts, I have now fixed this and got playback working with the PromethION bulk data.

Thanks for the feedback regardless.
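For anyone hitting the same issue, one way to avoid a hard-coded protocol is to make the device type a command-line option. The sketch below is hypothetical: the profile table uses the TOML and position names from this thread, and how the selected name is then passed to MinKNOW depends on the launch script, which is not shown here.

```python
import argparse

# Hypothetical lookup replacing a hard-coded MinION protocol.
# TOML and position names come from this thread; check them against your install.
PROFILES = {
    "minion": {"toml": "sequencing_MIN114_DNA_e8_2_400K", "position": "MS00000"},
    "promethion": {"toml": "sequencing_PRO114_DNA_e8_2_400K", "position": "PS00000"},
}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Select a playback profile")
    parser.add_argument("--device", choices=sorted(PROFILES), default="minion")
    parser.add_argument("--position", help="override the simulated position name")
    return parser.parse_args(argv)


def resolve(args: argparse.Namespace) -> tuple[str, str]:
    """Return (sequencing TOML name, position name) for the chosen device."""
    profile = PROFILES[args.device]
    return profile["toml"], args.position or profile["position"]
```

With this shape, resolve(parse_args(["--device", "promethion"])) yields the PRO114 TOML name and PS00000, so switching devices no longer requires editing the script.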