LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers
https://looselab.github.io/readfish/
GNU General Public License v3.0

Guppy Server Connection Error: Timeout on LOAD_CONFIG Request #190

Closed cognoescere closed 1 year ago

cognoescere commented 2 years ago

Our team has been having trouble getting past a connection issue with the Guppy basecalling server. To outline what we're trying to do: we built the mmi index from a fasta file with minimap2. We changed the fasta headers so that they all name the same target, and configured readfish to sequence only the reads that map to that index. In this way, we're attempting to capture a specific group of bacteria.

Before building the mmi file, all fasta headers were set to:

gammaproteobacteria

The commands we used to start readfish and the server are as follows:

readfish targets --device MN28389 --experiment-name "readfish" --toml readfish_test.toml --log-file readfish_test.log

sudo /home/ricker/install/guppy/ont-guppy/bin/guppy_basecall_server --config /home/ricker/install/guppy/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg --log_path /home/ --port 5555

The error message we get from readfish is:

2022-04-04 16:51:12,278 ru.ru_gen batch_size=512
2022-04-04 16:51:12,278 ru.ru_gen cache_size=512
2022-04-04 16:51:12,278 ru.ru_gen channels=[1, 512]
2022-04-04 16:51:12,278 ru.ru_gen chunk_log=None
2022-04-04 16:51:12,278 ru.ru_gen command=targets
2022-04-04 16:51:12,278 ru.ru_gen device=MN28389
2022-04-04 16:51:12,278 ru.ru_gen dry_run=False
2022-04-04 16:51:12,278 ru.ru_gen experiment_name=rf9
2022-04-04 16:51:12,278 ru.ru_gen func=<function run at 0x7f6d86c94940>
2022-04-04 16:51:12,278 ru.ru_gen host=127.0.0.1
2022-04-04 16:51:12,278 ru.ru_gen log_file=readfish_test.log
2022-04-04 16:51:12,278 ru.ru_gen log_format=%(asctime)s %(name)s %(message)s
2022-04-04 16:51:12,278 ru.ru_gen log_level=info
2022-04-04 16:51:12,278 ru.ru_gen paf_log=None
2022-04-04 16:51:12,278 ru.ru_gen port=None
2022-04-04 16:51:12,278 ru.ru_gen run_time=172800
2022-04-04 16:51:12,278 ru.ru_gen throttle=0.4
2022-04-04 16:51:12,278 ru.ru_gen toml=readfish_test.toml
2022-04-04 16:51:12,278 ru.ru_gen unblock_duration=0.1
2022-04-04 16:51:12,278 ru.ru_gen workers=1
2022-04-04 16:51:12,282 ru.ru_gen Initialising minimap2 mapper
2022-04-04 16:51:18,031 ru.ru_gen Mapper initialised
2022-04-04 16:51:18,071 ru.ru_gen This experiment has 1 region on the flowcell
2022-04-04 16:51:18,072 ru.ru_gen Using reference: /home/ricker/ref.mmi
2022-04-04 16:51:19,846 ru.ru_gen Region 'ref' (control=False) has 1 contig of which 1 are in the reference. There are 2 targets (including +/- strand) representing 100.0% of the reference. Reads will be unblocked when classed as single_off; sequenced when classed as single_on or multi_on; and polled for more data when classed as multi_off, no_map or no_seq.
[guppy/error] basecall_service::BasecallClient::worker_loop: Connection error. [timed_out] Timeout waiting for reply to request: LOAD_CONFIG
[guppy/error] basecall_service::BasecallClient::worker_loop: Connection error. [timed_out] Timeout waiting for reply to request: LOAD_CONFIG
[guppy/error] basecall_service::BasecallClient::worker_loop: Connection error. [timed_out] Timeout waiting for reply to request: LOAD_CONFIG
[guppy/error] basecall_service::BasecallClient::worker_loop: Connection error. [timed_out] Timeout waiting for reply to request: LOAD_CONFIG
[guppy/error] basecall_service::BasecallClient::worker_loop: Connection error. [timed_out] Timeout waiting for reply to request: LOAD_CONFIG
[guppy/error] basecall_service::BasecallClient::worker_loop: Connection error. [timed_out] Timeout waiting for reply to request: LOAD_CONFIG
Traceback (most recent call last):
  File "/home/ricker/readfish/bin/readfish", line 11, in <module>
    load_entry_point('readfish==0.0.9a3', 'console_scripts', 'readfish')()
  File "/home/ricker/readfish/lib/python3.8/site-packages/ru/cli.py", line 43, in main
    args.func(parser, args)
  File "/home/ricker/readfish/lib/python3.8/site-packages/ru/ru_gen.py", line 498, in run
    simple_analysis(
  File "/home/ricker/readfish/lib/python3.8/site-packages/ru/ru_gen.py", line 155, in simple_analysis
    caller = Caller(
  File "/home/ricker/readfish/lib/python3.8/site-packages/ru/basecall.py", line 41, in __init__
    self.connect()
  File "/home/ricker/readfish/lib/python3.8/site-packages/pyguppy_client_lib/pyclient.py", line 172, in connect
    raise exception_type(exception_message)
ConnectionError: Query timed out: <result.timed_out: 4>

Our installed versions are:

MinKNOW: 22.03.5
MinKNOW core: 5.0.0
Guppy: 6.0.6
Bream: 7.0.9
Script Configuration: 5.0.8

We'd greatly appreciate any help in solving this issue. Thank you for your time!

alexomics commented 2 years ago

Could you share the caller_settings section of the readfish_test.toml? Guppy 6 uses an IPC socket so I think the issue is there. You should be using the scheme as in this comment: https://github.com/LooseLab/readfish/issues/170#issuecomment-1026814330

cognoescere commented 2 years ago

Hello, thanks for your quick response to the issue!

Here is our readfish_test.toml file:

[caller_settings]
config_name = "dna_r9.4.1_450bps_hac"
host = "127.0.0.1"
port = 5555

[conditions]
reference = "/home/ricker/ref.mmi"

[conditions.0]
name = "ref"
control = false
min_chunks = 0
max_chunks = inf
targets = ["nucleotideseq.fa"]
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"

It looks like our .toml file isn't matching the scheme seen in #170 that you pointed out. We'll try changing the host to match #170 and see how that goes.

I'll reply here if it returns a similar error again or works. Thanks!

alexomics commented 2 years ago

By the look of the caller_settings that's likely the issue.

Also, looking at the rest of the TOML file, I just want to ask what's in nucleotideseq.fa? The targets field should be a list of contig names that are in the reference index, or comma-separated values of the form contig,start,stop,strand. See the target types and formats section of this guide.
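
To illustrate the two target formats described above, here is a minimal Python sketch. It is not readfish's own parser: the function name parse_target and the tuple layout are hypothetical, and treating a bare contig name as "whole contig, both strands" is an assumption based on the log line "There are 2 targets (including +/- strand)".

```python
from typing import List, Optional, Tuple

# (contig, start, stop, strand); stop=None stands for "to the end of the contig".
Target = Tuple[str, int, Optional[int], str]


def parse_target(entry: str) -> List[Target]:
    """Expand one `targets` entry into (contig, start, stop, strand) tuples."""
    fields = [field.strip() for field in entry.split(",")]
    if len(fields) == 1:
        # Bare contig name: assumed to cover the whole contig on both strands.
        return [(fields[0], 0, None, "+"), (fields[0], 0, None, "-")]
    if len(fields) == 4:
        contig, start, stop, strand = fields
        if strand not in ("+", "-"):
            raise ValueError(f"strand must be '+' or '-', got {strand!r}")
        return [(contig, int(start), int(stop), strand)]
    raise ValueError(f"expected 'contig' or 'contig,start,stop,strand', got {entry!r}")


# A file name is read as a contig name, not opened as a file.
print(parse_target("nucleotideseq.fa"))
print(parse_target("gammaproteobacteria"))
print(parse_target("gammaproteobacteria,0,1000,+"))
```

Under this reading, "nucleotideseq.fa" would only match if a contig in the index literally had that name, which is why the contents of the entry matter.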

cognoescere commented 2 years ago

Apologies, the .toml I sent previously was an older version we used with guppy 5. Here is the updated one we've been using:

[caller_settings]
config_name = "dna_r9.4.1_450bps_hac"
host = "ipc:///tmp/.guppy"
port = 5555

[conditions]
reference = "/home/ricker/ref.mmi"

[conditions.0]
name = "ref"
control = false
min_chunks = 0
max_chunks = inf
targets = ["gammaproteobacteria"]
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "proceed"
no_seq = "proceed"
no_map = "proceed"

You can see that we were using the host pointed out in #170, and that targets was changed to match the string seen in the reference.

alexomics commented 2 years ago

When you run the guppy command to start the server (guppy_basecall_server ...) does this tell you the server address?

cognoescere commented 2 years ago

Following up from last night, we ran this command to start the server:

sudo /home/ricker/install/guppy/ont-guppy/bin/guppy_basecall_server --config /home/ricker/install/guppy/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg --log_path /home/ --port 5555

It returned the server address:

Starting server on port: ipc:///home/ricker/5555

Does that look right?

alexomics commented 2 years ago

Since you are running your own server you will need to use the following config as the one you've been using is for the MinKNOW controlled Guppy server.

[caller_settings]
config_name = "dna_r9.4.1_450bps_hac"
host = "ipc:///home/ricker"
port = 5555
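
As a rough illustration of how these two settings relate to the address the server printed (ipc:///home/ricker/5555), here is a minimal Python sketch. The helper name guppy_address is hypothetical, and joining host and port with a slash for IPC (and a colon otherwise) is an assumption inferred from the addresses in this thread, not a statement of how readfish builds the address internally.

```python
def guppy_address(host: str, port: int) -> str:
    """Combine caller_settings host and port into a single server address."""
    if host.startswith("ipc://"):
        # Assumption: the port is the name of the socket file inside the
        # directory given as host.
        return f"{host.rstrip('/')}/{port}"
    # Assumption: a plain TCP address uses host:port.
    return f"{host}:{port}"


# Self-started server from this thread.
print(guppy_address("ipc:///home/ricker", 5555))
# Scheme referenced from issue #170 for the MinKNOW-controlled server.
print(guppy_address("ipc:///tmp/.guppy", 5555))
```

If the composed address differs from the one the server reports at start-up, the host setting is the first thing to check.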

cognoescere commented 2 years ago

I've updated the .toml file with the new config, but it looks like we're still running into the same issue. Here is a screenshot of the two programs running:

[Image: Screenshot from 2022-04-06 12-21-37]

Do you have any other suggestions? Thanks again for the quick replies!

alexomics commented 2 years ago

Are you able to start the server as your normal user, without sudo?
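
One reason this can matter: with an IPC address the server creates a socket file, and a socket created by root may not be usable by a normal user. Here is a minimal, standard-library-only sketch for inspecting that file. The function name describe_ipc_socket is hypothetical, and the path in the example is taken from the server's start-up message earlier in this thread.

```python
import os
import stat


def describe_ipc_socket(address: str) -> dict:
    """Report whether the socket file behind an ipc:// address looks usable."""
    prefix = "ipc://"
    if not address.startswith(prefix):
        raise ValueError("expected an ipc:// address")
    path = address[len(prefix):]
    info = {
        "path": path,
        "exists": os.path.exists(path),
        "is_socket": False,
        "connectable": False,
    }
    if info["exists"]:
        info["is_socket"] = stat.S_ISSOCK(os.stat(path).st_mode)
        # On Linux, connecting to a Unix domain socket needs write permission
        # on the socket file for the current user.
        info["connectable"] = info["is_socket"] and os.access(path, os.W_OK)
    return info


print(describe_ipc_socket("ipc:///home/ricker/5555"))
```

Running this as the same user that runs readfish shows whether the socket exists and whether that user is allowed to connect to it.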

cognoescere commented 2 years ago

Nope, it won't run without sudo:

terminate called after throwing an instance of 'boost::wrapexcept'
what(): Failed to open file for writing: Input/output error: "/home/guppy_basecall_server_log-2022-04-06_12-34-06.log"
Aborted (core dumped)

cognoescere commented 2 years ago

Oh, hang on, it will run if I change the log file output path:

/home/ricker/install/guppy/ont-guppy/bin/guppy_basecall_server --config /home/ricker/install/guppy/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg --log_path /home/ricker --port 5555

but it is still giving the same output when running readfish targets

readfish targets --device MN28389 --experiment-name "rf10" --toml readfish_test1.toml --log-file readfish_test1.log
2022-04-06 12:39:36,923 ru.ru_gen /home/ricker/readfish/bin/readfish targets --device MN28389 --experiment-name rf10 --toml readfish_test1.toml --log-file readfish_test1.log
2022-04-06 12:39:36,923 ru.ru_gen batch_size=512
2022-04-06 12:39:36,923 ru.ru_gen cache_size=512
2022-04-06 12:39:36,923 ru.ru_gen channels=[1, 512]
2022-04-06 12:39:36,923 ru.ru_gen chunk_log=None
2022-04-06 12:39:36,923 ru.ru_gen command=targets
2022-04-06 12:39:36,923 ru.ru_gen device=MN28389
2022-04-06 12:39:36,923 ru.ru_gen dry_run=False
2022-04-06 12:39:36,923 ru.ru_gen experiment_name=rf10
2022-04-06 12:39:36,923 ru.ru_gen func=<function run at 0x7f1806591940>
2022-04-06 12:39:36,923 ru.ru_gen host=127.0.0.1
2022-04-06 12:39:36,923 ru.ru_gen log_file=readfish_test1.log
2022-04-06 12:39:36,923 ru.ru_gen log_format=%(asctime)s %(name)s %(message)s
2022-04-06 12:39:36,924 ru.ru_gen log_level=info
2022-04-06 12:39:36,924 ru.ru_gen paf_log=None
2022-04-06 12:39:36,924 ru.ru_gen port=None
2022-04-06 12:39:36,924 ru.ru_gen run_time=172800
2022-04-06 12:39:36,924 ru.ru_gen throttle=0.4
2022-04-06 12:39:36,924 ru.ru_gen toml=readfish_test1.toml
2022-04-06 12:39:36,924 ru.ru_gen unblock_duration=0.1
2022-04-06 12:39:36,924 ru.ru_gen workers=1
2022-04-06 12:39:36,927 ru.ru_gen Initialising minimap2 mapper
2022-04-06 12:39:37,254 ru.ru_gen Mapper initialised
2022-04-06 12:39:37,294 ru.ru_gen This experiment has 1 region on the flowcell
2022-04-06 12:39:37,294 ru.ru_gen Using reference: /home/ricker/ref.mmi
2022-04-06 12:39:37,295 ru.ru_gen Region 'ref' (control=False) has 1 contig of which 1 are in the reference. There are 2 targets (including +/- strand) representing 100.0% of the reference. Reads will be unblocked when classed as single_off; sequenced when classed as single_on or multi_on; and polled for more data when classed as multi_off, no_map or no_seq.

alexomics commented 2 years ago

Is there an error when running with the corrected --log_path? Can you check/post the log here?

alexomics commented 2 years ago

With the readfish environment activated can you run the snippet from this comment https://github.com/LooseLab/readfish/issues/170#issuecomment-1032566729 changing the port and address to match the socket location for your guppy instance?

cognoescere commented 2 years ago

Here is the output after running that:

PyGuppyClient(address='ipc:///home/ricker/5555', config='dna_r9.4.1_450bps_fast', align_ref=None, bed_file=None, barcodes=None, status.connected, )

alexomics commented 2 years ago

So you can connect to the server 🎉

I've just realised the caller_settings that I pasted above includes a ..._hac config line. You might want to (if you haven't already) change this to use the dna_r9.4.1_450bps_fast config.
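
A quick way to spot this kind of mismatch is to compare the file name passed to --config with the config_name in the TOML. This is a minimal sketch. The helper name config_matches is hypothetical, and the idea that config_name should equal the .cfg file name without its extension is an assumption based on the values used in this thread.

```python
from pathlib import Path


def config_matches(server_config_path: str, toml_config_name: str) -> bool:
    """True when the server's .cfg file name (minus extension) equals config_name."""
    return Path(server_config_path).stem == toml_config_name


server_cfg = "/home/ricker/install/guppy/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg"

# The value pasted earlier in the thread versus the value suggested here.
print(config_matches(server_cfg, "dna_r9.4.1_450bps_hac"))
print(config_matches(server_cfg, "dna_r9.4.1_450bps_fast"))
```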

cognoescere commented 2 years ago

Ok, so that should work now with a live run?

alexomics commented 2 years ago

I should think so, if you can connect to the guppy server you should be able to basecall reads. Are you using a GPU enabled Guppy server?

cognoescere commented 2 years ago

We have an RTX 2070 installed, and are hoping to use that. I've added the -x auto option, but am getting the following:

/home/ricker/install/guppy/ont-guppy/bin/guppy_basecall_server --config /home/ricker/install/guppy/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg --log_path /home/ricker --port 5555 -x auto
ONT Guppy basecall server software version 6.0.6+8a98bbc, client-server API version 10.1.0
log path:            /home/ricker
chunk size:          2000
chunks per runner:   160
max queued reads:    2000
num basecallers:     4
num socket threads:  2
max returned events: 50000
gpu device:          auto
kernel path:         
runners per device:  8
[guppy/warning] barcoding::BarcodeGPU::SequenceMemorySetDef::SequenceMemorySetDef: CUDA device 0 will run out of memory while trying to allocate 284079744 bytes
[guppy/error] run_server: Maximum GPU capacity exceeded. Try decreasing max(front_window_size, rear_window_size).. Error initialising basecall server using port: ipc://5555. Aborting.
The basecall server has shut down successfully.

Running nvidia-smi shows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   49C    P8    10W / 175W |   7611MiB /  8192MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       819      C   ...bin/guppy_basecall_server     7300MiB |
|    0   N/A  N/A      1261      G   /usr/lib/xorg/Xorg                194MiB |
|    0   N/A  N/A      2572      G   ...AAAAAAAAA= --shared-files       17MiB |
|    0   N/A  N/A     35728      G   /usr/bin/gnome-shell               44MiB |
|    0   N/A  N/A    220664      G   ...838643742896889817,131072       49MiB |
+-----------------------------------------------------------------------------+
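
Reading the two outputs together: the table reports 7611MiB of 8192MiB already in use, of which 7300MiB belongs to a guppy_basecall_server process (PID 819) that was already running when the table was printed. Here is a short sketch of the arithmetic, using only numbers printed above. Whether that process is the MinKNOW-controlled server is an assumption, as the output does not say.

```python
MIB = 1024 ** 2  # bytes per MiB

total_mib = 8192            # from nvidia-smi: Memory-Usage total
used_mib = 7611             # from nvidia-smi: Memory-Usage used
existing_server_mib = 7300  # from nvidia-smi: guppy_basecall_server, PID 819
requested_bytes = 284_079_744  # from the [guppy/warning] line

free_mib = total_mib - used_mib
requested_mib = requested_bytes / MIB

print(f"free GPU memory: {free_mib} MiB")
print(f"allocation named in the warning: {requested_mib:.1f} MiB")
print(f"share of used memory held by PID 819: {existing_server_mib / used_mib:.0%}")
```

The headroom left on the card is small compared with what a second basecall server is likely to need in total, so it may be worth checking whether two Guppy servers are competing for the same GPU.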

alexomics commented 2 years ago

Hmm, maybe try the command:

/home/ricker/install/guppy/ont-guppy/bin/guppy_basecall_server --config /home/ricker/install/guppy/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg --log_path /home/ricker --port 5555 --num_callers 2 --ipc_threads 2 --device cuda:all

If you can't get guppy to stay alive, you might need to ask about that one on the ONT community forum.

cognoescere commented 2 years ago

Ok, will do. Thanks very much for all of your help with this!

Adoni5 commented 1 year ago

Closing as somewhat inactive. @cognoescere, I'm assuming you got this working!