LooseLab / Icarust

A fully featured MinKNOW simulator for testing read until experiments.
Mozilla Public License 2.0
17 stars 7 forks source link

"Connection refused" error when trying to run Readfish with Icarust #9

Closed mccarthyma17 closed 9 months ago

mccarthyma17 commented 9 months ago

Hello! I'm trying to run ReadFish with Icarust and I'm getting an error that I'm not sure how to fix. Icarust runs without any issues on its own, but when I try to run ReadFish using this command readfish targets --device Bantersaurus --port 10000 --toml test.toml --log-level info --experiment-name test_bacterial I get the following error:

2024-02-05 10:50:06,353 readfish command='targets'
2024-02-05 10:50:06,354 readfish debug_log=True
2024-02-05 10:50:06,354 readfish device='Bantersaurus'
2024-02-05 10:50:06,354 readfish dry_run=False
2024-02-05 10:50:06,354 readfish experiment_name='Human Enrichment'
2024-02-05 10:50:06,354 readfish host='127.0.0.1'
2024-02-05 10:50:06,354 readfish log_file='human_icarust_enrichment.log'
2024-02-05 10:50:06,354 readfish log_format='%(asctime)s %(name)s %(message)s'
2024-02-05 10:50:06,354 readfish log_level='info'
2024-02-05 10:50:06,354 readfish max_unblock_read_length_seconds=5
2024-02-05 10:50:06,354 readfish padding=0
2024-02-05 10:50:06,354 readfish port='10000'
2024-02-05 10:50:06,354 readfish throttle=0.4
2024-02-05 10:50:06,354 readfish toml='their_test.toml'
2024-02-05 10:50:06,354 readfish unblock_duration=0.1
2024-02-05 10:50:06,354 readfish wait_for_ready=60
Traceback (most recent call last):
  File "/Users/username/anaconda3/envs/readfish/bin/readfish", line 8, in <module>
    sys.exit(main())
  File "/Users/username/anaconda3/envs/readfish/lib/python3.10/site-packages/readfish/_cli_base.py", line 61, in main
    raise SystemExit(args.func(parser, args, extras))
  File "/Users/username/anaconda3/envs/readfish/lib/python3.10/site-packages/readfish/entry_points/targets.py", line 486, in run
    position = get_device(args.device, host=args.host, port=args.port)
  File "/Users/username/anaconda3/envs/readfish/lib/python3.10/site-packages/readfish/_utils.py", line 446, in get_device
    manager = Manager(host=host, port=port)
  File "/Users/username/anaconda3/envs/readfish/lib/python3.10/site-packages/minknow_api/manager.py", line 353, in __init__
    version_info = self.rpc.get_version_info()
  File "/Users/username/anaconda3/envs/readfish/lib/python3.10/site-packages/minknow_api/manager_service.py", line 434, in get_version_info
    return run_with_retry(self._stub.get_version_info,
  File "/Users/username/anaconda3/envs/readfish/lib/python3.10/site-packages/minknow_api/manager_service.py", line 97, in run_with_retry
    result = MessageWrapper(method(message, timeout=timeout), unwraps=unwraps)
  File "/Users/username/anaconda3/envs/readfish/lib/python3.10/site-packages/grpc/_channel.py", line 1160, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/username/anaconda3/envs/readfish/lib/python3.10/site-packages/grpc/_channel.py", line 1003, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:10000: Failed to connect to remote host: Connection refused"
    debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-02-05T10:50:06.403476-05:00", grpc_status:14, grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:10000: Failed to connect to remote host: Connection refused"}"

If I don't include --port 10000 I get the same original error as in this issue. I did check the guppy permissions as in this issue, and it seems to be working fine. Any suggestions for how to fix this would be much appreciated! Thank you!

Adoni5 commented 9 months ago

Ho @mccarthyma17 - Thanks for giving Icarust a go!

We made some pretty big updates before Christmas. I've got a couple of questions before I have a crack at diagnosing the issue.

  1. How are you running Icarust? Docker or Natively?
  2. Which versions of ReadFish and Icarust are you running?

Cheers, Rory

mccarthyma17 commented 9 months ago

Hi Rory,

I'm running Icarust natively. The ReadFish version is 2023.1.1 and Icarust is 0.6.0. Thanks!

Adoni5 commented 9 months ago

Okay - that all sounds good.

Could you let me know the contents of the your config.ini file? And just to double check - did you set the MINKNOW_TRUSTED_CA environment variable, as detailed here?

The fact you've gotten it working this far is good. I think you are very close!

As an aside, if you don't include --port 10000 readfish will query for a connection on the default port (9502), which is the port MinKNOW listens on. Obviously, there is no device called Bantersaurus connected to MinKNOW! (assuming that the issue you are referring to is specifically this one https://github.com/LooseLab/Icarust/issues/1#issuecomment-1588691665)

mccarthyma17 commented 9 months ago

Sure! My config.ini file looks like this:

[TLS]
cert-dir = ./static/tls_certs
[PORTS]
manager = 9502
position = 10000

[SEQUENCER]
channels = 3000

Yep, I set the environment variable!

Adoni5 commented 9 months ago

Ahh I see the problem (maybe) - the manager port is the port which readfish connects to originally - it then queries for the port for the position, which is the same setup as in MinKNOW.

If you set

[PORTS]
manager=10000
position=10001

and run the same readfish command, what happens?

mccarthyma17 commented 9 months ago

I got farther by changing that, but it still throws an error:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses; last error: UNAVAILABLE: ipv4:127.0.0.1:5555: FD shutdown"
    debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"failed to connect to all addresses; last error: UNAVAILABLE: ipv4:127.0.0.1:5555: FD shutdown", grpc_status:14, created_time:"2024-02-06T13:50:34.87512-05:00"}"
Adoni5 commented 9 months ago

Not seen this one before! This is potentially an issue between the version of GRPC that Icarust is using and the version that Readfish is using. https://github.com/grpc/grpc/issues/22088

The version of tonic - the crate that provides the GRPC implementation for rust, is pinned at 0.8.2. This works on my machine with the latest python grpcio, but for some reason it's not working for you.

Can you tell me:

  1. What you get when running pip list in your readfish environment?
  2. Which ports you have set for manager/position now?
  3. Which OS you are running on?
  4. How you installed Icarust? I assume you used cargo run -r?
mccarthyma17 commented 9 months ago

Sure!

  1. Here's the ReadFish environment

    Package                Version
    ---------------------- ------------
    about-time             4.2.1
    alive-progress         3.1.5
    attrs                  23.2.0
    cattrs                 23.2.3
    certifi                2024.2.2
    click                  8.1.7
    exceptiongroup         1.2.0
    grapheme               0.6.0
    grpcio                 1.60.1
    h5py                   3.10.0
    iniconfig              2.0.0
    mappy                  2.26
    mappy-rs               0.0.7
    minknow-api            5.8.1
    more-itertools         10.2.0
    numpy                  1.26.3
    ont-pyguppy-client-lib 7.2.13
    packaging              23.2
    pip                    23.3.2
    pluggy                 1.3.0
    protobuf               4.25.2
    pyRFC3339              1.1
    pytest                 7.4.4
    pytz                   2023.3.post1
    readfish               2023.1.1
    readfish_summarise     0.2.5
    rtoml                  0.10.0
    setuptools             69.0.3
    tomli                  2.0.1
    typing_extensions      4.9.0
    wheel                  0.42.0
  2. I've set port 10000 for manager and 10001 for position now.

  3. I'm running them on Apple silicon (M2 chip).

  4. Yep, I used cargo run --release -- --help. Seems to run just fine on its own!

Adoni5 commented 9 months ago

Could you try dropping the python version of GRPC to 1.59.2?

pip install grpcio==1.59.2

Alternatively, I've pushed a branch, feature/fd-shutdown which has updated the versions of rust GRPC. IF you could check that version out and try that?

If that doesn't work, I'm starting to suspect there may be some reason that readfish can't connect over the localhost network that I'm not thinking of, but I'm not too familiar with Macs!

Just to super double check, the readfish --port argument is matched up with the config.ini manger port? Maybe you could try a different port combination, perhaps something is using port 10000?

I'm going to ask someone with an M2 Mac to help me debug this!

mccarthyma17 commented 9 months ago

No problem, thanks so much for all your help!

I got the same error message with the downgraded GRPC and the fd-shutdown branches. The port argument matches the config manager port - I did try using a different one, but same problem. Changing the GRPC version messed up some Conda dependencies so I decided to try wiping and reinstalling both programs. Possibly some progress with the fresh installation, but yet another error message:

2024-02-07 14:24:35,313 readfish /Users/username/anaconda3/envs/readfish/bin/readfish targets --toml their_test.toml --device Bantersaurus --experiment-name test_newinstall --port 11000 --log-level info
2024-02-07 14:24:35,313 readfish command='targets'
2024-02-07 14:24:35,313 readfish debug_log=True
2024-02-07 14:24:35,313 readfish device='Bantersaurus'
2024-02-07 14:24:35,313 readfish dry_run=False
2024-02-07 14:24:35,313 readfish experiment_name='test_newinstall'
2024-02-07 14:24:35,313 readfish host='127.0.0.1'
2024-02-07 14:24:35,313 readfish log_file=None
2024-02-07 14:24:35,313 readfish log_format='%(asctime)s %(name)s %(message)s'
2024-02-07 14:24:35,313 readfish log_level='info'
2024-02-07 14:24:35,313 readfish max_unblock_read_length_seconds=5
2024-02-07 14:24:35,313 readfish padding=0
2024-02-07 14:24:35,313 readfish port='11000'
2024-02-07 14:24:35,313 readfish throttle=0.4
2024-02-07 14:24:35,313 readfish toml='their_test.toml'
2024-02-07 14:24:35,313 readfish unblock_duration=0.1
2024-02-07 14:24:35,313 readfish wait_for_ready=60
E0207 14:24:35.404048000 8217189888 ssl_transport_security.cc:1432]    Handshake failed with fatal error SSL_ERROR_SSL: error:0A000086:SSL routines::certificate verify failed.
WARNING:minknow_api:Failed to connect to minknow instance (retry 1/5): failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:10001: Failed to connect to remote host: Connection refused
WARNING:minknow_api:Failed to connect to minknow instance (retry 2/5): failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:10001: Failed to connect to remote host: Connection refused
WARNING:minknow_api:Failed to connect to minknow instance (retry 3/5): failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:10001: Failed to connect to remote host: Connection refused
WARNING:minknow_api:Failed to connect to minknow instance (retry 4/5): failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:10001: Failed to connect to remote host: Connection refused
WARNING:minknow_api:Failed to connect to minknow instance (retry 5/5): failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:10001: Failed to connect to remote host: Connection refused
ERROR:root:Failed to connect to read until at 127.0.0.1: 10001

I'll wait to hear if you have any ideas! In the meantime, I'll try getting it up and running on a different OS. Thanks again!

Adoni5 commented 9 months ago

Should be a breeze on Ubuntu/Debian based!

Handshake failed with fatal error SSL_ERROR_SSL: error:0A000086:SSL routines::certificate verify failed.

This error suggests that the certificate env variable is not set. I'm going to expand the running with readfish section in the Readme - @mattloose managed to get it working in the end last night, but he said the Documentation could have been a bit more expansive!

Could you try running this snippet in your readfish python enironment from the README against Icarust?

from minknow_api.manager import Manager
import os         
os.environ["MINKNOW_TRUSTED_CA"] = "/Path/to/Icarust/static/tls_certs/ca.crt"                  
from minknow_api.manager import Manager                                                      
m = Manager( port=11000)                                                                      
pos = next(m.flow_cell_positions())
print(pos)                                                          
con = pos.connect()                                                                          
con.instance.get_version_info()    
mccarthyma17 commented 9 months ago

When I ran that yesterday I got an error, but today it's working and I can start ReadFish. Thank you!

The adaptive sampling starts without any issue, but I quickly run into this error with Icarust and both programs hang. Any suggestions?

ReadFish:

2024-02-08 09:40:00,290 readfish.targets Starting main loop
2024-02-08 09:40:00,298 readfish.targets Using the mappy_rs plugin. Using reference: banth_ref.mmi.

Region Single Bacteria Anthrax has targets on 1 contig, with 1 found in the provided reference.
This region has 2 total targets (+ve and -ve strands), covering approximately 100.00% of the genome.

2024-02-08 09:40:00,300 readfish.targets readfish started in PHASE_SEQUENCING. Fully sequencing first read from each channel.

Icarust:

[2024-02-08T14:40:00Z INFO icarust::impl_services::log: 18 ] "'readfish targets' connected to this device."
[2024-02-08T14:40:00Z INFO icarust::impl_services::data: 646 ] Received stream setup, setting up.
[2024-02-08T14:40:00Z INFO icarust::impl_services::data: 629 ] Unblocked: 0, Stop receiving: 0, Total unblocks 0, total sr 0
[2024-02-08T14:40:00Z INFO icarust::impl_services::log: 18 ] "Utilising the Guppy base-caller plugin:\n\t- address: ipc:///Users/username/Desktop/ont-dorado-server/bin/5555\n\t- config: dna_r10.4.1_e8.2_400bps_hac\n\t- priority: read_priority.high_priority\n\t- client_name: Readfish_connection"
[2024-02-08T14:40:00Z INFO icarust::impl_services::log: 18 ] "Using the mappy_rs plugin. Using reference: banth_ref.mmi.\n\nRegion Single Bacteria Anthrax has targets on 1 contig, with 1 found in the provided reference.\nThis region has 2 total targets (+ve and -ve strands), covering approximately 100.00\\% of the genome.\n"
[2024-02-08T14:40:00Z INFO icarust::impl_services::data: 1542 ] New reads: 0, Occupied: 2688, Empty pores: 3, Dead pores: 309, Sequenced reads: 37, Awaiting: 3
[2024-02-08T14:40:00Z INFO icarust::impl_services::data: 629 ] Unblocked: 0, Stop receiving: 0, Total unblocks 0, total sr 0
[2024-02-08T14:40:00Z INFO icarust::impl_services::data: 629 ] Unblocked: 0, Stop receiving: 0, Total unblocks 0, total sr 0
[2024-02-08T14:40:01Z INFO icarust::impl_services::data: 1542 ] New reads: 3, Occupied: 2592, Empty pores: 99, Dead pores: 309, Sequenced reads: 859, Awaiting: 95
[2024-02-08T14:40:02Z INFO icarust::impl_services::data: 1542 ] New reads: 4, Occupied: 2596, Empty pores: 95, Dead pores: 309, Sequenced reads: 1812, Awaiting: 91
[2024-02-08T14:40:03Z INFO icarust::impl_services::data: 1542 ] New reads: 3, Occupied: 2606, Empty pores: 85, Dead pores: 309, Sequenced reads: 2598, Awaiting: 82
[2024-02-08T14:40:04Z INFO icarust::impl_services::data: 1542 ] New reads: 1, Occupied: 2619, Empty pores: 72, Dead pores: 309, Sequenced reads: 3429, Awaiting: 71
[2024-02-08T14:40:05Z INFO icarust::impl_services::data: 475 ] Writing out file to /tmp/test_2_bacteria/test/20240208_1439_XIII_FAQ1234_ee29f3c5e/fast5_pass//FAQ1234_pass_ee29f3_0.fast5
thread '<unnamed>' panicked at src/impl_services/data.rs:543:34:
called `Result::unwrap()` on an `Err` value: Filter not available: User(32020, [0, 2, 1, 1])
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[2024-02-08T14:40:05Z INFO icarust::impl_services::data: 1542 ] New reads: 3, Occupied: 2619, Empty pores: 72, Dead pores: 309, Sequenced reads: 4275, Awaiting: 69
thread '<unnamed>' panicked at src/impl_services/data.rs:1488:66:
called `Result::unwrap()` on an `Err` value: SendError { .. }
thread 'tokio-runtime-worker' panicked at src/impl_services/data.rs:1594:22:
called `Result::unwrap()` on an `Err` value: PoisonError { .. }
Adoni5 commented 9 months ago

This is a simple one, that I have seen before! When running Icarust, it expects to find Nanopores VBZ_plugin in the same directory as the one that you executed the command in. I never found a smarter way of doing it!

And I quote this commit message - https://github.com/LooseLab/Icarust/commit/d87e76ec09cdf33ecbabf9598644e5547f558c58.

Specifically this is saying that it can't find the HDF5 filter plugin when it's trying to write the first FAST5 file.

thread '<unnamed>' panicked at src/impl_services/data.rs:543:34:
called `Result::unwrap()` on an `Err` value: Filter not available: User(32020, [0, 2, 1, 1])

It's a bit of a pain but can you run Icarust in the directory of the source code?

This is also definitely being added to the Docs!

mccarthyma17 commented 9 months ago

That worked! Thank you so much for all your help!

Adoni5 commented 9 months ago

Brilliant, thanks for your patience! I'm going to close this issue for now, even if I never truly understood the FD shutdown error 😆

If you have any more issues or questions, please open another one!