LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers
https://looselab.github.io/readfish/
GNU General Public License v3.0

Readfish with Guppy 6 #170

Closed hasindu2008 closed 1 year ago

hasindu2008 commented 2 years ago

Is it possible to get readfish working on Guppy 6?

mattloose commented 2 years ago

Hi Hasindu,

It will be....

We're testing with Guppy 6 and will release a branch with compatibility soon. There are a couple of changes between 5 and 6 which need to be handled well.

Will post here when it's available.

alexomics commented 2 years ago

Guppy v5.1.0 and v6.0.0 introduce the use of IPC sockets on Linux and OS X; see: https://community.nanoporetech.com/posts/guppy-v6-0-0-release

This requires a change to the TOML configuration: where `host` is given, it must point to the Guppy IPC address, e.g.:

[caller_settings]
config_name = "dna_r9.4.1_450bps_fast"
host = "ipc:///tmp/.guppy"
port = 5555

This is a temporary fix, and I intend to make a single compatible interface when I have time.
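As a rough illustration of how the two TOML fields relate to the socket Guppy listens on: a server started with `--port /tmp/.guppy/5557` reports `Starting server on port: ipc:///tmp/.guppy/5557`, while the TOML splits that into `host = "ipc:///tmp/.guppy"` and `port = 5557`. The sketch below assumes the two are joined with `/` for IPC hosts and `:` for TCP hosts; `guppy_address` is a hypothetical helper, not part of readfish or pyguppy_client_lib.

```python
def guppy_address(host: str, port: int) -> str:
    """Combine readfish-style host/port settings into one Guppy server address.

    Assumption (illustrative only): IPC hosts are joined to the port with "/",
    TCP hosts with ":". The host is expected to have no trailing slash.
    """
    if host.startswith("ipc://"):
        # e.g. host="ipc:///tmp/.guppy", port=5557 -> "ipc:///tmp/.guppy/5557"
        return f"{host}/{port}"
    # e.g. host="127.0.0.1", port=5555 -> "127.0.0.1:5555"
    return f"{host}:{port}"
```

For example, `guppy_address("ipc:///tmp/.guppy", 5555)` gives the same path as the default socket `/tmp/.guppy/5555` mentioned later in this thread, which is a quick way to double-check that the TOML points at the socket the server actually created.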

hasindu2008 commented 2 years ago

This is great. Will give this a try soon. Will this also work with an R10 flow cell?

hasindu2008 commented 2 years ago

Hi, I attempted to get this version working; however, it seems like readfish gets stuck after the initial log messages. Here are the steps I followed, in case you can spot something that I did wrong.

python3 -m venv readfish
. readfish/bin/activate
pip3 install --upgrade pip
pip install git+https://github.com/nanoporetech/read_until_api@v3.0.0
pip install git+https://github.com/LooseLab/readfish@guppy_6
pip install ont_pyguppy_client_lib==6.0.1

Then run guppy on a different terminal:

sudo /data1/software/ont-guppy-6.0.1/bin/guppy_basecall_server --log_path guppy.log --config dna_r9.4.1_450bps_fast.cfg --num_callers 1 --cpu_threads_per_caller 14 --port /tmp/.guppy/5557 --ipc_threads 3 -x cuda:all

My TOML file looks like:

[caller_settings]
config_name = "dna_r9.4.1_450bps_fast"
host = "ipc:///tmp/.guppy"
port = 5557

[conditions]
reference = "/data1/readfish/hg38noAlt.idx"

[conditions.0]
name = "ReadFish_v6_gene_targets.collapsed.hg38"
control = false
min_chunks = 0
max_chunks = 16
targets = "ReadFish_v6_gene_targets.collapsed.hg38.txt"
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"

Then readfish:


(readfish) minknow@mini-fridge:/data1/readfish$ readfish targets --device MN19348 --experiment-name "ReadFish_v6_gene_targets" --toml ReadFish_v6_gene_targets_guppy6.collapsed.hg38.toml
2022-02-07 14:13:41,430 ru.ru_gen /home/hasindu/readfish/bin/readfish targets --device MN19348 --experiment-name ReadFish_v6_gene_targets --toml ReadFish_v6_gene_targets_guppy6.collapsed.hg38.toml
2022-02-07 14:13:41,430 ru.ru_gen batch_size=512
2022-02-07 14:13:41,430 ru.ru_gen cache_size=512
2022-02-07 14:13:41,430 ru.ru_gen channels=[1, 512]
2022-02-07 14:13:41,430 ru.ru_gen chunk_log=None
2022-02-07 14:13:41,430 ru.ru_gen command=targets
2022-02-07 14:13:41,430 ru.ru_gen device=MN19348
2022-02-07 14:13:41,430 ru.ru_gen dry_run=False
2022-02-07 14:13:41,430 ru.ru_gen experiment_name=ReadFish_v6_gene_targets
2022-02-07 14:13:41,430 ru.ru_gen func=<function run at 0x7fddf77ee0d0>
2022-02-07 14:13:41,430 ru.ru_gen host=127.0.0.1
2022-02-07 14:13:41,430 ru.ru_gen log_file=None
2022-02-07 14:13:41,430 ru.ru_gen log_format=%(asctime)s %(name)s %(message)s
2022-02-07 14:13:41,430 ru.ru_gen log_level=info
2022-02-07 14:13:41,430 ru.ru_gen paf_log=None
2022-02-07 14:13:41,430 ru.ru_gen port=9501
2022-02-07 14:13:41,430 ru.ru_gen run_time=172800
2022-02-07 14:13:41,430 ru.ru_gen throttle=0.4
2022-02-07 14:13:41,430 ru.ru_gen toml=ReadFish_v6_gene_targets_guppy6.collapsed.hg38.toml
2022-02-07 14:13:41,430 ru.ru_gen unblock_duration=0.1
2022-02-07 14:13:41,430 ru.ru_gen workers=1
2022-02-07 14:13:41,439 ru.ru_gen Initialising minimap2 mapper
2022-02-07 14:13:46,450 ru.ru_gen Mapper initialised
2022-02-07 14:13:46,457 ru.ru_gen This experiment has 1 region on the flowcell
2022-02-07 14:13:46,457 ru.ru_gen Using reference: /data1/readfish/hg38noAlt.idx

2022-02-07 14:13:51,425 ru.ru_gen Region 'ReadFish_v6_gene_targets.collapsed.hg38' (control=False) has 23 contigs of which 23 are in the reference. There are 458 targets (including +/- strand) representing 1.63% of the reference. Reads will be unblocked when classed as single_off or multi_off; sequenced when classed as single_on or multi_on; and polled for more data when classed as no_map or no_seq.

Nothing is printed afterwards; it seems to get stuck.

mattloose commented 2 years ago

Can you confirm which version of Minknow you are running?

thanks

hasindu2008 commented 2 years ago

MinKNOW 21.11.8 Core version 4.5.4

hasindu2008 commented 2 years ago

By the way, the old readfish setup for Guppy 4 works on this same machine (with this same MinKNOW version) when I run it with Guppy 4. It is only this new readfish version with Guppy 6 that has the problem.

alexomics commented 2 years ago

Couple of questions:

  1. Do you get a channels.toml file written in the MinKNOW run data directory for this experiment?
  2. Can you check your guppy server log for lines containing client connection request? E.g.:
    2022-02-08 08:53:54.734530 [guppy/info] client connection request. ["dna_r9.4.1_450bps_fast:>timeout_interval=15000>client_name=>barcode_kits=>detect_barcodes=0"]
    2022-02-08 08:53:54.734660 [guppy/info] New client connected Client 1 anonymous_client_1 id: b5b211db-55b8-4d83-a1fd-fb140794f448 (connection string = 'dna_r9.4.1_450bps_fast:>timeout_interval=15000>client_name=>barcode_kits=>detect_barcodes=0').
  3. Did you get a message in the MinKNOW interface one of: "This is a live run. Unblocks will occur." or "This is a test run. No unblocks will occur."

Trying to narrow down where this is getting stuck.
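Question 2 can be checked mechanically. The sketch below filters log text for the two markers that appear in the example lines above; the marker strings are copied from those lines and may differ in other Guppy versions. It also assumes that `--log_path` may name a directory containing `*.log` files rather than a single file, so it handles both; adjust the glob if your files are named differently. `connection_lines` and `scan_log_path` are hypothetical helpers.

```python
from pathlib import Path
from typing import List

# Markers copied from the example Guppy log lines quoted above;
# other Guppy versions may word these messages differently.
CONNECTION_MARKERS = ("client connection request", "New client connected")


def connection_lines(log_text: str) -> List[str]:
    """Return the log lines that record a client connecting to the server."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in CONNECTION_MARKERS)
    ]


def scan_log_path(log_path: str) -> List[str]:
    """Scan a single log file, or every *.log file if log_path is a directory."""
    p = Path(log_path)
    files = sorted(p.glob("*.log")) if p.is_dir() else [p]
    hits: List[str] = []
    for f in files:
        hits.extend(connection_lines(f.read_text(errors="replace")))
    return hits
```

An empty result from `scan_log_path("guppy.log")` would suggest the client never reached the server at all, which narrows the problem to the connection rather than to readfish's mapping or decision logic.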

hasindu2008 commented 2 years ago
  1. Yes channels.toml is written

    -rw-rw-r-- 1 minknow minknow 2.7K Feb  8 23:06 channels.toml
    drwxrwxr-x 2 minknow minknow 4.0K Feb  8 23:01 fast5
    drwxrwxr-x 2 minknow minknow 4.0K Feb  8 23:03 other_reports
    -rw-rw-r-- 1 minknow minknow    0 Feb  8 23:06 unblocked_read_ids.txt
  2. Guppy does not print anything about a client connection

    
    sudo /data1/software/ont-guppy-6.0.1/bin/guppy_basecall_server --log_path guppy.log --config dna_r9.4.1_450bps_fast.cfg --num_callers 1 --cpu_threads_per_caller 14 --port /tmp/.guppy/5557 --ipc_threads 3 
    ONT Guppy basecall server software version 6.0.1+652ffd1, client-server API version 10.0.0
    log path:            guppy.log
    chunk size:          2000
    chunks per runner:   160
    max queued reads:    2000
    num basecallers:     1
    num socket threads:  3
    max returned events: 50000
    cpu mode:             ON
    threads per caller:  14

    Config loaded:
    config file:               /data1/software/ont-guppy-6.0.1/data/dna_r9.4.1_450bps_fast.cfg
    model file:                /data1/software/ont-guppy-6.0.1/data/template_r9.4.1_450bps_fast.jsn
    model version id           2021-05-17_dna_r9.4.1_minion_96_29d8704b
    adapter scaler model file: None
    Starting server on port: ipc:///tmp/.guppy/5557

  3. Yes, that message is present.
![image](https://user-images.githubusercontent.com/12987163/152984495-e5cf40b9-7992-4244-8097-f131bae37c4a.png)
alexomics commented 2 years ago

Guppy won't log the attempted connection to the terminal; instead, it should be in the log file you specified, guppy.log.

hasindu2008 commented 2 years ago

Unfortunately, that log file has the same information:

2022-02-08 22:59:47.961195 [guppy/message] ONT Guppy basecall server software version 6.0.1+652ffd1, client-server API version 10.0.0
log path:            guppy.log
chunk size:          2000
chunks per runner:   160
max queued reads:    2000
num basecallers:     1
num socket threads:  3
max returned events: 50000
cpu mode:             ON
threads per caller:  14
2022-02-08 22:59:47.961303 [guppy/info] crashpad_handler not supported on this platform.
2022-02-08 22:59:47.961538 [guppy/info] Listening on port ipc:///tmp/.guppy/5557.
2022-02-08 22:59:47.962024 [guppy/info] lamp_arrangements arrangement folder not found: /data1/software/ont-guppy-6.0.1/data/read_splitting/lamp_arrangements
2022-02-08 22:59:48.025350 [guppy/message] 
Config loaded:
config file:               /data1/software/ont-guppy-6.0.1/data/dna_r9.4.1_450bps_fast.cfg
model file:                /data1/software/ont-guppy-6.0.1/data/template_r9.4.1_450bps_fast.jsn
model version id           2021-05-17_dna_r9.4.1_minion_96_29d8704b
adapter scaler model file: None
2022-02-08 22:59:48.025891 [guppy/message] Starting server on port: ipc:///tmp/.guppy/5557
alexomics commented 2 years ago

With the readfish python environment activated, could you try running:

python -c 'from pyguppy_client_lib.pyclient import PyGuppyClient as PGC; \
           c = PGC("ipc:///tmp/.guppy/5557", "dna_r9.4.1_450bps_fast.cfg"); \
           c.connect(); print(c)'

I'm not sure that you are able to connect to the basecall server.
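If the server never answers, `c.connect()` in this one-liner can block indefinitely, which makes "stuck" hard to tell apart from "slow". A minimal sketch that runs any blocking call in a daemon thread and gives up after a fixed time; `call_with_timeout` is a hypothetical helper, and the timeout value is arbitrary. Because a blocked thread cannot be killed in Python, the call keeps running in the background, but as a daemon thread it will not prevent the interpreter from exiting.

```python
import threading
from typing import Any, Callable, Dict


def call_with_timeout(fn: Callable[[], Any], timeout: float) -> Any:
    """Run fn() in a daemon thread; raise TimeoutError if it has not returned in time."""
    outcome: Dict[str, Any] = {}

    def runner() -> None:
        try:
            outcome["value"] = fn()
        except Exception as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        raise TimeoutError(f"call did not return within {timeout} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```

Usage, assuming the `PyGuppyClient` import and constructor arguments from the one-liner above: build the client as `c`, then call `call_with_timeout(c.connect, 30)`. A `TimeoutError` points at the server connection (socket path, permissions, or a server that is not responding), not at readfish.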

hasindu2008 commented 2 years ago

It also seems to get stuck:

(readfish) minknow@mini-fridge:/data1/readfish$ python -c 'from pyguppy_client_lib.pyclient import PyGuppyClient as PGC; c = PGC("ipc:///tmp/.guppy/5557", "dna_r9.4.1_450bps_fast.cfg"); c.connect(); print(c)'

Nothing turned up in the log.

alexomics commented 2 years ago

Definitely the guppy connection then. Could you try with the guppy server not started as root?

hasindu2008 commented 2 years ago

Sorry for mistakenly closing and then reopening the issue.

Yeah, initially I wanted to run Guppy as a normal user, as I always used to do. But the new IPC mechanism in Guppy 6 will not let me do so:

/data1/software/ont-guppy-6.0.1/bin/guppy_basecall_server --log_path guppy.log --config dna_r9.4.1_450bps_fast.cfg --num_callers 1 --cpu_threads_per_caller 14 --port /tmp/.guppy/5557 --ipc_threads 3 
ONT Guppy basecall server software version 6.0.1+652ffd1, client-server API version 10.0.0
log path:            guppy.log
chunk size:          2000
chunks per runner:   160
max queued reads:    2000
num basecallers:     1
num socket threads:  3
max returned events: 50000
cpu mode:             ON
threads per caller:  14
[guppy/error] run_server: Permission denied. Error initialising basecall server using port: ipc:///tmp/.guppy/5557. Aborting.
The basecall server has shut down successfully.
alexomics commented 2 years ago

No worries!

That's annoying. Can you initialise the IPC socket in the current directory? Something like:

/data1/software/ont-guppy-6.0.1/bin/guppy_basecall_server --log_path guppy.log --config dna_r9.4.1_450bps_fast.cfg --num_callers 1 --cpu_threads_per_caller 14 --port 5557 --ipc_threads 3

(I've changed the port param)

The connection address might be a little bit ugly, e.g. ipc:///data/software/guppy_bins/guppy_6.0.1/ont-guppy
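Rather than guessing that address, it can be read back from the server log, which in the Guppy 6.0.1 output quoted earlier ends with a line such as `Starting server on port: ipc:///tmp/.guppy/5557`. The sketch below extracts that address and splits it into the `host`/`port` pair used in `[caller_settings]`. The regular expression is based only on that one quoted line, and `listening_address` / `split_for_toml` are hypothetical helpers.

```python
import re
from typing import Optional, Tuple, Union

# Based on the Guppy 6.0.1 log line quoted in this thread:
#   "Starting server on port: ipc:///tmp/.guppy/5557"
_START_RE = re.compile(r"Starting server on port:\s*(\S+)")


def listening_address(log_text: str) -> Optional[str]:
    """Return the address from the first 'Starting server on port:' line, if any."""
    match = _START_RE.search(log_text)
    return match.group(1) if match else None


def split_for_toml(address: str) -> Tuple[str, Union[int, str]]:
    """Split an ipc:// address into a (host, port) pair for [caller_settings]."""
    if not address.startswith("ipc://"):
        raise ValueError(f"not an IPC address: {address!r}")
    host, _, port = address.rpartition("/")
    return host, int(port) if port.isdigit() else port
```

For the log above, this yields `("ipc:///tmp/.guppy", 5557)`, matching the `host`/`port` pair in the TOML earlier in the thread. For a server started with a bare `--port 5557`, the same approach would recover whatever directory-based address Guppy actually chose.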

hasindu2008 commented 2 years ago

Ahhh, right. It is this annoying minknow user, created by MinKNOW, that takes ownership of all MinKNOW-related directories, USB devices and even the .guppy directory in /tmp/. I ran Guppy under the minknow user, then ran readfish under that same minknow user, and now it seems to be running!!!!
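When the server runs as one user (here `minknow`) and readfish as another, the socket file or its parent directory may not be accessible. A minimal diagnostic sketch for Linux/macOS (it uses the Unix-only `pwd` module) that reports who owns the socket path and whether the current user can use it. `check_ipc_access` is a hypothetical helper; it only inspects file permissions and cannot tell whether a server is actually listening on the socket.

```python
import os
import pwd


def _user_name(uid: int) -> str:
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # uid without a passwd entry (e.g. in some containers)
        return str(uid)


def check_ipc_access(address: str) -> dict:
    """Report ownership/permissions of the file behind an ipc:// address."""
    prefix = "ipc://"
    if not address.startswith(prefix):
        raise ValueError(f"not an IPC address: {address!r}")
    path = address[len(prefix):]  # "ipc:///tmp/.guppy/5557" -> "/tmp/.guppy/5557"
    parent = os.path.dirname(path) or "/"
    report = {
        "path": path,
        "current_user": _user_name(os.getuid()),
        "exists": os.path.exists(path),
        # Creating a socket needs write access to the directory it lives in.
        "parent_writable": os.access(parent, os.W_OK),
    }
    if report["exists"]:
        report["owner"] = _user_name(os.stat(path).st_uid)
        report["read_write"] = os.access(path, os.R_OK | os.W_OK)
    return report
```

Running `check_ipc_access("ipc:///tmp/.guppy/5557")` once as your normal user and once as `minknow` would show whether the mismatch described above is the cause: an `owner` of `minknow` combined with `read_write: False` for the current user matches the symptoms in this thread.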

2022-02-08 23:57:31,570 ru.ru_gen Region 'ReadFish_v6_gene_targets.collapsed.hg38' (control=False) has 23 contigs of which 23 are in the reference. There are 458 targets (including +/- strand) representing 1.63% of the reference. Reads will be unblocked when classed as single_off or multi_off; sequenced when classed as single_on or multi_on; and polled for more data when classed as no_map or no_seq.
2022-02-08 23:57:46,881 ru.ru_gen 96R/14.89373s
2022-02-08 23:57:58,833 ru.ru_gen 217R/11.95193s
2022-02-08 23:58:13,798 ru.ru_gen 292R/14.96391s
2022-02-08 23:58:28,934 ru.ru_gen 194R/15.13501s
2022-02-08 23:58:45,689 ru.ru_gen 276R/16.75389s
2022-02-08 23:59:02,820 ru.ru_gen 225R/17.13113s
2022-02-08 23:59:21,810 ru.ru_gen 256R/18.98901s
2022-02-08 23:59:42,313 ru.ru_gen 300R/20.50213s
2022-02-09 00:00:01,703 ru.ru_gen 243R/19.38894s
2022-02-09 00:00:20,884 ru.ru_gen 279R/19.18075s
2022-02-09 00:00:41,094 ru.ru_gen 240R/20.20871s
2022-02-09 00:01:02,438 ru.ru_gen 316R/21.34409s
....
alexomics commented 2 years ago

Fantastic! I'll add this to the guppy 6 notes I'm slowly writing 😅

hasindu2008 commented 2 years ago

Thanks for helping to solve this issue. I will see if I can find a workaround to get rid of this annoying minknow user.

StevenVerbruggen commented 2 years ago

Hi all, I seem to have a comparable issue, but with Guppy 5.1.13. We have a GridION in our lab and had Readfish working smoothly. Fantastic software! We achieved some very cool things with it earlier.

Recently, we updated our Linux to Ubuntu 20 and, with it, the GridION software package. We now have the following tool versions:

minknow 21.11.7
core-gridion 4.5.4
guppy-gridion 5.1.13

I installed Readfish using conda as in issue #124, but I updated ont-pyguppy-client-lib==4.2.3 to ont-pyguppy-client-lib==5.1.13 in the yml file. Conda list:

# packages in environment at /data/conda/anaconda3/envs/readfish:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
attrs                     21.4.0                   pypi_0    pypi
beautifulsoup4            4.10.0                   pypi_0    pypi
biopython                 1.76                     pypi_0    pypi
ca-certificates           2021.10.8            ha878542_0    conda-forge
certifi                   2021.10.8                pypi_0    pypi
charset-normalizer        2.0.12                   pypi_0    pypi
google                    3.0.0                    pypi_0    pypi
grpcio                    1.44.0                   pypi_0    pypi
idna                      3.3                      pypi_0    pypi
importlib-metadata        4.11.1                   pypi_0    pypi
importlib-resources       5.4.0                    pypi_0    pypi
jsonschema                4.4.0                    pypi_0    pypi
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 11.2.0              h1d223b6_12    conda-forge
libgomp                   11.2.0              h1d223b6_12    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_12    conda-forge
libzlib                   1.2.11            h36c2ea0_1013    conda-forge
mappy                     2.24                     pypi_0    pypi
minknow-api               4.2.4                    pypi_0    pypi
ncurses                   6.3                  h9c3ff4c_0    conda-forge
numpy                     1.17.4                   pypi_0    pypi
ont-pyguppy-client-lib    5.1.13                   pypi_0    pypi
openssl                   3.0.0                h7f98852_2    conda-forge
packaging                 21.3                     pypi_0    pypi
pandas                    1.3.5                    pypi_0    pypi
pip                       22.0.3             pyhd8ed1ab_0    conda-forge
protobuf                  3.19.4                   pypi_0    pypi
pyparsing                 3.0.7                    pypi_0    pypi
pyrsistent                0.18.1                   pypi_0    pypi
python                    3.7.12          hf930737_100_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
python_abi                3.7                     2_cp37m    conda-forge
pytz                      2021.3                   pypi_0    pypi
read-until                3.0.0                    pypi_0    pypi
readfish                  0.0.8a2                  pypi_0    pypi
readline                  8.1                  h46c0cb4_0    conda-forge
requests                  2.27.1                   pypi_0    pypi
setuptools                60.9.3           py37h89c1867_0    conda-forge
six                       1.16.0                   pypi_0    pypi
soupsieve                 2.3.1                    pypi_0    pypi
sqlite                    3.37.0               h9cd32fc_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
toml                      0.10.2                   pypi_0    pypi
typing-extensions         4.1.1                    pypi_0    pypi
urllib3                   1.26.8                   pypi_0    pypi
watchdog                  2.1.6                    pypi_0    pypi
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zipp                      3.7.0                    pypi_0    pypi
zlib                      1.2.11            h36c2ea0_1013    conda-forge

Then I tried everything with the select_chr21_22 simulation, as in the tutorial. The unblock-all program works smoothly, with unblocking responses of around 0.01. Problems started to arise when running the targets program:

readfish targets --device X1 --experiment-name "RU Test basecall and map" --toml /data/software/human_chr_selection.toml --log-file /data/steven/readfish_config_tests/ru_test.log

The toml file looks like this:

[caller_settings]
config_name = "dna_r9.4.1_450bps_fast"
host = "ipc:///tmp/.guppy"
port = 5555

[conditions]
reference = "/data/genomes/igenomes/Homo_sapiens/Ensembl/GRCh38/Sequence/Index/genome_readfish.mmi"

[conditions.0]
name = "select_chr_21_22"
control = false
min_chunks = 0
max_chunks = 8
targets = ["chr21", "chr22"]
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"

So, as you can see, I updated the host to cope with the guppy IPC updates as stated higher in this issue. The log output I receive from Readfish is:

2022-02-24 15:05:24,370 ru.ru_gen /data/conda/anaconda3/envs/readfish/bin/readfish targets --device X1 --experiment-name RU Test basecall and map --toml human_chr_selection.toml --log-file ru_test.log --log-level debug
2022-02-24 15:05:24,371 ru.ru_gen batch_size=512
2022-02-24 15:05:24,371 ru.ru_gen cache_size=512
2022-02-24 15:05:24,371 ru.ru_gen channels=[1, 512]
2022-02-24 15:05:24,371 ru.ru_gen chunk_log=None
2022-02-24 15:05:24,371 ru.ru_gen command=targets
2022-02-24 15:05:24,371 ru.ru_gen device=X1
2022-02-24 15:05:24,371 ru.ru_gen dry_run=False
2022-02-24 15:05:24,371 ru.ru_gen experiment_name=RU Test basecall and map
2022-02-24 15:05:24,371 ru.ru_gen func=<function run at 0x7f9015c71d40>
2022-02-24 15:05:24,371 ru.ru_gen host=127.0.0.1
2022-02-24 15:05:24,371 ru.ru_gen log_file=ru_test.log
2022-02-24 15:05:24,371 ru.ru_gen log_format=%(asctime)s %(name)s %(message)s
2022-02-24 15:05:24,371 ru.ru_gen log_level=debug
2022-02-24 15:05:24,371 ru.ru_gen paf_log=None
2022-02-24 15:05:24,371 ru.ru_gen port=9501
2022-02-24 15:05:24,371 ru.ru_gen run_time=172800
2022-02-24 15:05:24,371 ru.ru_gen throttle=0.4
2022-02-24 15:05:24,371 ru.ru_gen toml=human_chr_selection.toml
2022-02-24 15:05:24,371 ru.ru_gen unblock_duration=0.1
2022-02-24 15:05:24,371 ru.ru_gen workers=1
2022-02-24 15:05:24,375 ru.ru_gen Initialising minimap2 mapper
2022-02-24 15:05:30,525 ru.ru_gen Mapper initialised
2022-02-24 15:05:30,533 ru.ru_gen This experiment has 1 region on the flowcell
2022-02-24 15:05:30,533 ru.ru_gen Using reference: /data/genomes/igenomes/Homo_sapiens/Ensembl/GRCh38/Sequence/Index/genome_readfish.mmi
2022-02-24 15:05:39,642 ru.ru_gen Region 'select_chr_21_22' (control=False) has 2 contigs of which 2 are in the reference. There are 4 targets (including +/- strand) representing 3.16% of the reference. Reads will be unblocked when classed as single_off or multi_off; sequenced when classed as single_on or multi_on; and polled for more data when classed as no_map or no_seq.

No ru_gen unblocking times are reported afterwards, and no unblocking is visible in the live read length distributions. MinKNOW does report 'Readfish is controlling sequencing on this device. Use at your own risk.' and other comparable system messages, and a channels.toml is present in the results directory. After a while, however, I get the following message in the readfish log output:

[guppy/error] basecall_service::BasecallClient::worker_loop: Connection error. [timed_out] Timeout waiting for reply to request: LOAD_CONFIG

I use the default guppy basecall server, which is running on /tmp/.guppy/5555. Based on the logs in /var/log/guppy/, it seems that MinKNOW can connect to it, but Readfish can't.

I tried running a separate basecall server on e.g. port 5557 and pointing Readfish at this port through the TOML, but there was no sign of a connection there either. Running this command:

python -c 'from pyguppy_client_lib.pyclient import PyGuppyClient as PGC; c = PGC("ipc:///tmp/.guppy/5555", "dna_r9.4.1_450bps_fast.cfg"); c.connect(); print(c)'

did give me a connection though.

I also checked in htop which users are controlling what, but everything seems to be executed by the grid user: KingFisher/MinKNOW, guppy_basecall_server, minKNOW/control_server, Readfish

Any tips on how to proceed with fixing this bug? Any checks I can do? I was in doubt whether to update to another Guppy version, but in general I expect fewer problems if I keep everything as it is in the updated GridION software release.

Best, Steven

alexomics commented 2 years ago

@StevenVerbruggen, a couple of quick questions, though this does look similar.

If you run:

guppy_basecall_server --version

What version do you see?

I think that we are running a very similar setup (though still on 16.04) with:

So downgrading ont-pyguppy-client-lib to 5.1.12 may fix this.

mahreenkn commented 2 years ago

Hi all,

I'm currently having the same issue as @hasindu2008, although I'm running MinKnow v.21.11.19 with guppy version 5.1.15.

The unblock_all command works beautifully and I can monitor this via the MinKnow GUI, but the readfish targets command (run as per the tutorial in the dev branch) seems to get stuck, and then times out with the same guppy error:

[guppy/error] basecall_service::BasecallClient::worker_loop: Connection error. [timed_out] Timeout waiting for reply to request: LOAD_CONFIG

Running this command doesn't give me any output:

python -c 'from pyguppy_client_lib.pyclient import PyGuppyClient as PGC; c = PGC("ipc:///tmp/.guppy/5555", "dna_r9.4.1_450bps_fast.cfg"); c.connect(); print(c)'

And checking with htop, it seems that minknow is controlling the basecalling process. Is this the reason the Readfish command isn't working? Is there a workaround for this?

Thank you for your help!

Mahreen

mattloose commented 2 years ago

Can I check which os you are on please?

StevenVerbruggen commented 2 years ago

Hi Alex, Thanks for your suggestions. Find below the output of the guppy basecall server version:

guppy_basecall_server --version
: Guppy Basecall Service Software, (C) Oxford Nanopore Technologies, Limited. Version 5.1.13+b292f4d13, client-server API version 10.0.0

I downgraded ont-pyguppy-client-lib to 5.1.12 with pip and restarted a readfish run. No change in the outcome though.
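Since several suggestions in this thread involve matching `ont-pyguppy-client-lib` to the server, a small sketch that parses the server version from `guppy_basecall_server --version` output (both formats quoted in this thread) and looks up the installed client library version. The "same major.minor" comparison is only a heuristic; this thread does not define a compatibility rule (the 5.1.12 client was suggested for a 5.1.13 server). `importlib.metadata` needs Python 3.8+; on Python 3.7, as in the conda environment above, the `importlib_metadata` backport offers the same functions. All helper names are hypothetical.

```python
import re
from importlib import metadata  # Python 3.8+; use the importlib_metadata backport on 3.7
from typing import Optional, Tuple

# Matches both "Version 5.1.13+b292f4d13" and "software version 6.0.1+652ffd1".
_VERSION_RE = re.compile(r"version (\d+)\.(\d+)\.(\d+)", re.IGNORECASE)


def server_version(version_output: str) -> Tuple[int, ...]:
    """Parse the first X.Y.Z version number from guppy_basecall_server output."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        raise ValueError("no version number found")
    return tuple(int(part) for part in match.groups())


def client_lib_version(dist: str = "ont-pyguppy-client-lib") -> Optional[str]:
    """Installed version of the client library, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def same_major_minor(server: Tuple[int, ...], client: str) -> bool:
    # Heuristic only: compares the first two components, e.g. 5.1.13 vs 5.1.12.
    return server[:2] == tuple(int(x) for x in client.split(".")[:2])
```

A mismatch in the major version (e.g. a 6.x server with a 5.x client) is the kind of combination worth ruling out first; a match does not guarantee that the two will talk to each other.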

mahreenkn commented 2 years ago

> Can I check which os you are on please?

Hi @mattloose - wasn't sure if you wanted this information from me or Steven (sorry for jumping into the replies - I can create a separate issue if it would be easier to keep track that way?) but I'm running Ubuntu 20.04.2 LTS.

Thanks!

StevenVerbruggen commented 2 years ago

Hi @alexomics and @mattloose, any updates on this issue for the GridION? I am wondering whether it would help to downgrade gridion-core to release 21.10.8, which includes Guppy 5.0.17 (I suppose this is the last version without the IPC sockets). Thanks for your suggestions!

mattloose commented 2 years ago

@mahreenkn my question was to you - there is an issue on MinKNOW on ubuntu at present with respect to permissions. This might be what is causing your problem.

@StevenVerbruggen we're not sure why your version isn't working at present - @alexomics is looking at it.

alexomics commented 2 years ago

@StevenVerbruggen, @mahreenkn it looks like you are both using Ubuntu 20.04. We have yet to upgrade from 16.04. One more question: are you using the default GridION user (grid)?

We're struggling to replicate this issue 😅

StevenVerbruggen commented 2 years ago

Hi @alexomics, thanks for looking into this. I have indeed recently upgraded to Ubuntu 20.04. In htop it looks like the grid user owns the MinKNOW and guppy basecall server processes (MinKNOW is started from the graphical interface, and I suppose the default guppy basecall server on port 5555 is started together with it). Readfish is run from the command line under the grid user (within a separate conda env). Haha, funny that recreating a bug can also be a hell of a job 😉

StevenVerbruggen commented 2 years ago

Hi @alexomics, Any luck in replicating the issue yet? 😅

mattloose commented 2 years ago

At the moment no - we can't see this issue on any of our platforms.

We have just run some testing on Ubuntu 20.04 and everything seemed to work. However, this wasn't a direct install on linux. We'll keep you posted - we hope for a new minknow update from ONT shortly which might address this.

StevenVerbruggen commented 2 years ago

Hi @mattloose, Thanks for your update. We will check internally and maybe also with ONT what we can do in the meantime to do adaptive sampling in any way. If there would be any news or findings popping up at your side, I would be glad to hear them. In any case, thank you very much for the suggestions. Best, Steven

starnoux commented 2 years ago

Hi @mattloose,

I am currently running Ubuntu 20.04.3 and experience the same issue as both @StevenVerbruggen and @mahreenkn. I run MinKNOW 21.11.9 with a Quadro RTX 4000 GPU and Guppy version 5.1.15. I tried downgrading ont-pyguppy-client-lib to 5.1.12, but nothing changed.

If you cannot replicate the issue, it seems that here we can ^^'.

I would be highly interested if someone finds a solution and shares it with us. At the moment, I don't see much of a solution on my side. Best, Stéphanie

mattloose commented 2 years ago

OK - this is frustrating.

Could you confirm the group and owner of the following folders on your system:

/var/log/minknow

/var/lib/minknow/data

In addition, could you check the owners of the subfolders as well?

Also can you confirm if you did a fresh install of MinKNOW onto this system - i.e it wasn't an update from a previous version?

samhorsfield96 commented 2 years ago

Hi all, I have got readfish working on the simulation example in the README using ubuntu 20.04, MinKNOW=21.11.9, guppy=5.1.15, ont-pyguppy-client-lib=5.1.15, with the readfish guppy_6 branch on a RTX A5000 with MinION (not sure if this would work with GridION).

As guppy and MinKNOW run under the 'minknow' user, I ran the readfish targets command also under this user using:

sudo runuser -l minknow -c '/path/to/readfish targets --device <device name> --experiment-name "Experiment name" --toml /path/to/toml --log-file <log file>'

I was then able to see off-target reads being rejected, and the read length distributions indicated enrichment of the target chromosomes (chr21 and chr22). Hopefully this helps, please let me know if there are any further details you need.
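The nested quoting in that `runuser` command is easy to get wrong when paths or experiment names contain spaces. A minimal sketch that builds the same command as an argument list, quoting each readfish argument with `shlex.quote`; `runuser_command` is a hypothetical helper and all paths are placeholders. Because `-l` starts a login shell, which typically begins in the target user's home directory, absolute paths for the TOML and log file are the safer choice.

```python
import shlex
from typing import List


def runuser_command(user: str, readfish_args: List[str]) -> List[str]:
    """Build `sudo runuser -l USER -c '<readfish ...>'` with each argument quoted."""
    inner = " ".join(shlex.quote(arg) for arg in readfish_args)
    return ["sudo", "runuser", "-l", user, "-c", inner]
```

For example, `runuser_command("minknow", ["/path/to/readfish", "targets", "--device", "<device name>", "--experiment-name", "Experiment name", "--toml", "/path/to/toml"])` produces a list that can be passed to `subprocess.run(..., check=True)` without any extra shell quoting.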

starnoux commented 2 years ago

@mattloose Hi again. Regarding the install, it is as fresh as it can be: I previously had issues with the GPU and uninstalled and reinstalled everything. Regarding the permissions, for /var/log/minknow:

-rw-rw-r-- 1 minknow minknow 391 Mar 22 16:52 basecall_manager_log-0.txt
-rw-rw-r-- 1 minknow minknow 625 Feb 22 14:46 basecall_manager_log-10.txt
-rw-rw-r-- 1 minknow minknow 625 Mar 22 16:51 basecall_manager_log-1.txt
-rw-rw-r-- 1 minknow minknow 625 Mar 22 15:57 basecall_manager_log-3.txt

But for /var/log/minknow (I had already changed it):

drwxrwxr-x 3 minknow minknow 4.0K Mar 22 17:18 2022-22-03-Test_TO

But as the basecalling was finishing, the following error appeared. Could it point towards a solution?

[/opt/ont/minknow/conf/package/sequencing/TO_chr9_selection2.toml]: Cannot be parsed correctly - Expecting to see all of the following sections: ['device', 'script', 'compatibility.minknow_core', 'meta.protocol.experiment_type', 'meta.exp_script_purpose'].

Edit: as I had already tried several times to get the chromosome name right, I don't know whether this is related or not :/

Thanks for the quick answers by the way. Cheers, S.

starnoux commented 2 years ago

Dear @alexomics, did you by any chance find a solution to this issue? I could not find one, even after switching the API to v5.0.

alexomics commented 2 years ago

@starnoux, sorry, this slipped through. Looking at your last message, it seems that you are passing a readfish configuration TOML to MinKNOW. These are separate files: MinKNOW should be using its own config files, with names like sequencing_MIN106_DNA.toml, while the readfish TOML file should be passed to readfish on the command line.

Can you share the command that you are running readfish with?

StevenVerbruggen commented 2 years ago

A small update on this issue for the GridION, by the way (it could be useful to other GridION users): I finally managed to get things working again, and based on adaptive sampling simulation runs, everything runs smoothly again! At the end of March, Nanopore released a new GridION software package, which includes the following tool versions:

Guppy 6.0.6
MinKNOW core 5.0.0
Grid software 22.03.2

I think the update from Guppy 5.1 to Guppy 6 was especially important here. I then followed the considerations mentioned in https://github.com/LooseLab/readfish/issues/187#issuecomment-1081854776 to fix installation issues on the guppy6 and issue187 git branches. Readfish issue187 runs smoothly now.

Thanks in any case for all suggestions everyone!

starnoux commented 2 years ago

Hi @alexomics,

I run readfish from the command line:

readfish targets --experiment-name "TO22D007_TEST_READFISH" --device MN30687 --toml /opt/ont/minknow/conf/package/sequencing/Tomato_chr9_selection3.toml --log-file Test_TO.log

I created the TOML file as suggested in your documentation; it looks as advised:

[caller_settings]
config_name = "dna_r9.4.1_450bps_hac"
host = "127.0.0.1"
port = 5555

[conditions]
reference = "/home/Documents/Ref_TO.mmi"

[conditions.0]
name = "select_chr9"
control = false
min_chunks = 0
max_chunks = inf
targets = ["SL4.0ch09"]
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"

I should clarify that I am working on a MinION, not a GridION. But the GPU works fine, and the basecalling keeps up with the sequencing speed. Do you mean I should check the config file?
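A quick sanity check of a readfish TOML can catch some of the configuration differences discussed in this thread. For example, the file above still uses `host = "127.0.0.1"`, while the Guppy 5.1+/6 setups earlier in the thread point `host` at an `ipc://` address. The sketch below works on the TOML already parsed into a plain dict (for instance with `toml.load(path)` from the `toml` package listed in the conda environment above). The set of accepted actions contains only the values that appear in this thread, and `check_readfish_config` is a hypothetical helper, not readfish's own validation.

```python
import math
from typing import Any, Dict, List

# Only the action values that appear in the TOML files in this thread;
# readfish itself may accept others.
KNOWN_ACTIONS = {"stop_receiving", "unblock", "proceed"}
DECISIONS = ("single_on", "multi_on", "single_off", "multi_off", "no_seq", "no_map")


def check_readfish_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings for a parsed readfish TOML (a plain dict)."""
    problems: List[str] = []

    host = str(cfg.get("caller_settings", {}).get("host", ""))
    if not host.startswith("ipc://"):
        problems.append(
            f"caller_settings.host={host!r} is not an ipc:// address; "
            "the Guppy 5.1+/6 setups in this thread use IPC sockets"
        )

    conditions = cfg.get("conditions", {})
    if "reference" not in conditions:
        problems.append("conditions.reference is missing")

    for key, region in conditions.items():
        if not isinstance(region, dict):  # skip scalar keys such as 'reference'
            continue
        for decision in DECISIONS:
            action = region.get(decision)
            if action not in KNOWN_ACTIONS:
                problems.append(f"conditions.{key}.{decision}={action!r} is not a known action")
        lo = region.get("min_chunks", 0)
        hi = region.get("max_chunks", math.inf)
        if lo > hi:
            problems.append(f"conditions.{key}: min_chunks ({lo}) > max_chunks ({hi})")
    return problems
```

An empty list only means that none of these particular checks fired. It does not confirm that Guppy is reachable at that address.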

12032345 commented 2 years ago

Dear all,

I had a similar problem to @hasindu2008 when trying to run readfish on a GridION. I tried updating readfish to the Guppy 6 version and changing the host in the TOML file to the Guppy IPC address, but it doesn't seem to work. I get the following error when running readfish:

readfish targets --device X1 --experiment-name "test" --toml balf_0414.toml 
2022-04-19 14:49:49,305 ru.ru_gen /home/grid/software/readfish/bin/readfish targets --device X1 --experiment-name test --toml balf_0414.toml
2022-04-19 14:49:49,306 ru.ru_gen batch_size=512
2022-04-19 14:49:49,306 ru.ru_gen cache_size=512
2022-04-19 14:49:49,306 ru.ru_gen channels=[1, 512]
2022-04-19 14:49:49,306 ru.ru_gen chunk_log=None
2022-04-19 14:49:49,306 ru.ru_gen command=targets
2022-04-19 14:49:49,306 ru.ru_gen device=X1
2022-04-19 14:49:49,306 ru.ru_gen dry_run=False
2022-04-19 14:49:49,306 ru.ru_gen experiment_name=test
2022-04-19 14:49:49,306 ru.ru_gen func=<function run at 0x7f6eee5abd30>
2022-04-19 14:49:49,306 ru.ru_gen host=127.0.0.1
2022-04-19 14:49:49,306 ru.ru_gen log_file=None
2022-04-19 14:49:49,306 ru.ru_gen log_format=%(asctime)s %(name)s %(message)s
2022-04-19 14:49:49,306 ru.ru_gen log_level=info
2022-04-19 14:49:49,306 ru.ru_gen paf_log=None
2022-04-19 14:49:49,306 ru.ru_gen port=9501
2022-04-19 14:49:49,306 ru.ru_gen run_time=172800
2022-04-19 14:49:49,306 ru.ru_gen throttle=0.4
2022-04-19 14:49:49,306 ru.ru_gen toml=balf_0414.toml
2022-04-19 14:49:49,306 ru.ru_gen unblock_duration=0.1
2022-04-19 14:49:49,306 ru.ru_gen workers=1
2022-04-19 14:49:49,768 ru.ru_gen Initialising minimap2 mapper
2022-04-19 14:49:51,920 ru.ru_gen Mapper initialised
2022-04-19 14:49:51,930 ru.ru_gen This experiment has 1 region on the flowcell
2022-04-19 14:49:51,930 ru.ru_gen Using reference: /data/SYH/balf_0414.mmi
2022-04-19 14:49:53,678 ru.ru_gen Region 'human' (control=False) has 25069 contigs of which 25069 are in the reference. There are 50138 targets (including +/- strand) representing 99.82% of the reference. Reads will be unblocked when classed as single_on or multi_on; sequenced when classed as single_off or multi_off; and polled for more data when classed as no_map or no_seq.
Traceback (most recent call last):
  File "/home/grid/software/readfish/bin/readfish", line 11, in <module>
    load_entry_point('readfish==0.0.9a1', 'console_scripts', 'readfish')()
  File "/home/grid/software/readfish/lib/python3.8/site-packages/ru/cli.py", line 43, in main
    args.func(parser, args)
  File "/home/grid/software/readfish/lib/python3.8/site-packages/ru/ru_gen.py", line 498, in run
    simple_analysis(
  File "/home/grid/software/readfish/lib/python3.8/site-packages/ru/ru_gen.py", line 155, in simple_analysis
    caller = Caller(
  File "/home/grid/software/readfish/lib/python3.8/site-packages/ru/basecall.py", line 41, in __init__
    self.connect()
  File "/home/grid/software/readfish/lib/python3.8/site-packages/pyguppy_client_lib/pyclient.py", line 167, in connect
    raise ConnectionError("Connection attempt timed out: {!r}".format(return_code))
ConnectionError: Connection attempt timed out: result.timed_out

The software versions are:

readfish:0.0.9a1
guppy:5.1.13
MinKNOW:21.11.7
MinKNOW core:4.5.4
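One thing worth checking with these versions is whether the Python client library matches the Guppy server. Earlier in this thread the client was installed at the same release as the server (ont_pyguppy_client_lib==6.0.1 alongside Guppy 6.0.1), whereas the server here is 5.1.13. The sketch below compares the two; treating an exact major.minor.patch match as "compatible" is an assumption of this sketch, and `parse_version`, `installed_client_version` and `versions_match` are hypothetical helper names. It only relies on the standard `importlib.metadata` module.

```python
# Hypothetical sketch: compare the installed pyguppy client with the Guppy server version.
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '5.1.13+b292f4d' or '6.0.6' into (major, minor, patch)."""
    core = text.strip().split("+", 1)[0]  # drop the build hash suffix
    return tuple(int(part) for part in core.split(".")[:3])


def installed_client_version(dist: str = "ont-pyguppy-client-lib"):
    """Return the installed client library version string, or None if absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def versions_match(server: str, client: str) -> bool:
    """Assumption: only an exact major.minor.patch match counts as compatible."""
    return parse_version(server) == parse_version(client)


if __name__ == "__main__":
    server = "5.1.13"  # the Guppy version listed above
    client = installed_client_version()
    if client is None:
        print("ont-pyguppy-client-lib is not installed in this environment")
    elif not versions_match(server, client):
        print(f"mismatch: Guppy server {server} vs client library {client}")
    else:
        print("client library matches the server version")
```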

The TOML file is:

[caller_settings]
config_name = "dna_r9.4.1_450bps_fast"
host = "ipc://5557"
port = 5557

[conditions]
reference = "/data/SYH/balf_0414.mmi"

[conditions.0]
name = "human"
control = false
min_chunks = 0
max_chunks = inf
targets = "/data/SYH/classified_human_id"
single_on = "unblock"
multi_on = "unblock"
single_off = "stop_receiving"
multi_off = "stop_receiving"
no_seq = "proceed"
no_map = "proceed"

My guppy server log:

2022-04-19 13:43:38.757238 [guppy/message] ONT Guppy basecall server software version 5.1.13+b292f4d, client-server API version 10.0.0
log path:            guppy.log
chunk size:          2000
chunks per runner:   160
max queued reads:    2000
num basecallers:     1
num socket threads:  3
max returned events: 50000
cpu mode:             ON
threads per caller:  14

2022-04-19 13:43:38.757394 [guppy/info] crashpad_handler successfully launched.
2022-04-19 13:43:38.757730 [guppy/info] Listening on port ipc://5557.
2022-04-19 13:43:38.782780 [guppy/message] 
Config loaded:
config file:               /opt/ont/guppy/data/dna_r9.4.1_450bps_fast.cfg
model file:                /opt/ont/guppy/data/template_r9.4.1_450bps_fast.jsn
model version id           2021-05-17_dna_r9.4.1_minion_96_29d8704b
adapter scaler model file: None

2022-04-19 13:43:38.783942 [guppy/message] Starting server on port: ipc://5557
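The client has to be given exactly the address the server reports (here `ipc://5557`), so it can help to read it straight from the server log instead of retyping it. A minimal sketch, assuming the log wording shown above ("Starting server on port: ..." and "basecall server software version ..."); other Guppy releases may phrase these lines differently, and `scan_log` is a hypothetical helper, not part of Guppy or readfish.

```python
# Hypothetical sketch: pull the server address and version out of a Guppy server log.
import re
import sys

# Patterns match the log lines quoted above; other Guppy releases may differ.
ADDRESS_RE = re.compile(r"Starting server on port:\s*(\S+)")
VERSION_RE = re.compile(r"basecall server software version\s+([^\s,]+)")


def scan_log(text: str) -> dict:
    """Return the last server address and version string found in the text."""
    found = {"address": None, "version": None}
    for line in text.splitlines():
        match = ADDRESS_RE.search(line)
        if match:
            found["address"] = match.group(1)
        match = VERSION_RE.search(line)
        if match:
            found["version"] = match.group(1)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py <path to a Guppy server log file>
    with open(sys.argv[1]) as handle:
        print(scan_log(handle.read()))
```

For the log above this yields the address `ipc://5557` and the version `5.1.13+b292f4d`.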

And when I try running:

python -c 'from pyguppy_client_lib.pyclient import PyGuppyClient as PGC; \
           c = PGC("ipc://5557", "dna_r9.4.1_450bps_fast.cfg"); \
           c.connect(); print(c)'
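The one-liner above can be stretched into a small probe that tries several candidate addresses and times each `connect()` until it succeeds or raises `ConnectionError` (the exception in the traceback above). This is a hedged sketch: `PyGuppyClient(address, config)` and `connect()` are used exactly as in the one-liner, `disconnect()` is only called if the client object provides it, and `probe` plus the candidate list (addresses that appear in this thread) are illustrative, not a recommended configuration.

```python
# Hypothetical sketch: try several Guppy server addresses and time each connection attempt.
import time

try:
    from pyguppy_client_lib.pyclient import PyGuppyClient
except ImportError:  # lets the sketch load without the ONT client library installed
    PyGuppyClient = None

# Example addresses taken from this thread; use the one your server actually reports.
CANDIDATES = ["ipc://5557", "ipc:///tmp/.guppy/5555", "ipc:///tmp/.guppy/5557"]
CONFIG = "dna_r9.4.1_450bps_fast.cfg"


def probe(addresses, config, client_cls=PyGuppyClient):
    """Return (address, ok, seconds, error) for each address, in order."""
    if client_cls is None:
        raise RuntimeError("pyguppy_client_lib is not installed")
    results = []
    for address in addresses:
        client = client_cls(address, config)
        start = time.monotonic()
        try:
            client.connect()
        except ConnectionError as exc:  # e.g. "Connection attempt timed out"
            results.append((address, False, time.monotonic() - start, str(exc)))
            continue
        results.append((address, True, time.monotonic() - start, ""))
        disconnect = getattr(client, "disconnect", None)
        if callable(disconnect):
            disconnect()  # close the successful connection again
    return results


if __name__ == "__main__" and PyGuppyClient is not None:
    for address, ok, secs, err in probe(CANDIDATES, CONFIG):
        print(f"{address}: {'ok' if ok else 'FAILED'} after {secs:.1f}s {err}")
```

Timing each attempt makes it easier to tell an address that is simply wrong (it times out) from one that works but answers slowly.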

I get a similar error: "Connection error: Connection attempt timed out: result.timed_out".

This also suggests that readfish cannot connect to the Guppy server. How can I solve this problem on the GridION?

Hope to get your advice! Thank you very much!!

best, Sophie

starnoux commented 2 years ago

Hi @alexomics,

I tried again following the suggestions you gave elsewhere: I ran readfish with sudo and also changed the conda environment. You are right that the issue is the connection. With the proper environment (I needed to install everything with sudo), it seems to work like a charm.

name: readfish
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python=3.7
  - pip
  - pip:
    - git+https://github.com/nanoporetech/read_until_api@v3.0.0
    - ont-pyguppy-client-lib==6.0.6
    - git+https://github.com/LooseLab/readfish@issue187
    - git+https://github.com/nanoporetech/minknow_api@5.0.0#subdirectory=python

I configured my TOML as follows:

[caller_settings]
config_name = "dna_r9.4.1_450bps_fast"
host = "ipc:///tmp/.guppy"
port = 5555
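With IPC, the `host` and `port` values together identify one socket. Assuming the socket name is the port appended to the host path (which matches the `--port /tmp/.guppy/5557` style used to start the server at the top of this thread, and the `ipc:///tmp/.guppy` plus `5555` example from alexomics), the resulting address would be `ipc:///tmp/.guppy/5555`. The helper below is a hypothetical illustration of that assumption, not how readfish itself builds the address; TCP hosts are shown with the usual `host:port` form for comparison.

```python
# Hypothetical sketch: combine TOML host/port into one server address string.


def server_address(host: str, port) -> str:
    """Assumption: IPC hosts take '/<port>' as the socket name, TCP hosts ':<port>'."""
    host = host.rstrip("/")  # tolerate a trailing slash in the TOML value
    if host.startswith("ipc://"):
        return f"{host}/{port}"
    return f"{host}:{port}"


if __name__ == "__main__":
    print(server_address("ipc:///tmp/.guppy", 5555))  # ipc:///tmp/.guppy/5555
    print(server_address("127.0.0.1", 5555))  # 127.0.0.1:5555
```

If the address built this way does not match the "Starting server on port:" line in the Guppy server log, the client and server are probably looking at different sockets.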

My run command looks like this:

sudo runuser -l minknow -c 'readfish targets --experiment-name "Name experiment" --device MN30687 --toml /opt/ont/minknow/conf/package/sequencing/Experiment.toml --log-file Test_million.log'

Just in case it helps someone. Thanks a lot for all the help, Stéph

starnoux commented 2 years ago

Hi @alexomics,

I was a bit too excited in my previous message. Coming back to the runs, I found that readfish does work, but the connection to the GPU basecaller is not good. This time I could see it running, which is promising, but it is very slow:

Traceback (most recent call last):
  File "/usr/local/bin/readfish", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/ru/cli.py", line 43, in main
    args.func(parser, args)
  File "/usr/local/lib/python3.8/dist-packages/ru/ru_gen.py", line 498, in run
    simple_analysis(
  File "/usr/local/lib/python3.8/dist-packages/ru/ru_gen.py", line 398, in simple_analysis
    decisiontracker.fetch_proportion_accepted(),
  File "/usr/local/lib/python3.8/dist-packages/ru/utils.py", line 145, in fetch_proportion_accepted
    return self.fetch_stop_receiving() / self.fetch_total_reads() * 100
ZeroDivisionError: division by zero
2022-09-09 15:45:35,337 ru.ru_gen 149R/128.16525s
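The ZeroDivisionError above comes from `fetch_proportion_accepted` computing `fetch_stop_receiving() / fetch_total_reads() * 100` while the tracked total is still zero. A minimal guarded version of that arithmetic, written as a hypothetical stand-alone function rather than a patch to readfish's `utils.py`, simply returns 0.0 until at least one read has been counted:

```python
# Hypothetical sketch: percentage of reads accepted, safe when no reads are counted yet.


def proportion_accepted(stop_receiving: int, total_reads: int) -> float:
    """Percentage of reads sent stop_receiving; 0.0 while total_reads is zero."""
    if total_reads <= 0:
        return 0.0  # avoid dividing by zero before any decision is recorded
    return stop_receiving / total_reads * 100


if __name__ == "__main__":
    print(proportion_accepted(0, 0))  # 0.0
    print(proportion_accepted(5, 20))  # 25.0
```

Guarding the division only hides the symptom; the slow throughput in the log line above (149 reads in about 128 seconds) still points at the basecaller connection.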

And sometimes it doesn't run at all:

[guppy/error] basecall_service::BasecallClient::worker_loop: Connection error. [timed_out] Timeout waiting for reply to request: LOAD_CONFIG

I wonder why, because when I test the connection with:

python -c 'from pyguppy_client_lib.pyclient import PyGuppyClient as PGC; \
           c = PGC("ipc://5557", "dna_r9.4.1_450bps_fast.cfg"); \
           c.connect(); print(c)'

I see that the connection occurs every 8 seconds.

Do you have any idea why it does that? I am also a bit surprised that my GPU isn't showing high usage even though basecalling is quite fast...

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 year ago

This issue was closed because there has been no response for 5 days after becoming stale.