LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers
https://looselab.github.io/readfish/
GNU General Public License v3.0

Dorado plugin #344

Closed: mattloose closed this pull request 2 months ago.

mattloose commented 3 months ago

Create a new dorado plugin to handle the latest dorado_basecall_server version (7.3.9), which only accepts connections from ont-pybasecall-client-lib, deprecating ont-pyguppy-client-lib.

This pull request is to trigger discussion about the best way to switch to the new ONT pybasecaller client.

Rather than introduce this in the guppy plugin, I have created a distinct dorado plugin.

This is due to changes in how reads need to be packaged for the client, which expects some surprising data (e.g. the sample rate). Given ONT's switch from guppy to dorado, it may make sense in the future for people to load a dorado plugin instead of a guppy plugin. Within a few months, many users may have forgotten about (or never known about) guppy in the first place.

Related to the above, we need to find a way of providing the sample rate to the basecall client. Currently the basecaller has no way of accessing it. Is that right?

Any suggestions?

Until this is resolved the PR is not fit for merging.

mattloose commented 3 months ago

In addition, we need to decide whether this should send the read_id or the read name; it has not yet been updated to use the read name.

Adoni5 commented 2 months ago

This should close #347, which will come up more often as more people update.

I think we should merge #324 before we merge this.

I also think that making this a new plugin is probably the right way to go! It will heavily duplicate the guppy.py plugin, but I suggest we add a deprecation warning when the guppy plugin is run, stating that we will not support that plugin going forward, that it is provided as is, and that only dorado will be supported from now on.
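A minimal sketch of such a warning, using only the standard-library `warnings` module. The helper name `warn_guppy_deprecated` and the message text are hypothetical, not readfish's actual code. It uses `FutureWarning` rather than `DeprecationWarning` because Python hides `DeprecationWarning` by default unless it is triggered in `__main__`, whereas `FutureWarning` is intended for end users of an application such as a CLI.

```python
import warnings

# Hypothetical message text; the wording readfish actually ships may differ.
GUPPY_DEPRECATION_MESSAGE = (
    "The guppy plugin is deprecated and provided as is. "
    "Only the dorado plugin will be supported going forward."
)


def warn_guppy_deprecated() -> None:
    """Emit a user-visible deprecation notice (hypothetical helper).

    Intended to be called once when the guppy plugin is loaded.
    """
    # FutureWarning is shown to end users by default; DeprecationWarning
    # is filtered out unless raised from __main__.
    # stacklevel=2 attributes the warning to the caller, not this helper.
    warnings.warn(GUPPY_DEPRECATION_MESSAGE, FutureWarning, stacklevel=2)
```

Calling `warn_guppy_deprecated()` from the guppy plugin's setup path would print the notice once per run under Python's default warning filters.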

@mattloose, could we take the sample_rate from the caller config?

Adoni5 commented 2 months ago

Uh-oh, SpaghettiOs.

==================================================================================== test session starts ====================================================================================
platform linux -- Python 3.11.0, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/adoni5/Projects/readfish
configfile: pyproject.toml
testpaths: tests/*test.py, src/readfish
collected 65 items / 1 error                                                                                                                                                                

========================================================================================== ERRORS ===========================================================================================
______________________________________________________________________ ERROR collecting src/readfish/plugins/guppy.py _______________________________________________________________________
../../miniforge3/envs/readfish/lib/python3.11/site-packages/_pytest/runner.py:341: in from_call
    result: Optional[TResult] = func()
../../miniforge3/envs/readfish/lib/python3.11/site-packages/_pytest/runner.py:372: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
../../miniforge3/envs/readfish/lib/python3.11/site-packages/_pytest/doctest.py:567: in collect
    module = import_path(
../../miniforge3/envs/readfish/lib/python3.11/site-packages/_pytest/pathlib.py:567: in import_path
    importlib.import_module(module_name)
../../miniforge3/envs/readfish/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1206: in _gcd_import
    ???
<frozen importlib._bootstrap>:1178: in _find_and_load
    ???
<frozen importlib._bootstrap>:1149: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:940: in exec_module
    ???
<frozen importlib._bootstrap>:241: in _call_with_frames_removed
    ???
src/readfish/plugins/guppy.py:16: in <module>
    from pyguppy_client_lib.helper_functions import package_read
../../miniforge3/envs/readfish/lib/python3.11/site-packages/pyguppy_client_lib/helper_functions.py:17: in <module>
    from pyguppy_client_lib.client_lib import GuppyClient
E   ImportError: NumPy: dtype is already registered
================================================================================== short test summary info ==================================================================================
ERROR src/readfish/plugins/guppy.py - ImportError: NumPy: dtype is already registered
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================================================================================== 1 error in 0.35s ======================================================================================

It seems that installing ont-pybasecall-client-lib 7.3.10 alongside ont-pyguppy-client-lib might not work.

mattloose commented 2 months ago

Yes, we need some kind of conditional install? It's a bit of a pain, for sure.
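The traceback above suggests the clash happens when both client libraries are imported into the same process. One possible mitigation, sketched here under the assumption that only one client library ever needs to be loaded per process, is to check which library is installed with `importlib.util.find_spec` (which locates a top-level module without executing it) and import only that one. The helper names are hypothetical, and `pybasecall_client_lib` as the import name of ont-pybasecall-client-lib is an assumption; `pyguppy_client_lib` is taken from the traceback.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional, Sequence

# Preference order: the Dorado-era client first, then the legacy Guppy client.
# The first name is an assumed import name for ont-pybasecall-client-lib.
CLIENT_CANDIDATES = ("pybasecall_client_lib", "pyguppy_client_lib")


def find_basecall_client(candidates: Sequence[str] = CLIENT_CANDIDATES) -> Optional[str]:
    """Return the first installed top-level module name, without importing it."""
    for name in candidates:
        # find_spec only locates the module; it does not run its code,
        # so a library that is merely installed is never loaded here.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def import_basecall_client(candidates: Sequence[str] = CLIENT_CANDIDATES) -> ModuleType:
    """Import exactly one client library, preferring earlier candidates."""
    name = find_basecall_client(candidates)
    if name is None:
        raise ImportError(f"none of {list(candidates)} is installed")
    return importlib.import_module(name)
```

This only helps if plugin modules defer their client imports to a helper like this. A module that imports `pyguppy_client_lib` at top level, as guppy.py does in the traceback, will still load it whenever the module itself is imported (for example, during pytest's doctest collection).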

Adoni5 commented 2 months ago

This still needs the documentation updating. It is confirmed working with the latest MinKNOW/Dorado 7.3+, and should work with <= 7.2. I've bumped the version to 2024.0.0 (maybe it should be 2024.1.0). I've moved all tests to test the dorado plugin and added the new ont-pybasecall-client-lib to pyproject.toml. I've also excluded some plugins/files we can't really test from coverage reporting.

Final steps:

mattloose commented 2 months ago

Updated to pull the sample rate from MinKNOW. I have also found an issue with dorado returning two reads with the same ID in the same batch.
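Until the cause is understood, one defensive option is to detect duplicate IDs within a batch before acting on the results. The sketch below assumes each result is a dict with a hypothetical `"read_id"` key, which may not match the real client's result shape. It keeps the first result seen for each ID and reports which IDs were repeated. Which copy to keep is a policy choice; keeping the first is arbitrary here.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

Result = Dict[str, Any]  # hypothetical shape of one basecall result


def split_duplicates(
    results: Iterable[Result], key: str = "read_id"
) -> Tuple[List[Result], List[str]]:
    """Return (first result per ID, IDs that appeared more than once)."""
    batch = list(results)
    counts = Counter(r[key] for r in batch)
    seen = set()
    unique: List[Result] = []
    for r in batch:
        if r[key] not in seen:  # keep only the first occurrence
            seen.add(r[key])
            unique.append(r)
    # Counter preserves first-insertion order, so duplicates are reported
    # in the order they first appeared in the batch.
    duplicates = [rid for rid, n in counts.items() if n > 1]
    return unique, duplicates
```

Logging the returned `duplicates` list would also help gather evidence on how often dorado produces repeated IDs.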