LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers
https://looselab.github.io/readfish/
GNU General Public License v3.0

KeyError: >512 when running Readfish 0.0.11a4 on PromethION flowcells #251

Closed jamesemery closed 11 months ago

jamesemery commented 11 months ago

I'm trying to test Readfish on a PromethION instrument and have hit exceptions that evidently relate to how Readfish handles flowcells with PromethION channel counts, even when the channels are specified on the command line.

Here is my command and logs:

readfish targets --device 2A --experiment-name "TestReadfishTalking" --cache-size 3000 --batch-size 3000 --channels 1 3000 --toml human_chr_selection.toml 
2023-08-22 18:12:57,233 ru.ru_gen /home/prom/miniconda3/envs/readfish/bin/readfish targets --device 2A --experiment-name TestReadfishTalking --cache-size 3000 --batch-size 3000 --channels 1 3000 --toml human_chr_selection.toml
2023-08-22 18:12:57,233 ru.ru_gen batch_size=3000
2023-08-22 18:12:57,233 ru.ru_gen cache_size=3000
2023-08-22 18:12:57,233 ru.ru_gen channels=[1, 3000]
2023-08-22 18:12:57,233 ru.ru_gen chunk_log=None
2023-08-22 18:12:57,233 ru.ru_gen command=targets
2023-08-22 18:12:57,233 ru.ru_gen device=2A
2023-08-22 18:12:57,233 ru.ru_gen dry_run=False
2023-08-22 18:12:57,233 ru.ru_gen experiment_name=TestReadfishTalking
2023-08-22 18:12:57,233 ru.ru_gen func=<function run at 0x7f74606f8820>
2023-08-22 18:12:57,233 ru.ru_gen host=127.0.0.1
2023-08-22 18:12:57,233 ru.ru_gen log_file=None
2023-08-22 18:12:57,233 ru.ru_gen log_format=%(asctime)s %(name)s %(message)s
2023-08-22 18:12:57,233 ru.ru_gen log_level=info
2023-08-22 18:12:57,233 ru.ru_gen max_unblock_read_length_seconds=5
2023-08-22 18:12:57,233 ru.ru_gen paf_log=None
2023-08-22 18:12:57,233 ru.ru_gen port=None
2023-08-22 18:12:57,233 ru.ru_gen run_time=172800
2023-08-22 18:12:57,233 ru.ru_gen throttle=0.4
2023-08-22 18:12:57,233 ru.ru_gen toml=human_chr_selection.toml
2023-08-22 18:12:57,234 ru.ru_gen unblock_duration=0.1
2023-08-22 18:12:57,234 ru.ru_gen workers=1
2023-08-22 18:12:57,238 ru.ru_gen Initialising minimap2 mapper
2023-08-22 18:13:09,800 ru.ru_gen Mapper initialised
2023-08-22 18:13:10,224 ru.ru_gen This experiment has 1 region on the flowcell
2023-08-22 18:13:10,225 ru.ru_gen Using reference: /data/hg38.mmi
2023-08-22 18:13:23,927 ru.ru_gen Region 'select_chr_21_22' (control=False) has 2 contigs of which 2 are in the reference. There are 4 targets (including +/- strand) representing 3.04% of the reference. Reads will be unblocked when classed as single_off or multi_off; sequenced when classed as single_on or multi_on; and polled for more data when classed as no_map or no_seq.
Traceback (most recent call last):
  File "/home/prom/miniconda3/envs/readfish/bin/readfish", line 8, in <module>
    sys.exit(main())
  File "/home/prom/miniconda3/envs/readfish/lib/python3.8/site-packages/ru/cli.py", line 43, in main
    args.func(parser, args)
  File "/home/prom/miniconda3/envs/readfish/lib/python3.8/site-packages/ru/ru_gen.py", line 507, in run
    simple_analysis(
  File "/home/prom/miniconda3/envs/readfish/lib/python3.8/site-packages/ru/ru_gen.py", line 293, in simple_analysis
    if conditions[run_info[channel]].control:
KeyError: 2671

You can see that it fails on the run_info[channel] lookup when a read comes back. Digging into the code a little more, it appears that the method that loads the configuration from the file is not aware of the channels if they are set on the CLI?

        if live_toml_path.is_file():
            # Reload the TOML config from the *_live file
            run_info, conditions, new_reference, _ = get_run_info(
                live_toml_path, flowcell_size
            )

which in turn defaults to 512 channels where that method is defined: def get_run_info(toml_filepath, num_channels=512, validate=True):

For this (simulation) test I'm just using a modified version of your human_chr_selection.toml:

[caller_settings]
config_name = "dna_r10.4.1_e8.2_400bps_5khz_fast.cfg"
host = "ipc:///home/prom/"
port = 5556

[conditions]
reference = "/data/hg38.mmi"

[conditions.0]
name = "select_chr_21_22"
control = false
min_chunks = 0
max_chunks = 12
targets = ["chr21", "chr22"]
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"

Am I missing something about how to get this code working on PromethION flowcells? I have tested in a fork changing that num_channels line to 3000 and found that it seems to fix the exceptions in dummy simulations. However, I notice that there are multiple issues referring to PromethION sequencing experiments with Readfish which makes me think that I am misconfiguring something. Thank you.
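The failure mode described in this comment can be reproduced in isolation. The following is a minimal sketch, not readfish's actual code: `build_channel_map` is a hypothetical stand-in for whatever builds the channel-to-condition lookup, assumed here to number channels from 1 and split them evenly across conditions. It only illustrates why a lookup sized for 512 channels raises a KeyError for channel 2671, and why sizing it for 3000 channels avoids that.

```python
def build_channel_map(num_channels, num_conditions):
    """Hypothetical channel -> condition index lookup (not readfish's own code).

    Channels are numbered from 1 and split evenly across the conditions.
    """
    per_condition = -(-num_channels // num_conditions)  # ceiling division
    return {
        channel: (channel - 1) // per_condition
        for channel in range(1, num_channels + 1)
    }


# Lookup built with a 512-channel default (MinION/GridION scale).
run_info = build_channel_map(num_channels=512, num_conditions=1)
try:
    run_info[2671]
except KeyError as err:
    print(f"KeyError: {err}")  # channel 2671 is outside the 512-channel map

# Same lookup after sizing the map for 3000 channels.
run_info = build_channel_map(num_channels=3000, num_conditions=1)
print(run_info[2671])  # condition index for channel 2671
```

With a single condition, every channel in the larger map resolves to condition index 0, so the lookup that failed above succeeds.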

alexomics commented 11 months ago

Did you install from the default (dev_staging) branch? I think that is only set up for MinION/GridION at the moment. There's the guppy_6_minknow_5 branch for PromethION, but it's not entirely in sync with the other branch. We're working to reconcile them, but I would start with guppy_6_minknow_5.

If you can, could you share your conda env setup and I can take a look? If you try the guppy_6_minknow_5 branch I can help if needed.

jamesemery commented 11 months ago

Thanks @alexomics for the speedy reply. I have been running off of dev_staging, as I was not aware from your documentation that there is a separate branch that I should be using. What is missing from guppy_6_minknow_5 that I should be worried about? Here is the Python environment on the box:

pip list 
Package                   Version
------------------------- -----------
attrs                     23.1.0
biopython                 1.81
certifi                   2023.7.22
cffi                      1.15.1
charset-normalizer        3.2.0
gevent                    21.12.0
greenlet                  1.1.3.post0
grpcio                    1.56.2
idna                      3.4
importlib-resources       6.0.0
jsonschema                4.18.6
jsonschema-specifications 2023.7.1
mappy                     2.26
minknow-api               5.5.2
numpy                     1.24.4
ont-pyguppy-client-lib    6.5.7
packaging                 23.1
pandas                    2.0.3
pip                       23.2.1
pkgutil_resolve_name      1.3.10
protobuf                  3.20.3
pycparser                 2.21
pyRFC3339                 1.1
python-dateutil           2.8.2
pytz                      2023.3
read-until                3.4.1
readfish                  0.0.11a4
referencing               0.30.1
requests                  2.31.0
rpds-py                   0.9.2
setuptools                68.0.0
six                       1.16.0
toml                      0.10.2
tzdata                    2023.3
urllib3                   2.0.4
watchdog                  3.0.0
wheel                     0.41.0
zipp                      3.16.2
zope.event                5.0
zope.interface            6.0

alexomics commented 11 months ago

The main changes are to do with how readfish connects to both MinKNOW and the base caller. It doesn't look like it can automatically be merged, but I'll see if I can make it work

alexomics commented 11 months ago

Okay, so I've taken a quick punt at the rebase and pushed it to issue251. Can you give that a go with a simulation and let me know what happens?

alexomics commented 11 months ago

For a more complete overview, take a look at the commit message here and maybe nuke the conda env and recreate it with:

name: readfish
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python=3.8
  - pip
  - pip:
    - git+https://github.com/nanoporetech/read_until_api@v3.4.1
    - ont-pyguppy-client-lib==6.4.2
    - git+https://github.com/LooseLab/readfish@issue251

jamesemery commented 11 months ago

I'm now seeing the following exception:

Traceback (most recent call last):
  File "/home/prom/miniconda3/envs/readfish/bin/readfish", line 8, in <module>
    sys.exit(main())
  File "/home/prom/miniconda3/envs/readfish/lib/python3.8/site-packages/ru/cli.py", line 43, in main
    args.func(parser, args)
  File "/home/prom/miniconda3/envs/readfish/lib/python3.8/site-packages/ru/ru_gen.py", line 535, in run
    simple_analysis(
  File "/home/prom/miniconda3/envs/readfish/lib/python3.8/site-packages/ru/ru_gen.py", line 164, in simple_analysis
    align_ref=caller_kwargs["align_ref"],
KeyError: 'align_ref'

alexomics commented 11 months ago

Mentioned in the commit message:

This commit introduces the ability to address PromethION scale flow cells on MinKNOW v5.1.0 and greater. For now, this uses Guppy for alignment and so requires a slightly different scheme for initialisation:

[caller_settings]
config_name = "dna_r9.4.1_450bps_fast_prom"
host = "ipc:///tmp/.guppy"
port = 5555
barcode_kits = ["EXP-NBD104"]
align_ref = "/full/path/to/minimap2.mmi"

barcode_kits is only required when barcoding.
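A pre-flight check along these lines would surface the missing key before readfish starts. This is a minimal sketch under stated assumptions: `missing_caller_keys` and `REQUIRED_CALLER_KEYS` are hypothetical names, not part of readfish, and the set of keys treated as required is inferred from the example above (with barcode_kits left optional).

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import toml as tomllib  # the `toml` package also provides loads(str)

# Assumption: keys this sketch treats as required for the Guppy-alignment scheme.
REQUIRED_CALLER_KEYS = ("config_name", "host", "port", "align_ref")


def missing_caller_keys(toml_text):
    """Return the required [caller_settings] keys absent from a TOML string."""
    config = tomllib.loads(toml_text)
    caller = config.get("caller_settings", {})
    return [key for key in REQUIRED_CALLER_KEYS if key not in caller]


# The [caller_settings] table from the first comment, without align_ref.
OLD = """
[caller_settings]
config_name = "dna_r10.4.1_e8.2_400bps_5khz_fast.cfg"
host = "ipc:///home/prom/"
port = 5556
"""

# The same table with the align_ref key added.
NEW = OLD + 'align_ref = "/full/path/to/minimap2.mmi"\n'

print(missing_caller_keys(OLD))  # reports align_ref as missing
print(missing_caller_keys(NEW))  # nothing missing
```

The check only looks at key presence; it does not verify that the path given for align_ref exists or is a valid minimap2 index.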

jamesemery commented 11 months ago

Package                   Version
------------------------- -----------
attrs                     23.1.0
beautifulsoup4            4.12.2
biopython                 1.76
certifi                   2023.7.22
cffi                      1.15.1
charset-normalizer        3.2.0
gevent                    21.12.0
google                    3.0.0
greenlet                  1.1.3.post0
grpcio                    1.56.2
idna                      3.4
importlib-resources       6.0.0
jsonschema                4.18.6
jsonschema-specifications 2023.7.1
mappy                     2.26
minknow-api               5.5.2
numpy                     1.24.4
ont-pyguppy-client-lib    6.5.7
packaging                 23.1
pandas                    2.0.3
pip                       23.2.1
pkgutil_resolve_name      1.3.10
protobuf                  3.20.3
pycparser                 2.21
pyRFC3339                 1.1
python-dateutil           2.8.2
pytz                      2023.3
read-until                3.4.1
readfish                  0.0.11a5
referencing               0.30.1
requests                  2.31.0
rpds-py                   0.9.2
setuptools                68.0.0
six                       1.16.0
soupsieve                 2.4.1
toml                      0.10.2
tzdata                    2023.3
urllib3                   2.0.4
watchdog                  3.0.0
wheel                     0.41.0
zipp                      3.16.2
zope.event                5.0
zope.interface            6.0

The only environment difference is that I'm on ont-pyguppy-client-lib==6.5.7, since that is the version of Guppy that I'm using here.

alexomics commented 11 months ago

Okay looking good! Is this working with a simulation?

jamesemery commented 11 months ago

I'm seeing an exception when I run it: https://github.com/LooseLab/readfish/issues/251#issuecomment-1688918339

The TOML and settings I'm using are in the first comment.

alexomics commented 11 months ago

Have you seen this comment: https://github.com/LooseLab/readfish/issues/251#issuecomment-1688921778? You need to change your readfish TOML config.

jamesemery commented 11 months ago

I didn't get it right... Now it's working! Thank you @alexomics