LooseLab / Icarust

A fully featured MinKNOW simulator for testing read until experiments.
Mozilla Public License 2.0

Barcoding Kit for Adaptive Sampling with Icarust #12

Closed mccarthyma17 closed 2 months ago

mccarthyma17 commented 5 months ago

Hello!

I wasn't sure where to post this, so feel free to move it if it's in the wrong place!

I'm trying to simulate barcoded reads with readfish and Icarust and running into an issue. When I check my TOML file with readfish validate it looks fine, but when I start the simulated run I get this error:

2024-04-24 16:30:36,284 readfish command='targets'
2024-04-24 16:30:36,284 readfish debug_log=True
2024-04-24 16:30:36,285 readfish device='Bantersaurus'
2024-04-24 16:30:36,285 readfish dry_run=False
2024-04-24 16:30:36,285 readfish experiment_name='test_barcodes'
2024-04-24 16:30:36,285 readfish host='127.0.0.1'
2024-04-24 16:30:36,285 readfish log_file=None
2024-04-24 16:30:36,285 readfish log_format='%(asctime)s %(name)s %(message)s'
2024-04-24 16:30:36,285 readfish log_level='debug'
2024-04-24 16:30:36,285 readfish max_unblock_read_length_seconds=5
2024-04-24 16:30:36,285 readfish padding=0
2024-04-24 16:30:36,285 readfish port='10000'
2024-04-24 16:30:36,285 readfish throttle=0.4
2024-04-24 16:30:36,285 readfish toml='readfish_e_faecium.toml'
2024-04-24 16:30:36,285 readfish unblock_duration=0.1
2024-04-24 16:30:36,285 readfish wait_for_ready=60
2024-04-24 16:30:36,354 readfish._read_until_client Protocol phase changed to PHASE_SEQUENCING
2024-04-24 16:30:36,354 readfish._read_until_client Protocol state changed to PROTOCOL_RUNNING
2024-04-24 16:30:36,359 readfish.targets eJztWNtu20YQfedXLKSHtEAs7Y1cskAfnNhGEqRJ4CSAA8MgVuRKJsRbSCqXv++QkrhDS5Gbog9FSwEGPLtzPTM7OtBtpNPUVGFtmibJV/VstSnL73fOVMdxZeqa/E4mSRn9Np/PP9amqueZjk2a5CaLIl0199/nF6ZeN0U5L/LmLC4qHRdnoPjFVPNFks9d+EycQ2dfi2pdlzoy81Qv5lnna1Nt/wchY+78PNZlk3wx4XudlRBytXMWFfkyWbW+4lyHFaMzOWOh8Wc8lJQuyjpc6rqZOLFZbFZhWnSqC12brtQ4jO43+bqeLT9PnIWuoiI24TppuuQub96dvXl2wZicOM50Sj68ePmevH/x9uPrC/LskvzxiTy/eHl1Ra4vry6vL988v3RuM12WGL9W/h5W9Z2zzMMk/hYmOXh+stQmSjZZyASdZVnyxJkOriGzsDLL7VUeNveV0XGbkoQ8bm8rs0qKvL6DvuQ6M22qV1uH5LpVnDjTLMl3lcEtA1l/s7J0pjVkl5qwaMNNamgYxItM8gWOW+tN2iQ/utybLpft9SZfpEW0tkYPj/MC0PjcnpVVERkTb88AmMHZ7Q78erbJo1TXdbJMTHzn7AvEp2G1rXJQJCVTAgdJBijkm2xhKlIsye56YZZFZYgmMcBUA3gk0jmcknZ+nSE4rR/97bifpiCb2hDoUu8o02sAhJwRvWxAt7lPagIjAPEyECDmooChRSF01LR2v8RmqQEysoPqV2fQkz2AhEA+51sTiN7oNYRftp6hHghV5G0RZdnm0KroamWamYM7iFyd9pW1GDX3gAzy+pR8TZp7ohuSGnhI3dU+yvFR+ImMdR6TpOlOwcfWrXN0kE7mruGvsyrT3ncNiW8v07RTsBFmzuFQHnefFwT0NiaPDIl1ox07uaer3eR6AbnACag7BzPwqH1nQazFzFmYtPgaDkb+keQ7C2ItZg56Zcfe2OELO7kmTm2JHyyJ4zviwTt++B6PdOtYH3B1u38os8X1Rw/jHYTbDkkr3JLJ83eMBr4IZmzylExeXzPhSiW3UnenAh/dMYU0pRBqr0mFzxTFml7Q3wWKcqsZMM5EL3mucnkvKe4HNrpQUtk77lLVe2GMCo9ayYPUttKr87c3NzeU0e4zcO0zZEC5wIE8hpJXPh/c+YOUvK10/o6yQOzTff0mCAJfykFKA0l4fS6+6IHcguz2EahUgveaTMItgkBJW1Hrk1s7qC5w7Z3vSh+DrKhtlQxkMMBFDtqIsqbctoNyX+3b2LWf2nhd9B5dCt/7FI8UR/V5HmcurkGKwYDZeC24yKfP9tUC8txzPYsnlMvxmMp9/zpNj+68vP1ImRtID+UipR1FPxDMdgWieRxpCj9AuVCF50VQO5iC+v0sfXADHyxRH1yOJEn3fdjm6XmDrC0SLAhQ/3wBM4I0leI4umtrkJIrGw8qCOxkCUYlsmMesxEUd9F8Mngc9vlCSQwhCGBbXLgruZ0JESiOloDkAi8I0EXRGbWdDqAKhiN4aHqEJ2xFbX0S+wzwLMG82lxcz8WaLFCDrihsZ98fFYJT1BVf2tntaueoK7DYrOSiRda9v0GEwcwLMZgltPJgOjHWnrD9g7XgohXLlEJ7IoA3ZhFUsBjQfoHFZ71w5SKsJSxVH6PrDTFjA8nDL064WOI4XmBfR7tt1PBOIslHswSbyBsg6A2/QuyLkxx3hTPfd9FXwafDrwIwpwhIRRkK5HOJVhAMJh2E7aS7f5o5kB+wwtPcoOdHx7gBP+QG/HEucqoucqqwkVeMvGLkFX0uI6/YSyOveDryipFX/CSv+Es/LBz5nfEIXzjCDcQdOSAH4pEfDv4+5RmJwUgMRmIwEoORGKCujMRgJAb/PmIgD380kCMvGHnByAtGXjDygpEX9DM48oL/FS9wD3mBO/KCkReMvGDkBSMvGHlBP4MjL/jv8gLi/AnmztI
J
2024-04-24 16:30:36,391 readfish.targets Configuration description:
Number of barcodes in the Conf (excluding unclassified and classified): 5
Barcode unclassified_reads (control=False), Barcode classified_reads (control=False), Barcode barcode01 (control=False), Barcode barcode02 (control=False), Barcode barcode03 (control=False), Barcode barcode04 (control=False) and Barcode barcode05 (control=False)

2024-04-24 16:30:36,393 readfish.targets Fetching Run Configuration
2024-04-24 16:30:36,394 readfish.targets Run Configuration Received
2024-04-24 16:30:36,394 readfish.targets run_id=72eabaadba8e419490db7d020ac9a553
2024-04-24 16:30:36,394 readfish.targets break_reads_after_seconds=1.0
2024-04-24 16:30:36,397 readfish.targets Initialising Caller
Traceback (most recent call last):
File ".conda/envs/BBC_conda/envs/readfish/lib/python3.10/site-packages/readfish/plugins/guppy.py", line 162, in validate
    raise RuntimeError(
RuntimeError: Barcoding kits specified in TOML EXP-NBD114 not amongst those supported by the selected kit and protocol.
Supported kits are:

The error never prints any supported kits. Is there a specific kit I can/should use for simulated barcoding?

Icarust is simulating barcodes without any issue but I can't find any kit information.

[2024-04-24T16:47:52Z INFO icarust::impl_services::data: 307 ] Fetching barcode squiggle for barcode Barcode01 at static/barcode_squiggle/Barcode01_1_R10.squiggle.npy
[2024-04-24T16:47:52Z INFO icarust::impl_services::data: 307 ] Fetching barcode squiggle for barcode Barcode02 at static/barcode_squiggle/Barcode02_1_R10.squiggle.npy
[2024-04-24T16:47:52Z INFO icarust::impl_services::data: 307 ] Fetching barcode squiggle for barcode Barcode03 at static/barcode_squiggle/Barcode03_1_R10.squiggle.npy
[2024-04-24T16:47:52Z INFO icarust::impl_services::data: 307 ] Fetching barcode squiggle for barcode Barcode04 at static/barcode_squiggle/Barcode04_1_R10.squiggle.npy
[2024-04-24T16:47:52Z INFO icarust::impl_services::data: 307 ] Fetching barcode squiggle for barcode Barcode05 at static/barcode_squiggle/Barcode05_1_R10.squiggle.npy
[2024-04-24T16:47:52Z INFO icarust::impl_services::data: 1414 ] Barcodes available [
    "Barcode03",
    "Barcode02",
    "Barcode01",
    "Barcode05",
    "Barcode04",
]

Thanks for all your help!

Adoni5 commented 5 months ago

This is a great spot! If possible, could you let me know the contents of your simulation profile TOML, and which version of guppy/dorado you are running? And are you running with Docker or natively?

readfish is smart and checks with the basecaller for the barcoding kits supported by the chemistry, and I haven't correctly implemented the mock endpoint in Icarust that we query to get those kits! This would explain why there are no values in the list of supported kits!
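To illustrate the failure mode, here is a minimal sketch of this kind of check. It is not readfish's actual code: `validate_barcode_kits` and its parameters are hypothetical names, and the supported-kit list is passed in directly rather than queried from a basecaller. With an empty list, which is effectively what an unimplemented mock endpoint returns, every requested kit is rejected and nothing follows "Supported kits are:".

```python
def validate_barcode_kits(requested, supported):
    """Raise if any requested kit is missing from the supported list.

    Hypothetical helper for illustration only; not part of readfish.
    """
    missing = sorted(set(requested) - set(supported))
    if missing:
        raise RuntimeError(
            f"Barcoding kits specified in TOML {' '.join(missing)} not amongst "
            "those supported by the selected kit and protocol.\n"
            f"Supported kits are:\n{' '.join(sorted(supported))}"
        )


# An empty supported list (assumed here to mimic the unimplemented mock
# endpoint) rejects every kit and leaves the supported-kits line blank.
try:
    validate_barcode_kits(["EXP-NBD114"], [])
except RuntimeError as err:
    message = str(err)
    print(message)
```

Once the endpoint reports a non-empty list that includes the requested kit, the same check passes without raising.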

Thanks, Rory

mccarthyma17 commented 5 months ago

Hi Rory,

Thanks for the quick response! I'm running Icarust natively with dorado version 7.2.13.

Here's my simulation profile:

output_path = "."
target_yield = 100000000000
pore_type = "R10"
nucleotide_type = "DNA"

[parameters]
sample_name = "five_barcodes_efaecium"
experiment_name = "test_barcoding"
flowcell_name = "FAQ1234"
experiment_duration_set = 4800
device_id = "Bantersaurus"
position = "FenceSitter"
sample_rate = 4000

[[sample]]
name = "GCF_00214"
input_genome = "GCF_002140315.1_ASM214031v1_genomic.fna"
barcodes = ['Barcode01']
mean_read_length = 3000
weight = 1

[[sample]]
name = "GCF_0029"
input_genome = "GCF_002973795.1_ASM297379v1_genomic.fna"
barcodes = ['Barcode02']
mean_read_length = 3000
weight = 1

[[sample]]
name = "GCF_00332"
input_genome = "GCF_003320215.1_ASM332021v1_genomic.fna"
barcodes = ['Barcode03']
mean_read_length = 3000
weight = 1

[[sample]]
name = "GCF_00095"
input_genome = "GCF_000951815.1_ASM95181v1_genomic.fna"
barcodes = ['Barcode04']
mean_read_length = 3000
weight = 1

[[sample]]
name = "GCF_0225"
input_genome = "GCF_022509525.1_ASM2250952v1_genomic.fna"
barcodes = ['Barcode05']
mean_read_length = 3000
weight = 1
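As a quick sanity check on a profile like this, the sketch below parses an abbreviated copy of it and lists which barcodes each sample is assigned. It uses only the `[[sample]]`, `name` and `barcodes` keys shown above; `barcodes_by_sample` is a hypothetical helper, not part of Icarust or readfish, and `tomllib` assumes Python 3.11 or newer (on 3.10 the third-party `tomli` package provides the same `loads` function).

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumed fallback for older interpreters

# Abbreviated copy of the simulation profile, two samples only.
PROFILE = """
pore_type = "R10"

[[sample]]
name = "GCF_00214"
barcodes = ['Barcode01']
weight = 1

[[sample]]
name = "GCF_0029"
barcodes = ['Barcode02']
weight = 1
"""


def barcodes_by_sample(profile_text):
    """Map each sample name to its list of barcodes (hypothetical helper)."""
    config = tomllib.loads(profile_text)
    return {s["name"]: s.get("barcodes", []) for s in config.get("sample", [])}


mapping = barcodes_by_sample(PROFILE)
for name, barcodes in mapping.items():
    print(name, barcodes)
```

The resulting names can then be compared by eye against the barcode regions defined in the readfish TOML.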

Thanks!

Adoni5 commented 5 months ago

Hi @mccarthyma17 - there's a new branch up which should fix this!

The PR is here if you want to see the changes. The way you run Icarust should remain the same.

Let me know if this addresses your issue! Cheers, Rory

Adoni5 commented 2 months ago

This has been merged into main, and should be fully compatible with readfish 2024.2.0. Closing this for now, but please reopen if you still have the same issue!

Many thanks, Rory