goldman-gp-ebi / BOSS-RUNS

Dynamic, adaptive sampling during nanopore sequencing
GNU General Public License v3.0

BOSS-RUNS does not make Readfish eject reads #1

Closed bnoordijk closed 1 year ago

bnoordijk commented 1 year ago

Hi!

We've used Readfish in the past to eject unwanted reads, with great success. Now we want to start using BOSS-RUNS to prevent a coverage bias in our sample. We've configured it according to the documentation and ran it successfully on the simulated human read set.

However, when we run it on a real sequencing sample, we're not sure if BOSS-RUNS ever tells readfish to eject any reads. We compare it to one control zone, and sequence on a MinION flowcell.

Our toml looks like this:

[caller_settings]
config_name = "dna_r9.4.1_450bps_hac"
host = "ipc:///tmp/.guppy"
port = 5555

[conditions]
reference = # Removed for confidentiality

[conditions.0]
name = "select_species"
control = false
min_chunks = 0
max_chunks = 12
targets = ["All headers in the reference"]
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"
mask = "bossruns_select_species/masks"

[conditions.1]
name = "control"
control = true
min_chunks = 0
max_chunks = inf
targets = []
single_on = "proceed"
multi_on = "proceed"
single_off = "proceed"
multi_off = "proceed"
no_seq = "proceed"
no_map = "proceed"

We run readfish and bossruns with the following commands:

readfish boss-runs --experiment-name 78_BOSS-RUNS_test --device MN35977 \
    --toml toml_configs/bossruns_one_species.toml \
    --log-file /logs/boss_runs_real_test/log.log \
    --paf-log /logs/boss_runs_real_test/paf.log \
    --chunk-log /logs/boss_runs_real_test/chunk.log \
    --log-level debug

and

python BOSS-RUNS/bossruns.py --run_name select_species \
      --ref PATH_TO_REF \
      --ref_idx PATH_TO_INDEX \
      --device MN35977 \
      --conditions

Each command is run from within a docker container, but the explicit command which mounts all the relevant volumes is not shown here.

It seems that BOSS-RUNS does update the masks dynamically; see this excerpt from our BOSS-RUNS log:

 Next batch ---------------------------- # 1
2023-04-06 09:34:15,184 found 1 new fq files:
 {'/var/lib/minknow/data/./78_BOSS-RUNS_test/no_sample/20230406_1020_MN35977_FAV81623_189be7d2/fastq_pass/FAV81623_pass_189be7d2_67e952ab_2.fastq.gz'}
2023-04-06 09:34:15,185 reading file: /var/lib/minknow/data/./78_BOSS-RUNS_test/no_sample/20230406_1020_MN35977_FAV81623_189be7d2/fastq_pass/FAV81623_pass_189be7d2_67e952ab_2.fastq.gz
2023-04-06 09:34:15,361 processing 1956 reads in this batch
2023-04-06 09:34:15,596 Number of mapped reads: 1855, unmapped reads: 101
2023-04-06 09:34:16,015 Counts and rel. proportions of observed reads:
2023-04-06 09:34:16,015 CLASSIFIED_HEADER_NAME_25.1: 2789 0.467
2023-04-06 09:34:16,016 CLASSIFIED_HEADER_NAME_26.1: 46 0.008
2023-04-06 09:34:16,016 CLASSIFIED_HEADER_NAME_28.1: 573 0.096
2023-04-06 09:34:16,016 CLASSIFIED_HEADER_NAME_29.1: 662 0.111
2023-04-06 09:34:16,016 CLASSIFIED_HEADER_NAME_30.1: 499 0.084
2023-04-06 09:34:16,016 CLASSIFIED_HEADER_NAME_31.1: 348 0.058
2023-04-06 09:34:16,016 CLASSIFIED_HEADER_NAME_32.1: 231 0.039
2023-04-06 09:34:16,016 CLASSIFIED_HEADER_NAME_33.1: 75 0.013
2023-04-06 09:34:16,016 CLASSIFIED_HEADER_NAME_34.1: 359 0.06
2023-04-06 09:34:16,016 CLASSIFIED_HEADER_NAME_35.1: 90 0.015
2023-04-06 09:34:16,017 updating scores
2023-04-06 09:34:16,439 switch count: off 252; on 16
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_25.1: 28795, 28752; 0.9549946935526665, 0.9535685858317856
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_28.1: 7577, 7577; 0.9742831425999743, 0.9742831425999743
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_29.1: 4933, 4933; 0.9610364309370738, 0.9610364309370738
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_30.1: 3463, 3463; 0.8964535335231685, 0.8964535335231685
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_31.1: 2124, 2124; 0.8415213946117274, 0.8415213946117274
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_32.1: 1242, 1242; 0.8613037447988904, 0.8613037447988904
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_33.1: 1376, 1376; 1.0, 1.0
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_34.1: 906, 906; 0.6937212863705973, 0.6937212863705973
2023-04-06 09:34:16,629 batch took: 1.4450161457061768
2023-04-06 09:34:16,629 finished updating masks, waiting for 88 ...

2023-04-06 09:35:44,716
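
As a minimal sketch (not part of BOSS-RUNS), the per-contig lines that follow "switch count" in a log like the one above could be pulled out with a short Python script, which makes it easier to follow them over time. The function name and the field names count_a, count_b, frac_a and frac_b are placeholders invented here; the log itself does not say what the two counts and two fractions stand for.

```python
import re

# Matches per-contig lines such as:
# 2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_25.1: 28795, 28752; 0.95..., 0.95...
# Field names below are hypothetical; the log does not label these columns.
RATE_LINE = re.compile(
    r"^(?P<time>\S+ \S+) (?P<contig>\S+): "
    r"(?P<count_a>\d+), (?P<count_b>\d+); "
    r"(?P<frac_a>[\d.]+), (?P<frac_b>[\d.]+)\s*$"
)


def parse_rate_lines(lines):
    """Return one dict per per-contig rate line; all other lines are ignored."""
    records = []
    for line in lines:
        m = RATE_LINE.match(line.strip())
        if m is None:
            continue
        records.append({
            "time": m["time"],
            "contig": m["contig"],
            "count_a": int(m["count_a"]),
            "count_b": int(m["count_b"]),
            "frac_a": float(m["frac_a"]),
            "frac_b": float(m["frac_b"]),
        })
    return records


# Lines copied from the excerpt above.
sample = [
    "2023-04-06 09:34:16,439 switch count: off 252; on 16",
    "2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_25.1: 28795, 28752; 0.9549946935526665, 0.9535685858317856",
    "2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_34.1: 906, 906; 0.6937212863705973, 0.6937212863705973",
    "2023-04-06 09:34:16,629 batch took: 1.4450161457061768",
]

for rec in parse_rate_lines(sample):
    print(rec["time"], rec["contig"], rec["frac_a"], rec["frac_b"])
```

Passing an open file handle of the BOSS-RUNS log instead of the sample list should work the same way, since the function only iterates over lines.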

Readfish also successfully reloads the mask dict:

2023-04-06 09:38:13,095 ru.ru_gen_boss_runs /usr/local/bin/readfish boss-runs --experiment-name 78_BOSS-RUNS_test --device MN35977 --toml toml_configs/bossruns_one_species.toml --log-file /logs/boss_runs_real_test/log.log --paf-log /logs/boss_runs_real_test/paf.log --chunk-log /logs/boss_runs_real_test/chunk.log --log-level debug
2023-04-06 09:38:13,095 ru.ru_gen_boss_runs batch_size=512
2023-04-06 09:38:13,095 ru.ru_gen_boss_runs cache_size=512
2023-04-06 09:38:13,095 ru.ru_gen_boss_runs channels=[1, 512]
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs chunk_log=/logs/boss_runs_real_test/chunk.log
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs command=boss-runs
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs device=MN35977
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs dry_run=False
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs experiment_name=78_BOSS-RUNS_test
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs func=<function run at 0x7f1e1f508940>
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs host=127.0.0.1
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs log_file=/logs/boss_runs_real_test/log.log
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs log_format=%(asctime)s %(name)s %(message)s
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs log_level=debug
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs mask=None
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs paf_log=/logs/boss_runs_real_test/paf.log
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs port=None
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs run_time=172800
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs throttle=0.4
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs toml=toml_configs/bossruns_one_species.toml
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs unblock_duration=0.1
2023-04-06 09:38:13,096 ru.ru_gen_boss_runs workers=1
2023-04-06 09:38:13,097 ru.ru_gen_boss_runs Initialising minimap2 mapper
2023-04-06 09:38:13,141 ru.ru_gen_boss_runs Mapper initialised
2023-04-06 09:38:13,181 ru.ru_gen_boss_runs This experiment has 2 regions on the flowcell
2023-04-06 09:38:13,181 ru.ru_gen_boss_runs Using reference: REFERENCEPATH
2023-04-06 09:38:13,198 ru.ru_gen_boss_runs Region 'select_species' (control=False) has 11 contigs of which 11 are in the reference. There are 22 targets (including +/- strand) representing 100.0% of the reference. Reads will be unblocked when classed as single_off or multi_off; sequenced when classed as single_on or multi_on; and polled for more data when classed as no_map or no_seq.
2023-04-06 09:38:13,199 ru.ru_gen_boss_runs Region 'control' (control=True) has 0 contigs of which 0 are in the reference. There are 0 targets (including +/- strand) representing 0.0% of the reference. Reads will be unblocked when classed as ; sequenced when classed as ; and polled for more data when classed as single_on, single_off, multi_on, multi_off, no_map or no_seq.
2023-04-06 09:38:13,206 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 09:38:13,507 ru.ru_gen_boss_runs 8R/0.30348s
2023-04-06 09:38:14,107 ru.ru_gen_boss_runs 228R/0.50375s
2023-04-06 09:38:14,607 ru.ru_gen_boss_runs 248R/0.49977s
2023-04-06 09:38:14,943 ru.ru_gen_boss_runs 75R/0.33552s
2023-04-06 09:38:15,317 ru.ru_gen_boss_runs 43R/0.30919s
2023-04-06 09:38:15,703 ru.ru_gen_boss_runs 17R/0.29500s

Below is an excerpt from the readfish chunk log. It seems that Readfish only ejects reads because they exceed max_chunks.

client_iteration        read_in_loop    read_id channel read_number     seq_len counter mode    decision        condition min_threshold   count_threshold start_analysis  end_analysis    timestamp
15      22      b08f33d4-d85e-4095-8026-9bed4def9930    3       399     366     2       control True    control False   False   230324.943320849           230324.943321864        1680773352.5080986
15      23      f7e55c14-9532-4102-93c6-cbe202ecb339    481     469     165     1       control True    control False   False   230324.943338875           230324.943342193        1680773352.508119
15      24      6fc12bf7-2538-4b1e-a08a-f0533e89da66    378     377     62      1       control True    control False   False   230324.943355887           230324.943357493        1680773352.5081341
15      25      e01cdcd1-c6fd-4f0d-8353-5d16b903b782    372     369     282     1       control True    control False   False   230324.943412341           230324.943413571        1680773352.5081902
15      26      a493188e-9b9d-4153-a44c-dc8b59f8fe15    445     333     129     2       control True    control False   False   230324.94343187 230324.943433089   1680773352.5082097
15      27      8c584b64-5e6e-4cf1-93b2-6d99f6e102da    313     421     164     1       single_on       stop_receiving  select_species  False   False      230324.943469059        230324.943486579        1680773352.5082633
15      28      f210a2a0-bd99-4470-9b0a-be236b102f08    278     357     165     1       single_on       stop_receiving  select_species  False   False      230324.943533912        230324.943550939        1680773352.5083277
15      29      ad6170a9-53e3-48f2-ae05-227238d31fd2    279     247     165     1       single_on       stop_receiving  select_species  False   False      230324.943592227        230324.943610693        1680773352.5083873
15      30      7d50d0bb-f977-484f-b310-f2008550c349    113     448     2030    12      exceeded_max_chunks_unblocked   exceeded_max_chunks_unblocked      select_species  False   True    230324.943693817        230324.943698684        1680773352.5084753
16      1       df0ac577-179f-4502-a10d-de556e61912b    50      394     126     1       control True    control False   False   230325.353143074           230325.353156241        1680773352.9179335
16      2       3620fb2a-7c4b-43f0-9d9d-12c032d28a8d    95      289     457     3       single_on       stop_receiving  select_species  False   False      230325.353388168        230325.353458541        1680773352.9182353
16      3       7d50d0bb-f977-484f-b310-f2008550c349    113     448     2239    13      exceeded_max_chunks_unblocked   exceeded_max_chunks_unblocked      select_species  False   True    230325.353624299        230325.353632312        1680773352.918409
16      4       81120ed3-77ca-4474-9df6-8672c636e977    202     399     353     1       single_on       stop_receiving  select_species  False   False      230325.353760788        230325.353806663        1680773352.9185836
16      5       ad6170a9-53e3-48f2-ae05-227238d31fd2    279     247     325     2       single_on       stop_receiving  select_species  False   False      230325.353918867        230325.353960111        1680773352.918737
16      6       ff9ef4b0-64e8-4092-9395-3ab5ab5a9bc9    306     456     152     1       single_on       stop_receiving  select_species  False   False      230325.354032447        230325.35407178 1680773352.9188485
16      7       f7e55c14-9532-4102-93c6-cbe202ecb339    481     469     366     2       control True    control False   False   230325.354118899           230325.354122088        1680773352.918899
16      8       28092d92-77a3-4e6b-a6f6-73abf2bfe15e    475     310     71      1       control True    control False   False   230325.354154722           230325.354158332        1680773352.9189353
16      9       a77d0dce-3dae-4427-bc1f-8d7a52801c30    16      450     156     1       control True    control False   False   230325.354221983           230325.354225356        1680773352.919002
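
For a whole chunk log rather than a short excerpt, the decisions can be tallied per condition. The sketch below assumes the column layout shown in the header line above and that no field contains whitespace; the function name is made up for illustration and is not part of readfish.

```python
from collections import Counter


def tally_decisions(lines):
    """Count (condition, mode, decision) triples in a readfish chunk log.

    Assumes the first line is the header shown in the excerpt and that
    fields are separated by whitespace (tabs or spaces).
    """
    lines = iter(lines)
    header = next(lines).split()
    col = {name: i for i, name in enumerate(header)}
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or truncated lines
        key = (fields[col["condition"]], fields[col["mode"]], fields[col["decision"]])
        counts[key] += 1
    return counts


# Header and a few rows copied from the excerpt above.
sample = [
    "client_iteration read_in_loop read_id channel read_number seq_len counter mode decision condition min_threshold count_threshold start_analysis end_analysis timestamp",
    "15 22 b08f33d4-d85e-4095-8026-9bed4def9930 3 399 366 2 control True control False False 230324.943320849 230324.943321864 1680773352.5080986",
    "15 27 8c584b64-5e6e-4cf1-93b2-6d99f6e102da 313 421 164 1 single_on stop_receiving select_species False False 230324.943469059 230324.943486579 1680773352.5082633",
    "15 30 7d50d0bb-f977-484f-b310-f2008550c349 113 448 2030 12 exceeded_max_chunks_unblocked exceeded_max_chunks_unblocked select_species False True 230324.943693817 230324.943698684 1680773352.5084753",
    "16 3 7d50d0bb-f977-484f-b310-f2008550c349 113 448 2239 13 exceeded_max_chunks_unblocked exceeded_max_chunks_unblocked select_species False True 230325.353624299 230325.353632312 1680773352.918409",
]

counts = tally_decisions(sample)
for key, n in counts.most_common():
    print(n, *key)
```

The same read can show up on several lines (7d50d0bb appears twice here), so these are counts of decisions, not of unique reads. A count of zero for ("select_species", "single_off", "unblock") over the full file would mean no mask-driven unblocks were logged.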

How can we find out when Readfish ejects reads because of something to do with BOSS-RUNS?

The sample we sequence consists mainly of one species. We use readfish because we want more even coverage across our genome. But the results of our experiment do not indicate any difference in coverage. Do you have any ideas what could cause our issue?

We're happy to provide more information if you need any.

Kind regards, Ben

--EDIT-- Do you maybe think it has something to do with BOSS-RUNS detecting many dropouts? An example can be seen here:

2023-04-06 09:56:17,088 processing 1989 reads in this batch
2023-04-06 09:56:17,351 Number of mapped reads: 1897, unmapped reads: 92
2023-04-06 09:56:17,767 Counts and rel. proportions of observed reads:
2023-04-06 09:56:17,767 CLASSIFIED_HEADER_NAME_25.1: 10407 0.472
2023-04-06 09:56:17,767 CLASSIFIED_HEADER_NAME_26.1: 153 0.007
2023-04-06 09:56:17,767 CLASSIFIED_HEADER_NAME_28.1: 2156 0.098
2023-04-06 09:56:17,767 CLASSIFIED_HEADER_NAME_29.1: 2431 0.11
2023-04-06 09:56:17,767 CLASSIFIED_HEADER_NAME_30.1: 1874 0.085
2023-04-06 09:56:17,767 CLASSIFIED_HEADER_NAME_31.1: 1279 0.058
2023-04-06 09:56:17,767 CLASSIFIED_HEADER_NAME_32.1: 774 0.035
2023-04-06 09:56:17,767 CLASSIFIED_HEADER_NAME_33.1: 268 0.012
2023-04-06 09:56:17,767 CLASSIFIED_HEADER_NAME_34.1: 1311 0.06
2023-04-06 09:56:17,767 CLASSIFIED_HEADER_NAME_35.1: 326 0.015
2023-04-06 09:56:17,769 updating scores
2023-04-06 09:56:19,281 detected 398367 dropouts
2023-04-06 09:56:19,305 detected 35191 dropouts
2023-04-06 09:56:19,311 detected 12097 dropouts
2023-04-06 09:56:19,315 detected 42030 dropouts
2023-04-06 09:56:19,318 detected 37454 dropouts
2023-04-06 09:56:19,322 detected 4157 dropouts
2023-04-06 09:56:19,424 switch count: off 133; on 135
2023-04-06 09:56:19,614 CLASSIFIED_HEADER_NAME_25.1: 16673, 16713; 0.5529649774475989, 0.5542915892809764
2023-04-06 09:56:19,614 CLASSIFIED_HEADER_NAME_28.1: 6078, 5477; 0.7815352963867815, 0.7042561398997043
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_29.1: 2167, 1924; 0.42217027079680497, 0.3748295343853497
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_30.1: 1354, 1319; 0.35050478902407456, 0.3414444732073518
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_33.1: 1202, 1176; 0.873546511627907, 0.8546511627906976
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 09:56:19,615 batch took: 2.724835157394409
2023-04-06 09:56:19,615 finished updating masks, waiting for 87 ...

W-L commented 1 year ago

Hi Ben,

Thanks for your efforts and for the detailed logs, that's very helpful. From a first look it seems that it should be working fine. As you say, BOSS-RUNS creates new masks and readfish picks them up as well, at least at the very beginning of the sequencing run. Do you see any additional "Reloaded mask dict for ..." messages in the readfish log? That would indicate that the masks from BOSS-RUNS were successfully replaced by updated ones.

I'm asking since it seems that the masks that BOSS-RUNS produces instruct quite a few reads to be rejected. For example, from what I can see, reads mapping to the reference CLASSIFIED_HEADER_NAME_25.1 make up about 47% of your sample, and at the time of the second log you posted as EDIT, ~55% of its sites are classed as "to be accepted" if a read starts there. So if this single sequence makes up ~50% of the sample and we reject ~50% of it, we should see at least ~25% of total reads getting rejected from this sequence alone.

In readfish's chunk.log these should appear mostly as single_off unblock instructions, as you probably expected. The part of the chunk.log you posted seems to be the very top of the file, from the start of the experiment. Did you check if any single_off unblock select_species were recorded later on? You can grep for them with grep -P "single_off\tunblock\tselect_species" chunk.log, since they would appear just like any other read unblocked by readfish. When using BOSS-RUNS, readfish has no other masks that it works off of, so there is no special tag or anything to distinguish which reads were rejected because of BOSS-RUNS.

Besides that, what is the distribution of read lengths in your library? Did you see a peak of reads being rejected? Or any difference between the reads from the control and adaptive sampling sectors on the flowcell?

Thanks!
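
The back-of-the-envelope estimate above can be extended to all contigs in the 09:56 log. This is only a rough sketch under stated assumptions: it treats the first logged fraction for each contig as the share of reads from that contig that would be accepted, it assumes reads start uniformly along each contig, and it ignores contigs without a mask entry (26.1 and 35.1) as well as unmapped reads. The function name is invented for this example.

```python
PREFIX = "CLASSIFIED_HEADER_NAME_"

# Relative proportions of observed reads (09:56:17 log lines).
proportions = {
    PREFIX + "25.1": 0.472, PREFIX + "28.1": 0.098,
    PREFIX + "29.1": 0.110, PREFIX + "30.1": 0.085,
    PREFIX + "31.1": 0.058, PREFIX + "32.1": 0.035,
    PREFIX + "33.1": 0.012, PREFIX + "34.1": 0.060,
}

# First fraction on each per-contig line (09:56:19 log lines), rounded.
# Assumption: this approximates the share of reads that would be accepted.
accepted = {
    PREFIX + "25.1": 0.553, PREFIX + "28.1": 0.782,
    PREFIX + "29.1": 0.422, PREFIX + "30.1": 0.351,
    PREFIX + "31.1": 0.445, PREFIX + "32.1": 0.277,
    PREFIX + "33.1": 0.874, PREFIX + "34.1": 0.0,
}


def expected_rejected_fraction(proportions, accepted):
    """Sum over contigs of (share of reads) x (1 - accepted fraction)."""
    return sum(
        share * (1.0 - accepted[name])
        for name, share in proportions.items()
        if name in accepted
    )


from_25_alone = proportions[PREFIX + "25.1"] * (1.0 - accepted[PREFIX + "25.1"])
total = expected_rejected_fraction(proportions, accepted)
print(f"from 25.1 alone: {from_25_alone:.3f}")
print(f"all masked contigs: {total:.3f}")
```

Under these assumptions the largest contig alone accounts for roughly a fifth of reads, and all masked contigs together for a bit under half. These figures would apply only to reads seen by the select_species region, not to the control region.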

bnoordijk commented 1 year ago

Hi W-L,

Thanks for the quick reply, really appreciate it!

  1. Indeed, readfish picks up the new masks. This is the output of grep "dict" log.log (omitting everything past 10:15 to save some space):
    2023-04-06 09:38:13,206 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 09:38:19,190 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 09:38:23,994 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 09:39:50,045 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 09:41:19,300 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 09:44:19,080 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 09:47:18,989 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 09:50:18,803 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 09:53:19,067 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 09:56:20,054 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 09:57:49,736 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 10:00:49,656 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 10:03:48,466 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 10:06:48,286 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 10:09:48,888 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 10:11:18,161 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
    2023-04-06 10:14:17,283 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
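
To see how often the masks are reloaded, the timestamps of these lines can be turned into intervals. The following is a small sketch, assuming the timestamp format shown in the log lines above; the function name is hypothetical and the sample lines are abbreviated copies of the first four entries.

```python
import re
from datetime import datetime

# Timestamp at the start of a line that mentions a mask reload.
STAMP = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*Reloaded mask dict"
)


def reload_intervals(lines):
    """Return the seconds elapsed between consecutive mask reloads."""
    times = []
    for line in lines:
        m = STAMP.match(line)
        if m:
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# First four reload lines from the grep output, with the key list abbreviated.
sample = [
    "    2023-04-06 09:38:13,206 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys([...])",
    "    2023-04-06 09:38:19,190 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys([...])",
    "    2023-04-06 09:38:23,994 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys([...])",
    "    2023-04-06 09:39:50,045 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys([...])",
]

intervals = reload_intervals(sample)
print([round(seconds, 3) for seconds in intervals])
```

Long gaps between reloads would suggest that readfish stopped picking up new masks at some point during the run.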

Also, we can see that BOSS-RUNS changes the acceptance rates of various contigs over time:

grep -A 9 "switch" select_species.bossruns.log

2023-04-06 09:32:34,699 initialising strategy switches
2023-04-06 09:32:34,883 initialising phi
2023-04-06 09:32:34,884 initializing priors
2023-04-06 09:32:34,884 initialising prior read length distribution
2023-04-06 09:32:34,887 initialising positional scores
2023-04-06 09:32:34,931 total score is: 226784.95549006073
2023-04-06 09:32:35,412 initialising fist strategy
2023-04-06 09:32:35,412 initial acceptance rates BR CLASSIFIED_HEADER_NAME_25.1: 30152, 30152; 1.0, 1.0
2023-04-06 09:32:35,412 initial acceptance rates BR CLASSIFIED_HEADER_NAME_28.1: 7777, 7777; 1.0, 1.0
2023-04-06 09:32:35,412 initial acceptance rates BR CLASSIFIED_HEADER_NAME_29.1: 5133, 5133; 1.0, 1.0
--
2023-04-06 09:32:45,318 initialising strategy switches
2023-04-06 09:32:45,503 initialising phi
2023-04-06 09:32:45,503 initializing priors
2023-04-06 09:32:45,503 initialising prior read length distribution
2023-04-06 09:32:45,506 initialising positional scores
2023-04-06 09:32:45,551 total score is: 226784.95549006073
2023-04-06 09:32:46,028 initialising fist strategy
2023-04-06 09:32:46,028 initial acceptance rates BR CLASSIFIED_HEADER_NAME_25.1: 30152, 30152; 1.0, 1.0
2023-04-06 09:32:46,028 initial acceptance rates BR CLASSIFIED_HEADER_NAME_28.1: 7777, 7777; 1.0, 1.0
2023-04-06 09:32:46,028 initial acceptance rates BR CLASSIFIED_HEADER_NAME_29.1: 5133, 5133; 1.0, 1.0
--
2023-04-06 09:32:48,875 switch count: off 266; on 2
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_25.1: 29952, 29952; 0.9933669408331123, 0.9933669408331123
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_28.1: 7777, 7777; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_29.1: 5133, 5133; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_30.1: 3863, 3863; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_31.1: 2524, 2524; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_32.1: 1442, 1442; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_33.1: 1376, 1376; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_34.1: 1106, 1106; 0.8468606431852986, 0.8468606431852986
2023-04-06 09:32:50,101 batch took: 4.023870229721069
--
2023-04-06 09:34:16,439 switch count: off 252; on 16
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_25.1: 28795, 28752; 0.9549946935526665, 0.9535685858317856
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_28.1: 7577, 7577; 0.9742831425999743, 0.9742831425999743
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_29.1: 4933, 4933; 0.9610364309370738, 0.9610364309370738
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_30.1: 3463, 3463; 0.8964535335231685, 0.8964535335231685
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_31.1: 2124, 2124; 0.8415213946117274, 0.8415213946117274
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_32.1: 1242, 1242; 0.8613037447988904, 0.8613037447988904
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_33.1: 1376, 1376; 1.0, 1.0
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_34.1: 906, 906; 0.6937212863705973, 0.6937212863705973
2023-04-06 09:34:16,629 batch took: 1.4450161457061768
--
2023-04-06 09:37:16,085 switch count: off 230; on 38
2023-04-06 09:37:16,274 CLASSIFIED_HEADER_NAME_25.1: 27215, 27215; 0.9025935261342531, 0.9025935261342531
2023-04-06 09:37:16,274 CLASSIFIED_HEADER_NAME_28.1: 7125, 7043; 0.9161630448759162, 0.9056191333419056
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_29.1: 3743, 3733; 0.7292031950126632, 0.7272550165595169
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_30.1: 3484, 3463; 0.9018897230132021, 0.8964535335231685
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_31.1: 1924, 1925; 0.7622820919175911, 0.7626782884310618
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_32.1: 1042, 1042; 0.7226074895977809, 0.7226074895977809
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_33.1: 1376, 1376; 1.0, 1.0
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_34.1: 409, 400; 0.31316998468606433, 0.30627871362940273
2023-04-06 09:37:16,275 batch took: 1.468853235244751
--
2023-04-06 09:38:18,395 initialising strategy switches
2023-04-06 09:38:18,592 initialising phi
2023-04-06 09:38:18,593 initializing priors
2023-04-06 09:38:18,593 initialising prior read length distribution
2023-04-06 09:38:18,596 initialising positional scores
2023-04-06 09:38:18,644 total score is: 226784.95549006073
2023-04-06 09:38:19,144 initialising fist strategy
2023-04-06 09:38:19,144 initial acceptance rates BR CLASSIFIED_HEADER_NAME_25.1: 30152, 30152; 1.0, 1.0
2023-04-06 09:38:19,144 initial acceptance rates BR CLASSIFIED_HEADER_NAME_28.1: 7777, 7777; 1.0, 1.0
2023-04-06 09:38:19,144 initial acceptance rates BR CLASSIFIED_HEADER_NAME_29.1: 5133, 5133; 1.0, 1.0
--
2023-04-06 09:38:23,690 switch count: off 230; on 38
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_25.1: 27219, 27219; 0.9027261873175909, 0.9027261873175909
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_28.1: 7134, 7041; 0.9173203034589174, 0.9053619647679053
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_29.1: 3747, 3733; 0.7299824663939217, 0.7272550165595169
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_30.1: 3484, 3463; 0.9018897230132021, 0.8964535335231685
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_31.1: 1926, 1924; 0.7630744849445324, 0.7622820919175911
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_32.1: 1042, 1042; 0.7226074895977809, 0.7226074895977809
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_33.1: 1376, 1376; 1.0, 1.0
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_34.1: 411, 400; 0.31470137825421135, 0.30627871362940273
2023-04-06 09:38:25,211 batch took: 6.026308059692383
--
2023-04-06 09:39:49,548 switch count: off 210; on 58
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_25.1: 24836, 24808; 0.8236932873441231, 0.8227646590607588
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_28.1: 7149, 7035; 0.9192490677639192, 0.9045904590459046
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_29.1: 3507, 3416; 0.6832261835184102, 0.6654977595947789
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_30.1: 3286, 3242; 0.8506342221071705, 0.8392441107947192
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_31.1: 1524, 1524; 0.6038034865293186, 0.6038034865293186
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_32.1: 842, 842; 0.5839112343966713, 0.5839112343966713
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_33.1: 1209, 1176; 0.8786337209302325, 0.8546511627906976
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_34.1: 400, 400; 0.30627871362940273, 0.30627871362940273
2023-04-06 09:39:49,736 batch took: 1.4432902336120605
--
2023-04-06 09:41:19,099 switch count: off 193; on 75
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_25.1: 23588, 23522; 0.7823029981427434, 0.7801140886176705
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_28.1: 6946, 6771; 0.8931464575028931, 0.8706442072778706
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_29.1: 3493, 3411; 0.6804987336840055, 0.6645236703682057
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_30.1: 2286, 2240; 0.5917680559150919, 0.5798602122702563
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_31.1: 1229, 1137; 0.4869255150554675, 0.45047543581616484
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_32.1: 748, 659; 0.5187239944521498, 0.45700416088765605
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_33.1: 1209, 1176; 0.8786337209302325, 0.8546511627906976
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_34.1: 200, 200; 0.15313935681470137, 0.15313935681470137
2023-04-06 09:41:19,289 batch took: 1.4642820358276367
--
2023-04-06 09:44:18,770 switch count: off 178; on 90
2023-04-06 09:44:18,959 CLASSIFIED_HEADER_NAME_25.1: 22197, 22215; 0.736170071637039, 0.7367670469620589
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_28.1: 6605, 6280; 0.8492992156358493, 0.8075093223608075
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_29.1: 3251, 3197; 0.6333528151178648, 0.6228326514708747
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_30.1: 1838, 1657; 0.47579601346104067, 0.4289412373802744
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_31.1: 1164, 1134; 0.4611727416798732, 0.44928684627575277
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_32.1: 414, 400; 0.2871012482662968, 0.27739251040221913
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_33.1: 1179, 1176; 0.8568313953488372, 0.8546511627906976
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_34.1: 200, 200; 0.15313935681470137, 0.15313935681470137
2023-04-06 09:44:18,961 batch took: 1.4967708587646484
--
2023-04-06 09:47:18,592 switch count: off 168; on 100
2023-04-06 09:47:18,780 CLASSIFIED_HEADER_NAME_25.1: 21299, 21090; 0.7063876359777129, 0.6994560891483153
2023-04-06 09:47:18,780 CLASSIFIED_HEADER_NAME_28.1: 6248, 5892; 0.8033946251768034, 0.7576186190047576
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_29.1: 2733, 2733; 0.532437171244886, 0.532437171244886
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_30.1: 1614, 1619; 0.417809992234015, 0.4191043230649754
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_32.1: 413, 400; 0.28640776699029125, 0.27739251040221913
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_33.1: 1177, 1176; 0.8553779069767442, 0.8546511627906976
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_34.1: 200, 200; 0.15313935681470137, 0.15313935681470137
2023-04-06 09:47:18,781 batch took: 1.643306016921997
--
2023-04-06 09:50:18,557 switch count: off 159; on 109
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_25.1: 19868, 19737; 0.6589280976386309, 0.6545834438843194
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_28.1: 5974, 5687; 0.7681625305387682, 0.7312588401697313
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_29.1: 2533, 2533; 0.49347360218195985, 0.49347360218195985
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_30.1: 1604, 1600; 0.4152213305720942, 0.4141858659073259
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_32.1: 416, 400; 0.2884882108183079, 0.27739251040221913
2023-04-06 09:50:18,747 CLASSIFIED_HEADER_NAME_33.1: 1176, 1176; 0.8546511627906976, 0.8546511627906976
2023-04-06 09:50:18,747 CLASSIFIED_HEADER_NAME_34.1: 200, 200; 0.15313935681470137, 0.15313935681470137
2023-04-06 09:50:18,748 batch took: 1.7912256717681885
--
2023-04-06 09:53:18,525 switch count: off 144; on 124
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_25.1: 17850, 17869; 0.5920005306447333, 0.5926306712655877
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_28.1: 6031, 5520; 0.7754918348977755, 0.7097852642407098
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_29.1: 2353, 2201; 0.4584063900253263, 0.4287940775375024
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_30.1: 1337, 1294; 0.34610406419880924, 0.33497281905254983
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_33.1: 1197, 1176; 0.8699127906976745, 0.8546511627906976
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_34.1: 200, 200; 0.15313935681470137, 0.15313935681470137
2023-04-06 09:53:18,715 batch took: 1.7903025150299072
--
2023-04-06 09:56:19,424 switch count: off 133; on 135
2023-04-06 09:56:19,614 CLASSIFIED_HEADER_NAME_25.1: 16673, 16713; 0.5529649774475989, 0.5542915892809764
2023-04-06 09:56:19,614 CLASSIFIED_HEADER_NAME_28.1: 6078, 5477; 0.7815352963867815, 0.7042561398997043
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_29.1: 2167, 1924; 0.42217027079680497, 0.3748295343853497
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_30.1: 1354, 1319; 0.35050478902407456, 0.3414444732073518
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_33.1: 1202, 1176; 0.873546511627907, 0.8546511627906976
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 09:56:19,615 batch took: 2.724835157394409
--
2023-04-06 09:57:49,152 switch count: off 126; on 142
2023-04-06 09:57:49,339 CLASSIFIED_HEADER_NAME_25.1: 16690, 16553; 0.5535287874767842, 0.5489851419474662
2023-04-06 09:57:49,339 CLASSIFIED_HEADER_NAME_28.1: 5807, 5300; 0.7466889546097467, 0.6814967211006815
2023-04-06 09:57:49,339 CLASSIFIED_HEADER_NAME_29.1: 1763, 1733; 0.34346386128969414, 0.33761932593025523
2023-04-06 09:57:49,339 CLASSIFIED_HEADER_NAME_30.1: 1207, 1207; 0.312451462593839, 0.312451462593839
2023-04-06 09:57:49,340 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 09:57:49,340 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 09:57:49,340 CLASSIFIED_HEADER_NAME_33.1: 1260, 1123; 0.9156976744186046, 0.8161337209302325
2023-04-06 09:57:49,340 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 09:57:49,341 batch took: 2.6383650302886963
--
2023-04-06 10:00:49,310 switch count: off 121; on 147
2023-04-06 10:00:49,495 CLASSIFIED_HEADER_NAME_25.1: 16350, 16249; 0.5422525868930751, 0.5389028920137967
2023-04-06 10:00:49,495 CLASSIFIED_HEADER_NAME_28.1: 5613, 5063; 0.7217436029317218, 0.6510222450816511
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_29.1: 1377, 1366; 0.2682641729982466, 0.2661211766997857
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_30.1: 1007, 1007; 0.26067822935542323, 0.26067822935542323
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_33.1: 1255, 1123; 0.9120639534883721, 0.8161337209302325
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:00:49,496 batch took: 3.032994031906128
--
2023-04-06 10:03:48,264 switch count: off 114; on 154
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_25.1: 15728, 15581; 0.5216237728840541, 0.5167484743963916
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_28.1: 5547, 5069; 0.7132570399897132, 0.6517937508036518
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_29.1: 1379, 1368; 0.2686538086888759, 0.26651081239041496
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_30.1: 1017, 1016; 0.263266891017344, 0.26300802485115193
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_31.1: 1000, 1000; 0.39619651347068147, 0.39619651347068147
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_33.1: 1023, 1040; 0.7434593023255814, 0.7558139534883721
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:03:50,339 batch took: 4.670608282089233
--
2023-04-06 10:06:48,012 switch count: off 107; on 161
2023-04-06 10:06:48,197 CLASSIFIED_HEADER_NAME_25.1: 13507, 13408; 0.44796365083576545, 0.444680286548156
2023-04-06 10:06:48,197 CLASSIFIED_HEADER_NAME_28.1: 5311, 4856; 0.682911148257683, 0.6244052976726244
2023-04-06 10:06:48,197 CLASSIFIED_HEADER_NAME_29.1: 1386, 1368; 0.27001753360607833, 0.26651081239041496
2023-04-06 10:06:48,197 CLASSIFIED_HEADER_NAME_30.1: 1007, 1006; 0.26067822935542323, 0.26041936318923115
2023-04-06 10:06:48,197 CLASSIFIED_HEADER_NAME_31.1: 811, 800; 0.3213153724247227, 0.31695721077654515
2023-04-06 10:06:48,198 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:06:48,198 CLASSIFIED_HEADER_NAME_33.1: 1023, 1037; 0.7434593023255814, 0.7536337209302325
2023-04-06 10:06:48,198 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:06:48,198 batch took: 2.6857783794403076
--
2023-04-06 10:09:48,410 switch count: off 100; on 168
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_25.1: 12925, 12781; 0.42866144866012207, 0.4238856460599629
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_28.1: 5072, 4658; 0.6521795036646522, 0.598945608846599
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_29.1: 1165, 1168; 0.2269627897915449, 0.2275472433274888
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_30.1: 608, 608; 0.15739062904478385, 0.15739062904478385
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_31.1: 800, 800; 0.31695721077654515, 0.31695721077654515
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_33.1: 1023, 1032; 0.7434593023255814, 0.75
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:09:48,594 batch took: 3.220763921737671
--
2023-04-06 10:11:17,751 switch count: off 98; on 170
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_25.1: 13460, 13411; 0.4464048819315468, 0.4447797824356593
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_28.1: 5494, 4989; 0.7064420727787064, 0.6415070078436415
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_29.1: 1193, 1159; 0.23241768946035457, 0.2257938827196571
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_30.1: 774, 768; 0.2003624126326689, 0.19880921563551643
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_31.1: 759, 663; 0.30071315372424723, 0.2626782884310618
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:11:17,937 CLASSIFIED_HEADER_NAME_33.1: 993, 993; 0.721656976744186, 0.721656976744186
2023-04-06 10:11:17,937 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:11:17,937 batch took: 3.2566475868225098
--
2023-04-06 10:14:16,965 switch count: off 94; on 174
2023-04-06 10:14:17,149 CLASSIFIED_HEADER_NAME_25.1: 13400, 13395; 0.4444149641814805, 0.4442491377023083
2023-04-06 10:14:17,149 CLASSIFIED_HEADER_NAME_28.1: 3669, 3760; 0.4717757490034718, 0.48347691912048346
2023-04-06 10:14:17,149 CLASSIFIED_HEADER_NAME_29.1: 1163, 1133; 0.22657315410091564, 0.22072861874147673
2023-04-06 10:14:17,149 CLASSIFIED_HEADER_NAME_30.1: 695, 697; 0.1799119855034947, 0.18042971783587886
2023-04-06 10:14:17,150 CLASSIFIED_HEADER_NAME_31.1: 625, 620; 0.24762282091917592, 0.24564183835182252
2023-04-06 10:14:17,150 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:14:17,150 CLASSIFIED_HEADER_NAME_33.1: 986, 970; 0.7165697674418605, 0.7049418604651163
2023-04-06 10:14:17,150 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:14:17,150 batch took: 3.037768602371216
--

(I had to restart readfish and BOSS-RUNS at the start, so some lines might show up twice.)

  1. The readfish chunk log does not contain any single_off instructions. Running grep -P "single_off\tunblock\tselect_species" chunk.log gives no output.

  2. In total only 5000 reads are ejected, and all of them are ejected because they reach max_chunks. These reads can also be seen in the MinKNOW GUI (length of a couple of kb).

So indeed it seems like readfish does not quite pick up the correct BOSS-RUNS masks, and/or does not use them to eject reads for some reason.

Let me know what you think the next best steps for debugging might be!

Cheers, Ben

W-L commented 1 year ago

I think I spotted the issue. In your toml configuration file for readfish, both targets = and mask = are defined, and I'm afraid our documentation was not clear enough in distinguishing these.

targets = retains the same functionality as in standard readfish runs, i.e. it can be used to define sequences/contigs from which to always accept reads. This takes precedence over the masks that BOSS-RUNS produces. If you set empty targets (targets = []), then all of the sequences in the reference file should be handled using the BOSS-RUNS masks instead, and you should hopefully see the expected rejections.

I can see how that is not entirely intuitive, and will add a note to our docs. Please let me know if this helps, and thanks again,

Lukas
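
For reference, a sketch of what the adjusted condition could look like. It reuses the values from the config posted at the top of this issue and assumes nothing else needs to change; only the targets line is different, and the control condition stays as it was.

```toml
[conditions.0]
name = "select_species"
control = false
min_chunks = 0
max_chunks = 12
# Empty targets: no sequence is always accepted, so the BOSS-RUNS masks
# decide for every sequence in the reference.
targets = []
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"
mask = "bossruns_select_species/masks"
```

After restarting with this change, the same grep on chunk.log used above is a quick way to check whether single_off unblock decisions start to appear.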

bnoordijk commented 1 year ago

Hi Lukas,

Indeed that seems to have fixed it. Thanks for your swift responses and awesome help!

Cheers, Ben!

W-L commented 1 year ago

Great to hear, thanks for confirming! Good luck with your experiments, I'd be happy to learn more if you can/want to share.