Closed by bnoordijk.
Hi Ben,
Thanks for your efforts and for the detailed logs, that's very helpful. From a first look, it seems that it should be working fine: as you say, BOSS-RUNS creates new masks and readfish picks them up, at least at the very beginning of the sequencing run. Do you see any additional "Reloaded mask dict for .." lines in the readfish log? That would indicate that the masks from BOSS-RUNS were successfully replaced by updated ones. I'm asking because the masks that BOSS-RUNS produces seem to instruct quite a few reads to be rejected. For example, from what I can see, reads mapping to the reference CLASSIFIED_HEADER_NAME_25.1 make up about 47% of your sample, and at the time of the second log you posted as EDIT, ~55% of the sites are classed as "to be accepted" if a read starts there. So if this single sequence makes up ~50% of the sample and we reject ~50% of it, we should see at least ~25% of all reads getting rejected. In readfish's chunk.log these should appear mostly as "single_off unblock" instructions, as you probably expected.
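The back-of-the-envelope estimate can be written out as a small calculation. This is a sketch under a simplifying assumption (read starts are spread evenly along the contig, so the share of its reads that get rejected matches the share of its sites marked "reject"); the function name is made up for illustration. With the rounded figures it gives 25%; with the quoted 47% and 55% it gives roughly 21% from this one contig, before counting any rejections on the other contigs.

```python
def expected_rejected_fraction(contig_share: float, accepted_site_fraction: float) -> float:
    """Lower bound on the fraction of all reads rejected because of one contig.

    Hypothetical helper. Assumes read start positions are spread uniformly over
    the contig, so the rejected share of that contig's reads equals the share of
    its sites that the mask marks as "reject".
    """
    if not (0.0 <= contig_share <= 1.0 and 0.0 <= accepted_site_fraction <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    return contig_share * (1.0 - accepted_site_fraction)


# Rounded figures from the comment: ~50% of reads, ~50% of sites rejected.
rounded = expected_rejected_fraction(0.5, 0.5)    # 0.25
# Quoted figures for CLASSIFIED_HEADER_NAME_25.1: 47% of reads, ~55% of sites accepted.
quoted = expected_rejected_fraction(0.47, 0.55)   # about 0.21
print(f"rounded: {rounded:.0%}, quoted: {quoted:.1%}")
```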
you posted seems to be the very top of the file from the start of the experiment. Did you check if any single_off unblock select_species
were recorded later on? You can grep for them with grep -P "single_off\tunblock\tselect_species" chunk.log
, since they would appear just like any other read unblocked by readfish. If using BOSS-RUNS, readfish has no other masks that it works off of, so there is no special tag or anything to distinguish which reads were rejected because of BOSS-RUNS.
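Beyond a single grep, it can help to see all decision/action combinations at once, so that e.g. max-chunk unblocks and mask-based "single_off" unblocks can be compared side by side. A minimal sketch; the column layout is an assumption read off the grep pattern above (decision and action fields directly before the condition name, tab-separated), not a documented chunk.log format, and the function name is hypothetical.

```python
import collections


def tally_decisions(lines, condition="select_species"):
    """Count (decision, action) pairs that directly precede a condition name.

    Assumption (mirrors the grep pattern, not a documented format): each line
    is tab-separated and the condition name follows a decision field such as
    "single_off" and an action field such as "unblock".
    """
    counts = collections.Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        for i in range(2, len(fields)):
            if fields[i] == condition:
                counts[(fields[i - 2], fields[i - 1])] += 1
                break  # count each line at most once
    return counts
```

Usage would be along the lines of opening chunk.log and passing the file handle to tally_decisions, then printing counts.most_common(); zero "single_off"/"unblock" entries over a long run would point at the masks not being applied.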
Besides that, what is the distribution of read lengths in your library? Did you see a peak of rejected reads? Or any difference between the reads from the control and adaptive sampling sectors on the flow cell?
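One way to answer these questions is to compare read lengths between the control and adaptive sampling regions: if rejections happen, they should show up as an excess of short reads in the adaptive region only. A minimal sketch, assuming each read's length has already been paired with a region label (e.g. from a sequencing summary plus a channel-to-region mapping, neither shown here); the 1 kb cutoff and the function name are illustrative assumptions, not readfish settings.

```python
import statistics
from collections import defaultdict


def summarise_lengths(reads, short_cutoff=1000):
    """Summarise read lengths per flow cell region.

    `reads` is an iterable of (region, read_length) pairs. `short_cutoff` is an
    illustrative threshold below which a read counts as "short", i.e. where a
    peak of rejected reads would be expected.
    """
    by_region = defaultdict(list)
    for region, length in reads:
        by_region[region].append(length)
    summary = {}
    for region, lengths in by_region.items():
        short = sum(1 for n in lengths if n < short_cutoff)
        summary[region] = {
            "reads": len(lengths),
            "median": statistics.median(lengths),
            "short_fraction": short / len(lengths),
        }
    return summary
```

A much higher short_fraction in the adaptive region than in the control region would be the signature of reads being unblocked.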
Thanks!
Hi W-L,
Thanks for the quick reply, really appreciate it!
grep "dict" log.log
(Omitting everything past 10:15 to save some space).
2023-04-06 09:38:13,206 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 09:38:19,190 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 09:38:23,994 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 09:39:50,045 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 09:41:19,300 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 09:44:19,080 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 09:47:18,989 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 09:50:18,803 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 09:53:19,067 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 09:56:20,054 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 09:57:49,736 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 10:00:49,656 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 10:03:48,466 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 10:06:48,286 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 10:09:48,888 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 10:11:18,161 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
2023-04-06 10:14:17,283 ru.ru_gen_boss_runs Reloaded mask dict for dict_keys(['CLASSIFIED_HEADER_NAME_29.1', 'CLASSIFIED_HEADER_NAME_28.1', 'CLASSIFIED_HEADER_NAME_30.1', 'CLASSIFIED_HEADER_NAME_25.1', 'CLASSIFIED_HEADER_NAME_32.1', 'CLASSIFIED_HEADER_NAME_31.1', 'CLASSIFIED_HEADER_NAME_33.1', 'CLASSIFIED_HEADER_NAME_34.1'])
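To check that the reloads keep pace with BOSS-RUNS, the timestamps of these lines can be turned into intervals. A small sketch, assuming every log line starts with a timestamp in the format shown above; the function name is hypothetical. In this excerpt, after the burst of reloads around the 09:38 restart, reloads arrive roughly every 1.5 to 3 minutes, matching the BOSS-RUNS batch times below.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def reload_intervals(lines, marker="Reloaded mask dict"):
    """Return the seconds between consecutive mask reloads in a readfish log."""
    times = []
    for line in lines:
        if marker in line:
            # The first 23 characters hold the timestamp, e.g. "2023-04-06 09:38:13,206".
            times.append(datetime.strptime(line[:23], LOG_TIME_FORMAT))
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```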
Also, we can see that BOSS-RUNS changes the acceptance rates of various contigs over time:
grep -A 9 "switch" select_species.bossruns.log
2023-04-06 09:32:34,699 initialising strategy switches
2023-04-06 09:32:34,883 initialising phi
2023-04-06 09:32:34,884 initializing priors
2023-04-06 09:32:34,884 initialising prior read length distribution
2023-04-06 09:32:34,887 initialising positional scores
2023-04-06 09:32:34,931 total score is: 226784.95549006073
2023-04-06 09:32:35,412 initialising fist strategy
2023-04-06 09:32:35,412 initial acceptance rates BR CLASSIFIED_HEADER_NAME_25.1: 30152, 30152; 1.0, 1.0
2023-04-06 09:32:35,412 initial acceptance rates BR CLASSIFIED_HEADER_NAME_28.1: 7777, 7777; 1.0, 1.0
2023-04-06 09:32:35,412 initial acceptance rates BR CLASSIFIED_HEADER_NAME_29.1: 5133, 5133; 1.0, 1.0
--
2023-04-06 09:32:45,318 initialising strategy switches
2023-04-06 09:32:45,503 initialising phi
2023-04-06 09:32:45,503 initializing priors
2023-04-06 09:32:45,503 initialising prior read length distribution
2023-04-06 09:32:45,506 initialising positional scores
2023-04-06 09:32:45,551 total score is: 226784.95549006073
2023-04-06 09:32:46,028 initialising fist strategy
2023-04-06 09:32:46,028 initial acceptance rates BR CLASSIFIED_HEADER_NAME_25.1: 30152, 30152; 1.0, 1.0
2023-04-06 09:32:46,028 initial acceptance rates BR CLASSIFIED_HEADER_NAME_28.1: 7777, 7777; 1.0, 1.0
2023-04-06 09:32:46,028 initial acceptance rates BR CLASSIFIED_HEADER_NAME_29.1: 5133, 5133; 1.0, 1.0
--
2023-04-06 09:32:48,875 switch count: off 266; on 2
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_25.1: 29952, 29952; 0.9933669408331123, 0.9933669408331123
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_28.1: 7777, 7777; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_29.1: 5133, 5133; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_30.1: 3863, 3863; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_31.1: 2524, 2524; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_32.1: 1442, 1442; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_33.1: 1376, 1376; 1.0, 1.0
2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_34.1: 1106, 1106; 0.8468606431852986, 0.8468606431852986
2023-04-06 09:32:50,101 batch took: 4.023870229721069
--
2023-04-06 09:34:16,439 switch count: off 252; on 16
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_25.1: 28795, 28752; 0.9549946935526665, 0.9535685858317856
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_28.1: 7577, 7577; 0.9742831425999743, 0.9742831425999743
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_29.1: 4933, 4933; 0.9610364309370738, 0.9610364309370738
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_30.1: 3463, 3463; 0.8964535335231685, 0.8964535335231685
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_31.1: 2124, 2124; 0.8415213946117274, 0.8415213946117274
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_32.1: 1242, 1242; 0.8613037447988904, 0.8613037447988904
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_33.1: 1376, 1376; 1.0, 1.0
2023-04-06 09:34:16,629 CLASSIFIED_HEADER_NAME_34.1: 906, 906; 0.6937212863705973, 0.6937212863705973
2023-04-06 09:34:16,629 batch took: 1.4450161457061768
--
2023-04-06 09:37:16,085 switch count: off 230; on 38
2023-04-06 09:37:16,274 CLASSIFIED_HEADER_NAME_25.1: 27215, 27215; 0.9025935261342531, 0.9025935261342531
2023-04-06 09:37:16,274 CLASSIFIED_HEADER_NAME_28.1: 7125, 7043; 0.9161630448759162, 0.9056191333419056
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_29.1: 3743, 3733; 0.7292031950126632, 0.7272550165595169
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_30.1: 3484, 3463; 0.9018897230132021, 0.8964535335231685
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_31.1: 1924, 1925; 0.7622820919175911, 0.7626782884310618
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_32.1: 1042, 1042; 0.7226074895977809, 0.7226074895977809
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_33.1: 1376, 1376; 1.0, 1.0
2023-04-06 09:37:16,275 CLASSIFIED_HEADER_NAME_34.1: 409, 400; 0.31316998468606433, 0.30627871362940273
2023-04-06 09:37:16,275 batch took: 1.468853235244751
--
2023-04-06 09:38:18,395 initialising strategy switches
2023-04-06 09:38:18,592 initialising phi
2023-04-06 09:38:18,593 initializing priors
2023-04-06 09:38:18,593 initialising prior read length distribution
2023-04-06 09:38:18,596 initialising positional scores
2023-04-06 09:38:18,644 total score is: 226784.95549006073
2023-04-06 09:38:19,144 initialising fist strategy
2023-04-06 09:38:19,144 initial acceptance rates BR CLASSIFIED_HEADER_NAME_25.1: 30152, 30152; 1.0, 1.0
2023-04-06 09:38:19,144 initial acceptance rates BR CLASSIFIED_HEADER_NAME_28.1: 7777, 7777; 1.0, 1.0
2023-04-06 09:38:19,144 initial acceptance rates BR CLASSIFIED_HEADER_NAME_29.1: 5133, 5133; 1.0, 1.0
--
2023-04-06 09:38:23,690 switch count: off 230; on 38
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_25.1: 27219, 27219; 0.9027261873175909, 0.9027261873175909
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_28.1: 7134, 7041; 0.9173203034589174, 0.9053619647679053
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_29.1: 3747, 3733; 0.7299824663939217, 0.7272550165595169
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_30.1: 3484, 3463; 0.9018897230132021, 0.8964535335231685
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_31.1: 1926, 1924; 0.7630744849445324, 0.7622820919175911
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_32.1: 1042, 1042; 0.7226074895977809, 0.7226074895977809
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_33.1: 1376, 1376; 1.0, 1.0
2023-04-06 09:38:23,884 CLASSIFIED_HEADER_NAME_34.1: 411, 400; 0.31470137825421135, 0.30627871362940273
2023-04-06 09:38:25,211 batch took: 6.026308059692383
--
2023-04-06 09:39:49,548 switch count: off 210; on 58
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_25.1: 24836, 24808; 0.8236932873441231, 0.8227646590607588
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_28.1: 7149, 7035; 0.9192490677639192, 0.9045904590459046
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_29.1: 3507, 3416; 0.6832261835184102, 0.6654977595947789
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_30.1: 3286, 3242; 0.8506342221071705, 0.8392441107947192
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_31.1: 1524, 1524; 0.6038034865293186, 0.6038034865293186
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_32.1: 842, 842; 0.5839112343966713, 0.5839112343966713
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_33.1: 1209, 1176; 0.8786337209302325, 0.8546511627906976
2023-04-06 09:39:49,735 CLASSIFIED_HEADER_NAME_34.1: 400, 400; 0.30627871362940273, 0.30627871362940273
2023-04-06 09:39:49,736 batch took: 1.4432902336120605
--
2023-04-06 09:41:19,099 switch count: off 193; on 75
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_25.1: 23588, 23522; 0.7823029981427434, 0.7801140886176705
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_28.1: 6946, 6771; 0.8931464575028931, 0.8706442072778706
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_29.1: 3493, 3411; 0.6804987336840055, 0.6645236703682057
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_30.1: 2286, 2240; 0.5917680559150919, 0.5798602122702563
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_31.1: 1229, 1137; 0.4869255150554675, 0.45047543581616484
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_32.1: 748, 659; 0.5187239944521498, 0.45700416088765605
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_33.1: 1209, 1176; 0.8786337209302325, 0.8546511627906976
2023-04-06 09:41:19,288 CLASSIFIED_HEADER_NAME_34.1: 200, 200; 0.15313935681470137, 0.15313935681470137
2023-04-06 09:41:19,289 batch took: 1.4642820358276367
--
2023-04-06 09:44:18,770 switch count: off 178; on 90
2023-04-06 09:44:18,959 CLASSIFIED_HEADER_NAME_25.1: 22197, 22215; 0.736170071637039, 0.7367670469620589
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_28.1: 6605, 6280; 0.8492992156358493, 0.8075093223608075
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_29.1: 3251, 3197; 0.6333528151178648, 0.6228326514708747
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_30.1: 1838, 1657; 0.47579601346104067, 0.4289412373802744
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_31.1: 1164, 1134; 0.4611727416798732, 0.44928684627575277
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_32.1: 414, 400; 0.2871012482662968, 0.27739251040221913
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_33.1: 1179, 1176; 0.8568313953488372, 0.8546511627906976
2023-04-06 09:44:18,960 CLASSIFIED_HEADER_NAME_34.1: 200, 200; 0.15313935681470137, 0.15313935681470137
2023-04-06 09:44:18,961 batch took: 1.4967708587646484
--
2023-04-06 09:47:18,592 switch count: off 168; on 100
2023-04-06 09:47:18,780 CLASSIFIED_HEADER_NAME_25.1: 21299, 21090; 0.7063876359777129, 0.6994560891483153
2023-04-06 09:47:18,780 CLASSIFIED_HEADER_NAME_28.1: 6248, 5892; 0.8033946251768034, 0.7576186190047576
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_29.1: 2733, 2733; 0.532437171244886, 0.532437171244886
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_30.1: 1614, 1619; 0.417809992234015, 0.4191043230649754
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_32.1: 413, 400; 0.28640776699029125, 0.27739251040221913
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_33.1: 1177, 1176; 0.8553779069767442, 0.8546511627906976
2023-04-06 09:47:18,781 CLASSIFIED_HEADER_NAME_34.1: 200, 200; 0.15313935681470137, 0.15313935681470137
2023-04-06 09:47:18,781 batch took: 1.643306016921997
--
2023-04-06 09:50:18,557 switch count: off 159; on 109
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_25.1: 19868, 19737; 0.6589280976386309, 0.6545834438843194
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_28.1: 5974, 5687; 0.7681625305387682, 0.7312588401697313
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_29.1: 2533, 2533; 0.49347360218195985, 0.49347360218195985
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_30.1: 1604, 1600; 0.4152213305720942, 0.4141858659073259
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 09:50:18,746 CLASSIFIED_HEADER_NAME_32.1: 416, 400; 0.2884882108183079, 0.27739251040221913
2023-04-06 09:50:18,747 CLASSIFIED_HEADER_NAME_33.1: 1176, 1176; 0.8546511627906976, 0.8546511627906976
2023-04-06 09:50:18,747 CLASSIFIED_HEADER_NAME_34.1: 200, 200; 0.15313935681470137, 0.15313935681470137
2023-04-06 09:50:18,748 batch took: 1.7912256717681885
--
2023-04-06 09:53:18,525 switch count: off 144; on 124
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_25.1: 17850, 17869; 0.5920005306447333, 0.5926306712655877
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_28.1: 6031, 5520; 0.7754918348977755, 0.7097852642407098
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_29.1: 2353, 2201; 0.4584063900253263, 0.4287940775375024
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_30.1: 1337, 1294; 0.34610406419880924, 0.33497281905254983
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_33.1: 1197, 1176; 0.8699127906976745, 0.8546511627906976
2023-04-06 09:53:18,714 CLASSIFIED_HEADER_NAME_34.1: 200, 200; 0.15313935681470137, 0.15313935681470137
2023-04-06 09:53:18,715 batch took: 1.7903025150299072
--
2023-04-06 09:56:19,424 switch count: off 133; on 135
2023-04-06 09:56:19,614 CLASSIFIED_HEADER_NAME_25.1: 16673, 16713; 0.5529649774475989, 0.5542915892809764
2023-04-06 09:56:19,614 CLASSIFIED_HEADER_NAME_28.1: 6078, 5477; 0.7815352963867815, 0.7042561398997043
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_29.1: 2167, 1924; 0.42217027079680497, 0.3748295343853497
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_30.1: 1354, 1319; 0.35050478902407456, 0.3414444732073518
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_33.1: 1202, 1176; 0.873546511627907, 0.8546511627906976
2023-04-06 09:56:19,615 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 09:56:19,615 batch took: 2.724835157394409
--
2023-04-06 09:57:49,152 switch count: off 126; on 142
2023-04-06 09:57:49,339 CLASSIFIED_HEADER_NAME_25.1: 16690, 16553; 0.5535287874767842, 0.5489851419474662
2023-04-06 09:57:49,339 CLASSIFIED_HEADER_NAME_28.1: 5807, 5300; 0.7466889546097467, 0.6814967211006815
2023-04-06 09:57:49,339 CLASSIFIED_HEADER_NAME_29.1: 1763, 1733; 0.34346386128969414, 0.33761932593025523
2023-04-06 09:57:49,339 CLASSIFIED_HEADER_NAME_30.1: 1207, 1207; 0.312451462593839, 0.312451462593839
2023-04-06 09:57:49,340 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 09:57:49,340 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 09:57:49,340 CLASSIFIED_HEADER_NAME_33.1: 1260, 1123; 0.9156976744186046, 0.8161337209302325
2023-04-06 09:57:49,340 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 09:57:49,341 batch took: 2.6383650302886963
--
2023-04-06 10:00:49,310 switch count: off 121; on 147
2023-04-06 10:00:49,495 CLASSIFIED_HEADER_NAME_25.1: 16350, 16249; 0.5422525868930751, 0.5389028920137967
2023-04-06 10:00:49,495 CLASSIFIED_HEADER_NAME_28.1: 5613, 5063; 0.7217436029317218, 0.6510222450816511
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_29.1: 1377, 1366; 0.2682641729982466, 0.2661211766997857
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_30.1: 1007, 1007; 0.26067822935542323, 0.26067822935542323
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_31.1: 1124, 1124; 0.44532488114104596, 0.44532488114104596
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_33.1: 1255, 1123; 0.9120639534883721, 0.8161337209302325
2023-04-06 10:00:49,496 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:00:49,496 batch took: 3.032994031906128
--
2023-04-06 10:03:48,264 switch count: off 114; on 154
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_25.1: 15728, 15581; 0.5216237728840541, 0.5167484743963916
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_28.1: 5547, 5069; 0.7132570399897132, 0.6517937508036518
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_29.1: 1379, 1368; 0.2686538086888759, 0.26651081239041496
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_30.1: 1017, 1016; 0.263266891017344, 0.26300802485115193
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_31.1: 1000, 1000; 0.39619651347068147, 0.39619651347068147
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_33.1: 1023, 1040; 0.7434593023255814, 0.7558139534883721
2023-04-06 10:03:48,449 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:03:50,339 batch took: 4.670608282089233
--
2023-04-06 10:06:48,012 switch count: off 107; on 161
2023-04-06 10:06:48,197 CLASSIFIED_HEADER_NAME_25.1: 13507, 13408; 0.44796365083576545, 0.444680286548156
2023-04-06 10:06:48,197 CLASSIFIED_HEADER_NAME_28.1: 5311, 4856; 0.682911148257683, 0.6244052976726244
2023-04-06 10:06:48,197 CLASSIFIED_HEADER_NAME_29.1: 1386, 1368; 0.27001753360607833, 0.26651081239041496
2023-04-06 10:06:48,197 CLASSIFIED_HEADER_NAME_30.1: 1007, 1006; 0.26067822935542323, 0.26041936318923115
2023-04-06 10:06:48,197 CLASSIFIED_HEADER_NAME_31.1: 811, 800; 0.3213153724247227, 0.31695721077654515
2023-04-06 10:06:48,198 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:06:48,198 CLASSIFIED_HEADER_NAME_33.1: 1023, 1037; 0.7434593023255814, 0.7536337209302325
2023-04-06 10:06:48,198 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:06:48,198 batch took: 2.6857783794403076
--
2023-04-06 10:09:48,410 switch count: off 100; on 168
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_25.1: 12925, 12781; 0.42866144866012207, 0.4238856460599629
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_28.1: 5072, 4658; 0.6521795036646522, 0.598945608846599
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_29.1: 1165, 1168; 0.2269627897915449, 0.2275472433274888
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_30.1: 608, 608; 0.15739062904478385, 0.15739062904478385
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_31.1: 800, 800; 0.31695721077654515, 0.31695721077654515
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_33.1: 1023, 1032; 0.7434593023255814, 0.75
2023-04-06 10:09:48,593 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:09:48,594 batch took: 3.220763921737671
--
2023-04-06 10:11:17,751 switch count: off 98; on 170
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_25.1: 13460, 13411; 0.4464048819315468, 0.4447797824356593
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_28.1: 5494, 4989; 0.7064420727787064, 0.6415070078436415
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_29.1: 1193, 1159; 0.23241768946035457, 0.2257938827196571
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_30.1: 774, 768; 0.2003624126326689, 0.19880921563551643
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_31.1: 759, 663; 0.30071315372424723, 0.2626782884310618
2023-04-06 10:11:17,936 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:11:17,937 CLASSIFIED_HEADER_NAME_33.1: 993, 993; 0.721656976744186, 0.721656976744186
2023-04-06 10:11:17,937 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:11:17,937 batch took: 3.2566475868225098
--
2023-04-06 10:14:16,965 switch count: off 94; on 174
2023-04-06 10:14:17,149 CLASSIFIED_HEADER_NAME_25.1: 13400, 13395; 0.4444149641814805, 0.4442491377023083
2023-04-06 10:14:17,149 CLASSIFIED_HEADER_NAME_28.1: 3669, 3760; 0.4717757490034718, 0.48347691912048346
2023-04-06 10:14:17,149 CLASSIFIED_HEADER_NAME_29.1: 1163, 1133; 0.22657315410091564, 0.22072861874147673
2023-04-06 10:14:17,149 CLASSIFIED_HEADER_NAME_30.1: 695, 697; 0.1799119855034947, 0.18042971783587886
2023-04-06 10:14:17,150 CLASSIFIED_HEADER_NAME_31.1: 625, 620; 0.24762282091917592, 0.24564183835182252
2023-04-06 10:14:17,150 CLASSIFIED_HEADER_NAME_32.1: 400, 400; 0.27739251040221913, 0.27739251040221913
2023-04-06 10:14:17,150 CLASSIFIED_HEADER_NAME_33.1: 986, 970; 0.7165697674418605, 0.7049418604651163
2023-04-06 10:14:17,150 CLASSIFIED_HEADER_NAME_34.1: 0, 0; 0.0, 0.0
2023-04-06 10:14:17,150 batch took: 3.037768602371216
--
(I had to restart readfish and BOSS-RUNS at the start, hence some lines show up twice.)
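To follow these acceptance fractions over time (for example, CLASSIFIED_HEADER_NAME_25.1 drops from 1.0 to about 0.44 by 10:14), the per-contig lines can be parsed. A sketch that relies only on the line layout visible above; what the two counts and two fractions stand for (presumably the two strands) is an assumption, so the code simply keeps the fractions as a pair. The function names are hypothetical.

```python
import re

# Matches per-contig lines such as
# "2023-04-06 09:32:49,062 CLASSIFIED_HEADER_NAME_25.1: 29952, 29952; 0.99, 0.99"
# and the "initial acceptance rates BR ..." variant; other lines are skipped.
CONTIG_LINE = re.compile(
    r"^(?P<time>\S+ \S+) (?:initial acceptance rates BR )?"
    r"(?P<contig>\S+): (?P<n1>\d+), (?P<n2>\d+); (?P<f1>[0-9.eE+-]+), (?P<f2>[0-9.eE+-]+)$"
)


def parse_acceptance(lines):
    """Yield (time, contig, fraction_1, fraction_2) from a BOSS-RUNS log."""
    for line in lines:
        m = CONTIG_LINE.match(line.strip())
        if m:
            yield m["time"], m["contig"], float(m["f1"]), float(m["f2"])


def latest_acceptance(lines):
    """Most recent pair of accepted-site fractions per contig."""
    latest = {}
    for _time, contig, f1, f2 in parse_acceptance(lines):
        latest[contig] = (f1, f2)  # later lines overwrite earlier ones
    return latest
```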
The readfish chunk.log does not contain any single_off instructions. If I run
grep -P "single_off\tunblock\tselect_species" chunk.log
I get no output.
In total only 5000 reads are ejected, all of them because they reached max_chunks. These reads are also visible in the MinKNOW GUI (lengths of a couple of kb).
So it does seem that readfish either does not pick up the correct BOSS-RUNS masks or, for some reason, does not use them to eject reads.
Let me know what you think the next best steps for debugging might be!
Cheers, Ben
I think I spotted the issue. In your TOML configuration file for readfish, both targets and mask are defined, and I'm afraid our documentation was not clear enough in distinguishing the two. The targets setting retains the same functionality as in standard readfish runs, i.e. it defines sequences/contigs from which reads are always accepted, and this takes precedence over the masks that BOSS-RUNS produces.
If you set empty targets (targets = []), all of the sequences in the reference file should be handled using the BOSS-RUNS masks instead, and you should hopefully see the expected rejections. I can see how that is not entirely intuitive, and I will add a note to our docs.
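The precedence can be illustrated with a toy model: once a contig is listed in targets, its reads are accepted before the BOSS-RUNS mask is ever looked at, which is consistent with no "single_off unblock" decisions appearing in chunk.log. This is not readfish's implementation; the function names, the per-strand boolean mask layout and the accept/reject labels are hypothetical. The second function is an equally hypothetical sanity check that walks a parsed TOML config and reports any table that defines both mask and a non-empty targets, without assuming where in the file those keys live.

```python
def decide(contig, start, strand, targets, masks):
    """Toy model of the targets-over-mask precedence (not readfish's code).

    `targets` is a set of contig names to always accept. `masks` maps a contig
    name to a pair of per-strand lists of booleans (True = accept a read that
    starts at that position); this layout is an assumption for illustration.
    """
    if contig in targets:
        return "accept"  # targets win: the BOSS-RUNS mask is never consulted
    # Assume BOSS-RUNS provides a mask for every contig in the reference.
    return "accept" if masks[contig][strand][start] else "reject"


def conditions_with_shadowing_targets(config):
    """List dotted paths of tables that set both `mask` and non-empty `targets`."""
    found = []

    def walk(node, path):
        if isinstance(node, dict):
            if "mask" in node and node.get("targets"):
                found.append(".".join(path) or "<root>")
            for key, value in node.items():
                walk(value, path + [str(key)])

    walk(config, [])
    return found
```

A config could be loaded with the standard library's tomllib (Python 3.11+) by opening the file in binary mode and calling tomllib.load on it, then passed to conditions_with_shadowing_targets; an empty result means no table sets targets alongside mask.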
Please let me know if this helps, and thanks again,
Lukas
Hi Lukas,
Indeed that seems to have fixed it. Thanks for your swift responses and awesome help!
Cheers, Ben!
Great to hear, thanks for confirming! Good luck with your experiments, I'd be happy to learn more if you can/want to share.
Hi!
We've used readfish in the past to eject unwanted reads, with great success! Now we want to start using BOSS-RUNS to prevent a coverage bias in our sample. We've configured it according to the documentation and ran it successfully on the simulated human read set.
However, when we run it on a real sequencing sample, we're not sure whether BOSS-RUNS ever tells readfish to eject any reads. We compare it against one control region and sequence on a MinION flow cell.
Our toml looks like this:
We run readfish and bossruns with the following commands:
and
Each command is run from within a Docker container; the full command that mounts all the relevant volumes is not shown here.
It seems that BOSS-RUNS does update the masks dynamically; see this excerpt from our BOSS-RUNS log:
Readfish also successfully reloads the mask dict:
Excerpt from the readfish chunk.log. It seems readfish only ejects reads because they exceed max_chunks.
How can we tell when readfish ejects reads because of BOSS-RUNS?
The sample we sequence consists mainly of one species. We use readfish because we want more even coverage across our genome, but the results of our experiment do not show any difference in coverage. Do you have any ideas what could cause this?
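To put a number on "more even coverage", one option is to compare the coefficient of variation of depth between the control and adaptive sampling regions: a lower value means more even coverage. A minimal sketch; the metric is one possible choice, not something BOSS-RUNS reports, and the per-contig or per-window depths are assumed to be computed beforehand (e.g. from alignments of each region's reads). The function name is hypothetical.

```python
import statistics


def coverage_cv(depths):
    """Coefficient of variation (population std / mean) of depth values.

    Lower values mean more even coverage. `depths` could be per-contig or
    per-window mean depths computed elsewhere.
    """
    mean = statistics.mean(depths)
    if mean == 0:
        raise ValueError("mean depth is zero")
    return statistics.pstdev(depths) / mean
```

Computing coverage_cv separately for the control and adaptive regions gives a direct comparison; if BOSS-RUNS is steering the run, the adaptive value should end up lower.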
We're happy to provide more information if you need any.
Kind regards, Ben
--EDIT-- Could it have something to do with BOSS-RUNS detecting many dropouts? An example can be seen here: