LooseLab / readfish

CLI tool for flexible and fast adaptive sampling on ONT sequencers
https://looselab.github.io/readfish/
GNU General Public License v3.0

Modify TOML of FLOW-MIN114 & SQK-LSK114 #250

Closed jennieli421 closed 1 year ago

jennieli421 commented 1 year ago

For our experiment, the flowcell type is FLOW-MIN114, and the kit is Ligation Sequencing Kit V14 (SQK-LSK114). However, I am not sure which toml file I should edit (to change break_reads_after_seconds=0.4). Below are the two that are relevant to MIN114: '/opt/ont/minknow/conf/package/sequencing/sequencing_MIN114_DNA_e8_2_400K.toml' '/opt/ont/minknow/conf/package/sequencing/sequencing_MIN114_DNA_e8_2_400K_long_read.toml'

Additionally, I'd like to confirm whether the rejected reads will be included in the fastq outputs. In the trial that used readfish, there is a file containing unblocked ids: '/var/lib/minknow/data/Sim/no_sample/20230626_1358_MN19362_sim_readfish_enrich_24e9b9a0/unblocked_read_ids.txt' If the rejected reads are included (which seems to be true), I will attempt to exclude them using the ids in “unblocked_read_ids.txt”. Please let me know if my approach is correct.

mattloose commented 1 year ago

Hi,

For kit14 we have not yet tested reducing break reads to 0.4 seconds. In fact, we suggest you leave it at 1 second for now (see #234). However, for the standard ligation kit the TOML file you would edit is:

'/opt/ont/minknow/conf/package/sequencing/sequencing_MIN114_DNA_e8_2_400K.toml'

You obviously do this at your own risk.
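Before touching anything, it may help to check what the file currently says. The following is a minimal read-only sketch, assuming the setting appears as a plain `break_reads_after_seconds = <number>` line; the helper name `break_read_settings` and the sample text (including its table name) are illustrative, not taken from MinKNOW.

```python
import re

# Matches lines such as "break_reads_after_seconds = 1.0" (assumed layout).
PATTERN = re.compile(
    r"^\s*break_reads_after_seconds\s*=\s*([0-9]+(?:\.[0-9]+)?)",
    re.MULTILINE,
)


def break_read_settings(toml_text):
    """Return every break_reads_after_seconds value found in the text, as floats."""
    return [float(match.group(1)) for match in PATTERN.finditer(toml_text)]


# Illustrative stand-in for the contents of the sequencing TOML.
SAMPLE = """
[some_table]
break_reads_after_seconds = 1.0
"""

values = break_read_settings(SAMPLE)
print(values)
```

To inspect the real file, read it with `open(path).read()` and pass the text to the same function. Nothing here writes to disk.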

The rejected reads will be written to the fastq outputs. Note that not all reads that appear in the unblocked_read_ids.txt file are guaranteed to be written to the fastq outputs as minknow may later determine them to not have been real reads.

I hope that helps.
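The filtering approach described above could be sketched roughly as follows. This assumes that unblocked_read_ids.txt holds one read id per line and that the fastq uses standard 4-line records; the function names and the in-memory sample data are hypothetical. Ids that never made it into the fastq are simply never matched, so they cause no problem.

```python
import io


def load_ids(handle):
    """Read one id per line into a set, skipping blank lines."""
    return {line.strip() for line in handle if line.strip()}


def filter_fastq(fastq, rejected):
    """Yield 4-line FASTQ records whose read id is not in `rejected`."""
    while True:
        header = fastq.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        record = [header, fastq.readline(), fastq.readline(), fastq.readline()]
        # The read id is the first token after the leading "@".
        read_id = header[1:].split()[0]
        if read_id not in rejected:
            yield "".join(record)


# Made-up data standing in for the real files.
IDS = "read-a\nread-zzz\n"
FASTQ = "@read-a runid=x\nACGT\n+\n!!!!\n@read-b runid=x\nTTGA\n+\n####\n"

kept = list(filter_fastq(io.StringIO(FASTQ), load_ids(io.StringIO(IDS))))
print("".join(kept), end="")
```

For real data, replace the `io.StringIO` objects with `open(...)` handles (or `gzip.open(..., "rt")` for compressed fastq) and write the kept records to a new file.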

jennieli421 commented 1 year ago

If I am using an R10 flow cell, how should I change config_name in the TOML for readfish to identify the targets? In the previous thread @Adoni5 mentioned that the basecaller model needs to be changed. Here is the default setting:

[caller_settings]
config_name = "dna_r9.4.1_450bps_fast"
host = "ipc:///tmp/.guppy/"
port = 5555

Additionally, are there any other files that need to be changed?

jennieli421 commented 1 year ago

I changed config_name to config_name = "dna_r10.4.1_e8.2_260bps_fast" and the result looks fine.

Adoni5 commented 1 year ago

Hi @jennieli421 - That should work correctly! One note would be to double check the sequencing speed you are running at for R10 - 260 bps or 400 bps. If it is 400 bps, the corresponding config_name would be "dna_r10.4.1_e8.2_400bps_fast"

jennieli421 commented 1 year ago

Which speed does "fast mode" basecalling correspond to?

Screenshot from 2023-08-07 10-24-26

Adoni5 commented 1 year ago

That is the type of the basecalling model, not the speed of sequencing. The speed depends on the kit that the library was made with. The "fast" here relates to the ..._fast suffix of the basecalling model name. The speed is the number just before bps, i.e. 400 bases per second or 260 bases per second.
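To make the naming scheme concrete, here is a small sketch that splits a config name into its parts. The regular expression is inferred only from the names quoted in this thread (plus the `hac` and `sup` model types), so treat it as an assumption rather than an official specification; `parse_config_name` is a hypothetical helper.

```python
import re

# Pattern inferred from names such as "dna_r10.4.1_e8.2_400bps_5khz_fast.cfg".
NAME = re.compile(
    r"^(?P<chemistry>dna_r[0-9.]+(?:_e[0-9.]+)?)"  # pore and chemistry version
    r"_(?P<speed_bps>\d+)bps"                      # sequencing speed
    r"(?:_(?P<sample_rate>\d+khz))?"               # optional sample rate
    r"_(?P<model>fast|hac|sup)"                    # basecalling model type
    r"(?:\.cfg)?$"
)


def parse_config_name(name):
    """Split a basecaller config name into its parts, or return None if unrecognised."""
    match = NAME.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["speed_bps"] = int(parts["speed_bps"])
    return parts


for name in (
    "dna_r9.4.1_450bps_fast",
    "dna_r10.4.1_e8.2_260bps_fast",
    "dna_r10.4.1_e8.2_400bps_5khz_fast.cfg",
):
    print(name, "->", parse_config_name(name))
```

The `model` field is the basecalling model type and `speed_bps` is the sequencing speed, which are the two things that are easy to confuse.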

jennieli421 commented 1 year ago

Thanks for the explanation. I'm using the standard kit Ligation Sequencing Kit V14 (SQK-LSK114), not sure which speed it corresponds to. In the final review page, it says "basecalling: On (Fast basecalling, 400bps)".

Adoni5 commented 1 year ago

I would guess that's 400bps - @mattloose, any ideas?

mattloose commented 1 year ago

Yes - you should use 400bps - ONT have now deprecated the slower mode. One thing you need to check is which sample rate you are using - ONT have now introduced 5 kHz.

If you are on the most recent version of MinKNOW you should use dna_r10.4.1_e8.2_400bps_5khz_fast.cfg