@bioteksampath
Hi Sam,
I’ve moved this to a new issue for now. Which playback file are you using?
Hi @alexomics
Thanks for your prompt response. I'm using the following file from the demo:
GXB02001_20230509_1250_FAW79338_X3_sequencing_run_NA12878_B1_19382aa5_ef4362cd.fast5
Okay, that’s the right file. Can you find the error message from the control server log? It should be at:
/var/log/minknow/MS00000/control_server_log-0.txt
I do have multiple log files, but this is the latest one:
`(base) sap223@gifs-c36qc14:/home/.gifs/sap223/Desktop/readfish_demo$ cat /var/log/minknow/MS00000/control_server_log-0.txt
2024-04-09 10:19:57.300601 INFO: starting_up (control)
hostname: gifs-c36qc14.usask.ca
system: ubuntu 20.04
Distribution: 23.11.7 (STABLE)
MinKNOW Core: 5.8.6
Bream: 7.8.2
Protocol configuration: 5.8.6
Dorado (build): 0.0.0.19120+441f78764
Dorado (connected): 7.2.13+fba8e8925
2024-04-09 10:19:57.302232 WARNING: partially_raised_file_limits (control_server)
actual: 1024
target: 4096
2024-04-09 10:19:57.302321 INFO: auth_guest_mode (rpc)
value: local_only
2024-04-09 10:19:57.303249 INFO: external_offload_service_not_configured (script)
auth_token_file:
external_service_port: 0
2024-04-09 10:19:57.304133 INFO: flow_cell_position_instantiated (mgmt)
device_id: MS00000
hardware_type: MINION_USB
os_identifier:
simulated: true
2024-04-09 10:19:57.304176 INFO: active_device_set (mgmt)
identifier: 0x7f16a40023b0
2024-04-09 10:19:57.304204 INFO: successfully_read_flow_cell_data (mgmt)
attempt: 1
flow_cell_data:
observer: MantaControl
2024-04-09 10:22:03.492513 INFO: data_acquisition_starting (engine)
acquisition_run_id: 9824c93cc5f3296457bbe0e5608f6b17889f26e1
options: allow_file_output=true, enable_analysis=true, generate_reports=true, send_basecalling_metrics=true, send_sequencing_read_metrics=true, generate_final_summary=true
2024-04-09 10:22:03.497570 ERROR: failed_to_finish_transition (state_control)
destination: CreateSharedData
observer: MantaControl
operation_begin: 2024-04-09T16:22:03.010772
operation_end: 2024-04-09T16:23:03.009772
source: Idle
stage: transition
2024-04-09 10:22:03.597656 INFO: data_acquisition_finished (engine)
acquisition_run_id: 9824c93cc5f3296457bbe0e5608f6b17889f26e1
2024-04-09 10:22:03.597688 INFO: stop_processing_unblocks_called (mgmt)
2024-04-09 10:22:03.597713 WARNING: called_stop_processing_unblocks_while_not_processing (mgmt)
2024-04-09 10:22:33.599492 WARNING: forcibly_set_statistic_to_finished (pipeline_stats)
in_function:
So the error from that log is that MinKNOW cannot access the bulk file. I would try moving it to MinKNOW’s data folder at
/var/lib/minknow/data
Then make sure to update the simulation location and try again.
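For anyone hunting for the same entry in their own control server log, here is a minimal Python sketch that pulls out the ERROR entries together with their fields. It assumes the layout seen in the log above (a timestamped header line followed by `key: value` lines); the function names are hypothetical and not part of MinKNOW or readfish. It only lists the entries; working out the cause (here, file access) still needs a human reading.

```python
import re

# Header lines look like:
# "2024-04-09 10:22:03.497570 ERROR: failed_to_finish_transition (state_control)"
HEADER = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>[A-Z]+): (?P<event>\S+) \((?P<component>[^)]+)\)"
)


def parse_entries(text):
    """Group log text into entries, attaching 'key: value' lines to the latest header."""
    entries = []
    for line in text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            entries.append({**match.groupdict(), "fields": {}})
        elif entries and ":" in line:
            key, _, value = line.strip().partition(":")
            entries[-1]["fields"][key.strip()] = value.strip()
    return entries


def errors(text):
    """Return only the entries logged at ERROR level."""
    return [entry for entry in parse_entries(text) if entry["level"] == "ERROR"]


# A few lines copied from the log in this thread.
SAMPLE = """\
2024-04-09 10:22:03.492513 INFO: data_acquisition_starting (engine)
acquisition_run_id: 9824c93cc5f3296457bbe0e5608f6b17889f26e1
2024-04-09 10:22:03.497570 ERROR: failed_to_finish_transition (state_control)
destination: CreateSharedData
observer: MantaControl
source: Idle
stage: transition
2024-04-09 10:22:03.597656 INFO: data_acquisition_finished (engine)
"""

for entry in errors(SAMPLE):
    print(entry["time"], entry["event"], entry["fields"])
```

To run it on a real log, read the file into a string and pass it to `errors`.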
Hi @alexomics, thanks for your response. Yes, MinKNOW recognises input.fast5 from the minknow/data folder but not from other locations. Strange.
However, I have a couple of other issues and hope you can provide some feedback. Since I have the Dorado basecaller (not Guppy), do I need to change anything in the .toml file? I got the error below.
readfish targets --toml Readfish_test_1600_N11.toml --device MS00000 --log-file Ap10test.log --experiment-name human_select_test
RuntimeError: The dna_r9.4.1_450bps_fast base-calling config listed in the readfish config TOML is not suitable for this flowcell and kit combination.
Please check the guppy_config value in the caller_settings.guppy section of your TOML file.
The following models are are given by ONT as suitable for this flow cell/kit combo:
dna_r10.4.1_e8.2_400bps_5khz_fast.cfg
dna_r10.4.1_e8.2_400bps_5khz_hac.cfg
dna_r10.4.1_e8.2_400bps_5khz_sup.cfg
dna_r10.4.1_e8.2_400bps_5khz_modbases_5hmc_5mc_cg_fast.cfg
dna_r10.4.1_e8.2_400bps_5khz_modbases_5hmc_5mc_cg_hac.cfg and dna_r10.4.1_e8.2_400bps_5khz_modbases_5hmc_5mc_cg_sup.cfg
`
Log file: Ap10test.log
I can see from the control server log that you are using the R10 (5 kHz) bulk file for playback. As such, the dna_r9.4.1_450bps_fast config cannot be used for base calling. I would recommend using the dna_r10.4.1_e8.2_400bps_5khz_fast config instead.
For the caller settings:
[caller_settings.guppy]
config = "dna_r10.4.1_e8.2_400bps_5khz_fast"
address = "ipc:///tmp/.guppy/5555"
debug_log = "live_reads.fq"
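As a rough illustration of the mismatch readfish reported, the sketch below compares a config name from the TOML against the list of suitable models quoted in the RuntimeError. The helper names are hypothetical and this is not how readfish performs the check internally; it only shows why the R9.4.1 name is rejected and the R10.4.1 name is accepted.

```python
def normalise(name):
    """Drop a trailing '.cfg' so TOML values and listed file names compare equal."""
    return name[:-4] if name.endswith(".cfg") else name


def is_suitable(config, suitable):
    """True if the config name appears in the list of suitable models."""
    return normalise(config) in {normalise(item) for item in suitable}


# Model names copied from the RuntimeError quoted earlier in this thread.
SUITABLE = [
    "dna_r10.4.1_e8.2_400bps_5khz_fast.cfg",
    "dna_r10.4.1_e8.2_400bps_5khz_hac.cfg",
    "dna_r10.4.1_e8.2_400bps_5khz_sup.cfg",
    "dna_r10.4.1_e8.2_400bps_5khz_modbases_5hmc_5mc_cg_fast.cfg",
    "dna_r10.4.1_e8.2_400bps_5khz_modbases_5hmc_5mc_cg_hac.cfg",
    "dna_r10.4.1_e8.2_400bps_5khz_modbases_5hmc_5mc_cg_sup.cfg",
]

print(is_suitable("dna_r9.4.1_450bps_fast", SUITABLE))             # False
print(is_suitable("dna_r10.4.1_e8.2_400bps_5khz_fast", SUITABLE))  # True
```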
I can also see from the TOML file you uploaded that you've set the n_threads option for the aligner. To make use of multi-threaded alignment you will need to use the mappy_rs option and set the correct fn_idx_in path:
[mapper_settings.mappy_rs]
fn_idx_in = "/path/to/hg38.mmi"
debug_log = "live_alignments.paf"
n_threads = 24
Thanks @alexomics
I was able to run the demo successfully. I have some additional questions regarding my own research, and I'm hoping you could provide some feedback:
Apologies for the barrage of questions.
Hi @bioteksampath,
The above questions aren't issues with readfish and also are not linked to the original issue (which is now resolved).
It is helpful to keep these things separate. I shall close this issue and reference your comment in a new discussion - #346
Thanks, @mattloose and @alexomics,
I wanted to share the steps I took to address the issues we encountered with the Nanopore adaptive sequencing setup, as I believe they will be useful for others.
MinKNOW File Access: I found that MinKNOW was unable to access the demo fast5 (R10 data) file from a directory other than /var/lib/minknow/data. To resolve this, I moved the file to this folder. Please note that writing fast5 files to /var/lib/minknow/data requires sudo permission.
Write Permission: To allow MinKNOW to write to /tmp/.guppy/5555, I used the command chmod 777 /tmp/.guppy/5555. However, obtaining write permission may require asking a user with sudo privileges.
Configuration Update: I updated the .toml configuration so that it is compatible with the Dorado caller. Here are the updated settings:
For Guppy/Dorado:
[caller_settings.guppy]
config = "dna_r10.4.1_e8.2_400bps_5khz_fast"
address = "ipc:///tmp/.guppy/5555"
debug_log = "live_reads.fq"
For Multi-Threads:
[mapper_settings.mappy_rs]
fn_idx_in = "/path/to/hg38.mmi"
debug_log = "live_alignments.paf"
n_threads = 24
For Single Thread:
[mapper_settings.mappy]
fn_idx_in = "/path/to/hg38.mmi"
debug_log = "live_alignments.paf"
n_threads = 4
Additional Information: If using a targets file, ensure it is either a 6-column BED file (tab-separated) or a 4-column comma-separated (CSV) file. Here's an example format:
targets = ["/path/to/targets.csv"]
Chr1
Scf777,510021,510052,+
Scf4546,108529,108529,+
Chr11,1,58419410,+
Chr12,1,5639592,+
Or a .bed file in 6-column format:
targets = ["/path/to/targets.bed"]
777 510021 510052 . . +
4546 108529 109529 . . +
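A quick way to sanity-check a 6-column BED line before handing the file to readfish is sketched below. The function name is hypothetical, and restricting strand to `+` or `-` is an assumption based on the examples in this thread rather than on the readfish source; the real check remains `readfish validate`.

```python
def check_bed6_line(line):
    """Return a list of problems found in one BED line (an empty list means OK)."""
    cols = line.split()  # splits on tabs or any other whitespace
    if len(cols) != 6:
        return [f"expected 6 whitespace-separated columns, found {len(cols)}"]
    chrom, start, end, name, score, strand = cols
    problems = []
    if not (start.isdigit() and end.isdigit()):
        problems.append("chromStart and chromEnd must be non-negative integers")
    elif int(start) > int(end):
        problems.append("chromStart is greater than chromEnd")
    # Assumption: only '+' and '-' are accepted as strand values.
    if strand not in {"+", "-"}:
        problems.append(f"strand must be '+' or '-', found {strand!r}")
    return problems


print(check_bed6_line("777\t510021\t510052\t.\t.\t+"))  # []
print(check_bed6_line("777,510021,510052,+"))
```

The second call reports a column-count problem because a comma-separated row is a single whitespace-delimited token.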
These adjustments should help resolve the issues we encountered.
Hope this helps someone! Sam
Could you check your bed file format please?
The format you provide above is not a BED file format. The BED file format specs are available here https://github.com/samtools/hts-specs/blob/master/BEDv1.pdf and are explicitly not comma separated; rather, they are tab or whitespace separated, which readfish does support.
Could you share the file that you used which did not work?
@mattloose thanks for pointing that out.
Strangely, I cannot validate the .bed format, but the targets CSV validates:
(rf_con) cxo314@gifs-c36qc14:/data/reference/canola$ head targets_315Svs.bed
777 510021 520052 +
4546 108529 109529 +
N11 1 58419410 +
N12 1 5639592 +
N13 1 5639592 +
N19 1 5639592 +
N2 68644 69237 +
(rf_con) cxo314@gifs-c36qc14:/data/reference/canola$ readfish validate canola_test_1600_N11_barcode1-4_bed.toml
2024-04-15 12:55:49,862 readfish /home/cxo314/gifs/.conda/envs/rf_con/bin/readfish validate canola_test_1600_N11_barcode1-4_bed.toml
2024-04-15 12:55:49,862 readfish command='validate'
2024-04-15 12:55:49,862 readfish log_file=None
2024-04-15 12:55:49,862 readfish log_format='%(asctime)s %(name)s %(message)s'
2024-04-15 12:55:49,862 readfish log_level='info'
2024-04-15 12:55:49,862 readfish no_check_plugins=False
2024-04-15 12:55:49,862 readfish no_describe=False
2024-04-15 12:55:49,862 readfish prom=False
2024-04-15 12:55:49,862 readfish toml='canola_test_1600_N11_barcode1-4_bed.toml'
2024-04-15 12:55:49,862 readfish.validate .
.
.
.
2024-04-15 12:55:49,878 readfish.validate Loaded TOML config without error
2024-04-15 12:55:49,878 readfish.validate Initialising Caller
2024-04-15 12:55:49,919 readfish.validate Caller initialised
2024-04-15 12:55:49,920 readfish.validate Initialising Aligner
2024-04-15 12:55:51,695 readfish.validate Aligner initialised
2024-04-15 12:55:51,700 readfish.validate Configuration description:
Number of barcodes in the Conf (excluding unclassified and classified): 4
Barcode unclassified_reads (control=False), Barcode classified_reads (control=False), Barcode NAM00_300csv (control=False), Barcode NAM01_300csv (control=False), Barcode NAM04_300csv (control=False) and Barcode NAM05_300csv (control=False)
Region canola_test (control=False).
Region applies to section of flow cell (# = applied, . = not applied):
################################
NOTE - The following 315 contigs are listed as targets but have not been found on the target reference:
4546 108529 108529 +, 777 510021 510052 +, N11 1 58419410 +, N12 1 5639592 +, N13 1 5639592 +, N19 1 5639592 +, N2 10057805 10064396 +, N2 100
But my targets in CSV format were validated:
`(rf_con) cxo314@gifs-c36qc14:/data/reference/canola$ head adaONT_300SVcsv.txt
777,510021,510052,+
4546,108529,108529,+
N11,1,58419410,+
N12,1,5639592,+
N13,1,5639592,+
N19,1,5639592,+
N2,68644,69237,+
N2,190124,190157,+
N2,211196,213168,+
N2,310498,310546,+
(rf_con) cxo314@gifs-c36qc14:/data/reference/canola$ readfish validate canola_test_1600_N11_barcode1-4_csv.toml
2024-04-15 13:10:53,713 readfish /home/cxo314/gifs/.conda/envs/rf_con/bin/readfish validate canola_test_1600_N11_barcode1-4_csv.toml
2024-04-15 13:10:53,714 readfish command='validate'
2024-04-15 13:10:53,714 readfish log_file=None
2024-04-15 13:10:53,714 readfish log_format='%(asctime)s %(name)s %(message)s'
2024-04-15 13:10:53,714 readfish log_level='info'
2024-04-15 13:10:53,714 readfish no_check_plugins=False
2024-04-15 13:10:53,714 readfish no_describe=False
2024-04-15 13:10:53,714 readfish prom=False
2024-04-15 13:10:53,714 readfish toml='canola_test_1600_N11_barcode1-4_csv.toml'
2024-04-15 13:10:53,714 readfish.validate eJztWE1v3DYQvetXEPIlRtfaXccGEgM5OCkSBEjsIHF6MVyBK1Er1hKpiJQ/.
.
.
.
2024-04-15 13:10:53,727 readfish.validate Loaded TOML config without error
2024-04-15 13:10:53,727 readfish.validate Initialising Caller
2024-04-15 13:10:53,770 readfish.validate Caller initialised
2024-04-15 13:10:53,770 readfish.validate Initialising Aligner
2024-04-15 13:10:55,572 readfish.validate Aligner initialised
2024-04-15 13:10:55,573 readfish.validate Configuration description:
Number of barcodes in the Conf (excluding unclassified and classified): 4
Barcode unclassified_reads (control=False), Barcode classified_reads (control=False), Barcode NAM00_300csv (control=False), Barcode NAM01_300csv (control=False), Barcode NAM04_300csv (control=False) and Barcode NAM05_300csv (control=False)
Region canola_test (control=False).
Region applies to section of flow cell (# = applied, . = not applied):
2024-04-15 13:10:57,011 readfish.validate Using the mappy_rs plugin. Using reference: /data/reference/canola/N99_hifi.mmi.
Region canola_test has targets on 8 contigs, with 8 found in the provided reference.
This region has 314 total targets (+ve and -ve strands), covering approximately 3.76% of the genome.
Barcode unclassified_reads has targets on 0 contigs, with 0 found in the provided reference.
This barcode has 0 total targets (+ve and -ve strands), covering approximately 0.00% of the genome.
Barcode classified_reads has targets on 0 contigs, with 0 found in the provided reference.
This barcode has 0 total targets (+ve and -ve strands), covering approximately 0.00% of the genome.
Barcode NAM00_300csv has targets on 8 contigs, with 8 found in the provided reference.
This barcode has 314 total targets (+ve and -ve strands), covering approximately 3.76% of the genome.
Barcode NAM01_300csv has targets on 8 contigs, with 8 found in the provided reference.
This barcode has 314 total targets (+ve and -ve strands), covering approximately 3.76% of the genome.
Barcode NAM04_300csv has targets on 8 contigs, with 8 found in the provided reference.
This barcode has 314 total targets (+ve and -ve strands), covering approximately 3.76% of the genome.
Barcode NAM05_300csv has targets on 8 contigs, with 8 found in the provided reference.
This barcode has 314 total targets (+ve and -ve strands), covering approximately 3.76% of the genome.
`
Hi,
The issue is that your BED file is not in the correct format. If you look in the docs here - https://looselab.github.io/readfish/toml.html#bed-or-csv-targets
we specify that a BED file needs to be in the 6 column format:
chrom chromStart chromEnd name score strand
name and score can be left as a "." if no information is being provided. However, the four column format you provided above does not match the 6 column format.
You would need:
777 510021 520052 . . +
For the CSV format we do take the four columns.
Hope that helps.
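If you already have four-column rows (chrom, start, end, strand), padding them out to the 6 column layout can be sketched as below. The function name is hypothetical, and coordinates are copied through unchanged, which assumes your four-column values already follow the convention you want in the BED file.

```python
def to_bed6(line):
    """Turn 'chrom start end strand' (comma or whitespace separated) into a 6-column BED line."""
    cols = line.replace(",", " ").split()
    if len(cols) != 4:
        raise ValueError(f"expected 4 columns, found {len(cols)}: {line!r}")
    chrom, start, end, strand = cols
    # name and score carry no information here, so both are written as "."
    return "\t".join([chrom, start, end, ".", ".", strand])


print(to_bed6("777 510021 520052 +"))
print(to_bed6("N11,1,58419410,+"))
```

Rows that name only a contig (no start, end or strand) raise a `ValueError`, since there is nothing to pad them with.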
Thanks @mattloose, the 6-column .bed worked. (It seems the file name should end with .bed; a name ending in bed.txt does not work.)
Hi @alexomics,
I have the same RUN ERROR issue on my demo run, within minutes of loading MinION data using the playback option in the MinKNOW UI. I have MinKNOW with Dorado, running on Ubuntu 22.
Not sure where I should start. I'm currently stuck at this step - https://github.com/LooseLab/readfish?tab=readme-ov-file#configuring-bulk-fast5-file-playback
I'm wondering which permission errors I should be checking. I do have full permission on 5555.
Thanks for your help, sam
Originally posted by @bioteksampath in https://github.com/LooseLab/readfish/issues/337#issuecomment-2045925085