LooseLab / Icarust

A fully featured MinKNOW simulator for testing read until experiments.
Mozilla Public License 2.0

Error in Installation #1

Closed Nirmal2310 closed 1 year ago

Nirmal2310 commented 1 year ago

Dear @Adoni5, thank you for making this awesome tool for simulating real-time nanopore sequencing. I was trying to install it on my system but I keep getting this error:

icarust_error

As I am not familiar with Rust, could you please help me out with this error?

Also, is it possible to simulate the sequencing run without applying adaptive sampling via ReadFish?

Thank you in advance for helping me out.

Adoni5 commented 1 year ago

Hi @Nirmal2310 - I've seen this before as well! This isn't actually a problem with Rust, it's a problem with the version of protoc that is on your system. Older versions don't have the --experimental_allow_proto3_optional flag.

I'm assuming you are using a version of Ubuntu lower than 22, or a different Debian based OS, and installed protoc using apt-get install.

In order to sort this you have two choices:

  1. Install using our Docker container for Icarust (https://github.com/LooseLab/Icarust_docker), which has the correct versions of all the dependencies inside the container.
  2. Manually install the latest protoc by following the "Install pre-compiled binaries (any OS)" instructions. The version of protoc you have is too old because the one kept in the Ubuntu 20 repository is quite old now: long-term support versions of Ubuntu tend to keep their packages pinned at older versions to ensure they don't cause any conflicts.
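To tell which situation you are in before rebuilding, the installed protoc version can be checked with a few lines of Python. This is only a sketch and not part of Icarust: it assumes protoc is on your PATH, that `protoc --version` prints something like `libprotoc 3.6.1`, and that the --experimental_allow_proto3_optional flag first appeared in protoc 3.12 (worth confirming against the protobuf release notes). The function names are made up for illustration.

```python
import re
import shutil
import subprocess

# Assumption: the flag was introduced in protoc 3.12; check the protobuf release notes.
MIN_PROTOC = (3, 12)


def parse_protoc_version(output: str) -> tuple:
    """Extract (major, minor) from text such as 'libprotoc 3.6.1' or 'libprotoc 23.2'."""
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"Could not find a version number in {output!r}")
    return int(match.group(1)), int(match.group(2))


def protoc_supports_proto3_optional(output: str) -> bool:
    """Compare the parsed version against the assumed minimum."""
    return parse_protoc_version(output) >= MIN_PROTOC


if __name__ == "__main__":
    if shutil.which("protoc") is None:
        print("protoc not found on PATH")
    else:
        result = subprocess.run(["protoc", "--version"], capture_output=True, text=True)
        verdict = "new enough" if protoc_supports_proto3_optional(result.stdout) else "too old"
        print(result.stdout.strip(), "->", verdict)
```

If this reports "too old", either of the two options above should get you a newer protoc.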

Let me know if you need any more guidance with this!

Rory

P.S. Yes, it is possible to simulate a run without applying adaptive sampling via readfish, although in that case it might be better to play back a run using a bulk fast5 file in MinKNOW. Depending on what you want to use it for, there is another tool that I would recommend for simulating data called Squigulator (https://github.com/hasindu2008/squigulator), which simulates data much faster than the speed of a run.

Nirmal2310 commented 1 year ago

Dear @Adoni5, thank you so much for your response. I updated protoc to version 23.2 and installed Icarust; however, I am now getting a warning:

icarust_error

I thought it had something to do with the Icarust simulation TOML file, so I ran the following command:

~/Icarust/target/release/icarust -c ~/Icarust/config.ini -s ~/Icarust/Profile_tomls/config.toml -v

But I am getting the following error:

thread 'main' panicked at 'Failed to read kmers to string: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/impl_services/data.rs:706:53
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Can you please help me with this?

Nirmal2310 commented 1 year ago

I have tried changing the path for the squiggle arrays as well as the flowcell name, but the error is the same. As we are simulating the nanopore device, what should the flowcell name be? Do we have to connect a physical device with a flow cell in it?

Adoni5 commented 1 year ago

So you were right about the first issue - you had swapped -c (the path to the config.ini file) with -s (the path to the config.toml file) - I'll admit that could be distinguished a little more clearly!

The second command you have run -

~/Icarust/target/release/icarust -c ~/Icarust/config.ini -s ~/Icarust/Profile_tomls/config.toml -v

Is correct. The problem looks like the file that contains the model values for generating R10 data is not where the code expects it to be. I'm actually just updating this right now, so a much better version will be here in a couple of hours! However, if you look at line 706 in src/impl_services/data.rs, the file the code is looking for is given there as static/9mer_levels_v1.txt, which is found in the root directory of Icarust. I think the problem you are having is that the code is looking for the file in \/static/9mer_levels_v1.txt, which I'm guessing does not exist?

If you try running from the directory containing the Icarust code (the one with /static in it), does it work?
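A quick way to test that theory is to check whether the relative path resolves from the directory you launch Icarust in. The snippet below is a rough sketch, not Icarust code: it assumes the path static/9mer_levels_v1.txt is resolved against the current working directory, and the helper name is made up for illustration.

```python
from pathlib import Path

# Relative path quoted from src/impl_services/data.rs in the comment above.
KMER_MODEL = Path("static/9mer_levels_v1.txt")


def kmer_model_visible_from(directory) -> bool:
    """Return True if the k-mer model file exists relative to `directory`."""
    return (Path(directory) / KMER_MODEL).is_file()


if __name__ == "__main__":
    cwd = Path.cwd()
    if kmer_model_visible_from(cwd):
        print(f"Found {KMER_MODEL} under {cwd}")
    else:
        print(f"{KMER_MODEL} not found under {cwd}; try running from the Icarust root")
```

Run it from the same directory you start Icarust in; if it reports the file as missing, changing into the Icarust root first is the thing to try.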

Nirmal2310 commented 1 year ago

Dear @Adoni5, this is the directory structure from which I am running Icarust. As you can see, the static/ directory does exist, with 9mer_levels_v1.txt in it.

image

This is the config.toml file I am using for running Icarust. Do you think there is some issue with this, perhaps with the flowcell name?

image

And thank you in advance for fixing the issue.

Adoni5 commented 1 year ago

Hey @Nirmal2310 - This is my fault. If you add pore_type = "R9" on the line beneath working_pore_percent, that should fix it. Hopefully this will all be cleared up in the update!

Nirmal2310 commented 1 year ago

@Adoni5 I tried running Icarust with the suggested change but I am still getting the following error:

image

The config file that I used is:

output_path = "/tmp/"
target_yield = 100000000000
working_pore_percent = 85
pore_type = "R9"

[parameters]
sample_name = "test"
experiment_name = "test_2_bacteria"
flowcell_name = "FA07093"
experiment_duration_set = 4800
device_id = "Bantersaurus"
position = "FenceSitter"

[[sample]]
name = "Bacteria 1"
input_genome = "/home/user/softwares/Icarust/squiggle_arrs/NC_002516.2.squiggle.npy"
mean_read_length = 20000
weight = 1

[[sample]]
name = "Bacteria 2"
input_genome = "/home/user/softwares/Icarust/squiggle_arrs/NC_003997.3.squiggle.npy"
mean_read_length = 15000
weight = 2

Do you think I have to change some other parameters as well? If so, please let me know.

Adoni5 commented 1 year ago

Hi @Nirmal2310 - I've just pushed a fairly chunky update which I hope should clarify and fix up a few of the issues you've been facing!

Can you git pull and try this again? Thanks for your patience 👍🏼

Oh - I've overwritten the git history because I'm a savage like that apparently - so this may be of some help https://stackoverflow.com/questions/1125968/how-do-i-force-git-pull-to-overwrite-local-files

Nirmal2310 commented 1 year ago

Dear @Adoni5, Icarust is running perfectly after the update. Thank you so much for the help. I am closing this issue for now and will reach out to you if any problems occur.

Nirmal2310 commented 1 year ago

Dear @Adoni5, sorry to bother you again. I was testing Icarust with human data. Icarust itself is working perfectly, but when I tried readfish with it, readfish threw an error. As mentioned in the documentation, I passed the position value from the simulation TOML file to readfish as --device, but it didn't work. I tried the value of device_id as well, but it gave the same error. Error:

Traceback (most recent call last):
  File "/home/user/Downloads/miniconda/envs/readfish/bin/readfish", line 8, in <module>
    sys.exit(main())
  File "/home/user/Downloads/miniconda/envs/readfish/lib/python3.8/site-packages/ru/cli.py", line 43, in main
    args.func(parser, args)
  File "/home/user/Downloads/miniconda/envs/readfish/lib/python3.8/site-packages/ru/ru_gen.py", line 464, in run
    position = get_device(args.device, host=args.host, port=args.port)
  File "/home/user/Downloads/miniconda/envs/readfish/lib/python3.8/site-packages/ru/utils.py", line 927, in get_device
    raise ValueError("Could not find device {!r}".format(device))
ValueError: Could not find device 'FenceSitter'

This is the simulation profile I am using:

output_path = "/DATA2/Human_Icarust"
random_seed = 10
target_yield = 100000000000
working_pore_percent = 85
pore_type = "R9"

[parameters]
sample_name = "Human_STR"
experiment_name = "Hs_Normal"
flowcell_name = "FAQ1234"
experiment_duration_set = 7200
device_id = "Bantersaurus"
position = "FenceSitter"
break_read_ms = 400

[[sample]]
name = "Homo sapiens"
input_genome = "/DATA2/Human_data/"
mean_read_length = 15000
weight = 1

This is the command I am using for running readfish:

readfish targets --device FenceSitter --experiment-name "Human Enrichment" --toml human_enrichment.toml --workers 8 --log-file human_icarust_enrichment.log --paf-log human_icarust_enrichment_paf.log --chunk-log human_icarust_enrichment_chunk.log --channels 1 2048 --run-time 7200

This is the config.ini file I am using for Icarust:

[TLS]
cert-dir = /opt/ont/minknow/conf/rpc-certs/

[PORTS]
manager = 10000
position = 10001

[SEQUENCER]
channels = 2048

Could you please help me with this? Thank you in advance.

Adoni5 commented 1 year ago

Hi @Nirmal2310 - can you try adding --port 10000 to the readfish command?
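For context, 10000 is the manager value under [PORTS] in the config.ini you posted. If it helps with debugging, here is a rough sketch that reads that value with Python's configparser and checks whether anything is accepting TCP connections on it. It is not part of Icarust or readfish; the helper names are made up, and it assumes Icarust is running on the same machine (127.0.0.1).

```python
import configparser
import socket

# Copy of the config.ini posted above, used here as example input.
EXAMPLE_INI = """
[TLS]
cert-dir = /opt/ont/minknow/conf/rpc-certs/

[PORTS]
manager = 10000
position = 10001

[SEQUENCER]
channels = 2048
"""


def manager_port(ini_text: str) -> int:
    """Read the manager port from the [PORTS] section of an Icarust-style config.ini."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint("PORTS", "manager")


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    port = manager_port(EXAMPLE_INI)
    print(f"manager port: {port}, listening: {is_listening(port)}")
```

If the port is not listening while Icarust is running, the mismatch is on the Icarust side rather than in the readfish arguments.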

Nirmal2310 commented 1 year ago

Hi @Adoni5, thank you for your quick response. I tried doing that but it gives the same error. For your reference, I am also adding the readfish TOML file:

[caller_settings]
config_name = "dna_r9.4.1_450bps_fast"
host = "127.0.0.1"
port = 5555

[conditions]
reference = "/DATA2/reference.mmi"

[conditions.0]
name = "Human Enrichment"
control = false
min_chunks = 1
max_chunks = 6
targets = ["chr21", "chr22"]
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"

Let me know if any changes are required in the TOML file as well.

Adoni5 commented 1 year ago

Hi @Nirmal2310 -

Which version of readfish are you using? Is it from the dev_staging branch? I would have thought that setting --port 10000 and --device Bantersaurus should have worked for you here, as nothing obviously wrong leaps out at me.

I'm assuming this is all happening on the same PC?

Thanks Man, Rory

Nirmal2310 commented 1 year ago

I am using readfish 0.0.11a4, installed from the dev_staging branch as mentioned in the repository. While installing, I also changed ont-pyguppy-client-lib==6.4.2 to ont-pyguppy-client-lib==6.5.7 to match the Guppy version installed on the system. Yes, all of it is running on the same PC.

Adoni5 commented 1 year ago

Hi @Nirmal2310 - Can you try the following Icarust command, readfish command and readfish TOML?

Icarust Command

cargo run -r -- -s Profile_tomls/config.toml -v 

Readfish Command

readfish targets --device Bantersaurus --experiment-name test_connect --log-level info --toml /data/projects/rory_says_hi/icarust_paper_data/shared_files/human_chr_selection.toml --port 10000 --chunk-log control_icarust_R10_chunks.tsv

Readfish TOML (the reference path needs updating to point at the Icarust/python/mixed_ref.fa file on your system)

[caller_settings]
config_name = "dna_r9.4.1_450bps_fast"
host = "ipc:///tmp/.guppy"
port = 5555

[conditions]
reference = "/path/to/reference"

[conditions.0]
name = "select_one_bac"
control = false
min_chunks = 0
max_chunks = inf
targets = ["NC_003997.3",]
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"

Just tested this on my system.

I have one more question - do you know if your Guppy instance is listening on IPC or TCP? If you are running default Guppy GPU, you will be listening on IPC - there should be a file at /tmp/.guppy/5555. This might be worth checking when you try to run readfish!
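If you want to script that check, something like the sketch below would do. It is only an illustration: the function name is made up, it assumes the IPC file lives at /tmp/.guppy/<port> as described above, and it falls back to 127.0.0.1 as a guess when no such file exists (that fallback is an assumption, not a guarantee that Guppy is listening on TCP).

```python
import os


def guess_guppy_address(port: int = 5555, ipc_dir: str = "/tmp/.guppy"):
    """Suggest (host, port) values for [caller_settings] based on what exists on disk."""
    ipc_file = os.path.join(ipc_dir, str(port))
    if os.path.exists(ipc_file):
        # IPC file present: use the ipc:// form shown in the TOML above.
        return f"ipc://{ipc_dir}", port
    # No IPC file found: assume a TCP server on localhost (an assumption to verify).
    return "127.0.0.1", port


if __name__ == "__main__":
    host, port = guess_guppy_address()
    print(f'host = "{host}"')
    print(f"port = {port}")
```

The two printed lines can be compared against the host and port entries in the [caller_settings] section of your readfish TOML.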

Nirmal2310 commented 1 year ago

Dear @Adoni5, there is no file called mixed_ref.fa inside the python directory, so I downloaded the FASTA files from NCBI and concatenated them. It is no longer throwing any error, but it is also not proceeding. It has been stuck in this state:

readfish targets --device Bantersaurus --experiment-name test_connect --log-level info --toml test.toml --port 10000 --chunk-log control_icarust_R10_chunks.tsv
2023-06-13 17:58:33,951 ru.ru_gen /home/user/Downloads/miniconda/envs/readfish/bin/readfish targets --device Bantersaurus --experiment-name test_connect --log-level info --toml test.toml --port 10000 --chunk-log control_icarust_R10_chunks.tsv
2023-06-13 17:58:33,951 ru.ru_gen batch_size=512
2023-06-13 17:58:33,951 ru.ru_gen cache_size=512
2023-06-13 17:58:33,951 ru.ru_gen channels=[1, 512]
2023-06-13 17:58:33,951 ru.ru_gen chunk_log=control_icarust_R10_chunks.tsv
2023-06-13 17:58:33,951 ru.ru_gen command=targets
2023-06-13 17:58:33,951 ru.ru_gen device=Bantersaurus
2023-06-13 17:58:33,951 ru.ru_gen dry_run=False
2023-06-13 17:58:33,951 ru.ru_gen experiment_name=test_connect
2023-06-13 17:58:33,951 ru.ru_gen func=<function run at 0x7f4b505ac670>
2023-06-13 17:58:33,951 ru.ru_gen host=127.0.0.1
2023-06-13 17:58:33,951 ru.ru_gen log_file=None
2023-06-13 17:58:33,951 ru.ru_gen log_format=%(asctime)s %(name)s %(message)s
2023-06-13 17:58:33,951 ru.ru_gen log_level=info
2023-06-13 17:58:33,951 ru.ru_gen max_unblock_read_length_seconds=5
2023-06-13 17:58:33,951 ru.ru_gen paf_log=None
2023-06-13 17:58:33,951 ru.ru_gen port=10000
2023-06-13 17:58:33,951 ru.ru_gen run_time=172800
2023-06-13 17:58:33,951 ru.ru_gen throttle=0.4
2023-06-13 17:58:33,951 ru.ru_gen toml=test.toml
2023-06-13 17:58:33,951 ru.ru_gen unblock_duration=0.1
2023-06-13 17:58:33,951 ru.ru_gen workers=1
2023-06-13 17:58:33,954 ru.ru_gen Initialising minimap2 mapper
2023-06-13 17:58:34,036 ru.ru_gen Mapper initialised
2023-06-13 17:58:34,093 ru.ru_gen This experiment has 1 region on the flowcell
2023-06-13 17:58:34,094 ru.ru_gen Using reference: /home/user/softwares/Icarust/mixed_ref.mmi
2023-06-13 17:58:34,121 ru.ru_gen Region 'select_one_bac' (control=False) has 1 contig of which 1 are in the reference. There are 2 targets (including +/- strand) representing 45.49% of the reference. Reads will be unblocked when classed as single_off or multi_off; sequenced when classed as single_on or multi_on; and polled for more data when classed as no_map or no_seq.

Adoni5 commented 1 year ago

Most likely this is an issue with readfish connecting to Guppy. The connection has a very long timeout. How are you running Guppy on your computer? Can you take a look at https://github.com/LooseLab/readfish/issues/221#issuecomment-1361661972 and https://github.com/LooseLab/readfish/issues/221#issuecomment-1375673490?

Nirmal2310 commented 1 year ago

Okay, I will try the suggested solution and let you know whether it works.

Nirmal2310 commented 1 year ago

Hi @Adoni5, ReadFish is working perfectly with Icarust. Once again thank you so much for all the help.