LooseLab / Icarust

A fully featured MinKNOW simulator for testing read until experiments.
Mozilla Public License 2.0

Question regarding weight distribution #27

Closed Nirmal2310 closed 1 month ago

Nirmal2310 commented 1 month ago

Hey @Adoni5, I hope you are doing well. I have a basic question regarding weight distribution. Please forgive me if I understand it incorrectly.

As you mentioned in the README, the weight basically gives the likelihood of taking a read from the given target genome. The distribution.json file that you added to the repo looks like this: {"weights": [6264404, 5227293], "names": ["NC_002516.2", "NC_003997.3"]}

I am assuming that since the ratio is ~1.2, if I generate 1,000,000 bp, about 454,546 bp will be from NC_003997.3 and 545,454 bp from NC_002516.2. Please let me know if I have understood this correctly.

Secondly, suppose I want to create a mock community like the ZymoBIOMICS gut community, for which I know the concentration distribution across multiple species. How should I go about simulating this community with Icarust? One idea I have is to create a distribution.json file and set the weights to the ratios of the different genomes.

Can you tell me whether this is the right approach? If not, please help me work out how to go about it.

Adoni5 commented 1 month ago

Hi @Nirmal2310 - You have understood correctly! The ratio in this case is between species, so 6264404 / (6264404 + 5227293) ≈ 0.545 for NC_002516.2 and 5227293 / (6264404 + 5227293) ≈ 0.455 for NC_003997.3.
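The normalisation described above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `weight_fractions` is made up for illustration and is not part of Icarust, and the names and weights are the ones quoted from the JSON in this thread.

```python
# Weights and names as quoted from the JSON earlier in this thread.
names = ["NC_002516.2", "NC_003997.3"]
weights = [6264404, 5227293]


def weight_fractions(names, weights):
    """Divide each weight by the sum of all weights (hypothetical helper)."""
    total = sum(weights)
    return {name: weight / total for name, weight in zip(names, weights)}


fractions = weight_fractions(names, weights)
for name, fraction in fractions.items():
    # Prints roughly 0.5451 for NC_002516.2 and 0.4549 for NC_003997.3.
    print(f"{name}: {fraction:.4f}")
```

Because only the ratio between the weights matters, multiplying every weight by the same constant leaves these fractions unchanged.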

If you are producing R9 data, it would absolutely work to just alter the distributions.json weights; you could even just use 1, 2, 3, 4, 5, etc.
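For a mock community, the weights could be generated from known relative abundances. The sketch below assumes the file uses only the two keys shown in this thread ("weights" and "names"). The contig names and percentages are placeholders, not real ZymoBIOMICS values, and `build_distribution` is a hypothetical helper. Abundances are scaled to integers because the example file uses integer weights; whether Icarust accepts non-integer weights is not confirmed here.

```python
import json

# Placeholder relative abundances in percent; replace these with the
# reference names in your squiggle data and the real community composition.
abundances = {"contig_A": 14.0, "contig_B": 6.0, "contig_C": 80.0}


def build_distribution(abundances, scale=100):
    """Build a {"weights": [...], "names": [...]} dict (hypothetical helper).

    Each abundance is multiplied by `scale` and rounded to an integer,
    mirroring the integer weights in the example file.
    """
    names = list(abundances)
    weights = [round(abundances[name] * scale) for name in names]
    return {"weights": weights, "names": names}


distribution = build_distribution(abundances)
distribution_json = json.dumps(distribution)
print(distribution_json)
```

The resulting string can then be saved in place of the existing weights file for the R9 data, keeping the order of "names" and "weights" aligned.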

If you are producing R10 data, you could instead list this in the simulation profile TOML, where each bacterium is a sample and the weight is given underneath each sample table.

Nirmal2310 commented 1 month ago

Hey @Adoni5, thank you so much for the reply. I will try this approach and get back to you if any problem occurs.