bcgsc / NanoSim

Nanopore sequence read simulator

Infinite loop in function extract_reads in metagenome mode when length equals max length #185

Open danpal96 opened 1 year ago

danpal96 commented 1 year ago

In the function extract_reads, when dna_type == "metagenome", the while loop checks that:

length > max(seq_len[s].values())

and:

length < seq_len[s][key]

but there is no case for:

length == max(seq_len[s].values())

so when the read length equals the longest sequence length, neither condition is ever met and the program gets stuck in an infinite loop.
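A minimal sketch of the loop shape described above, for illustration only. The function name `pick_reference`, the `lengths` dict (standing in for `seq_len[s]`, assumed to map sequence names to lengths for one species), and the `max_attempts` cap are all hypothetical; the cap exists only so the demonstration terminates, since the original loop is effectively `while True`. The real NanoSim code differs in detail, and what it does for reads longer than every sequence is not shown in this thread, so that branch is a placeholder.

```python
import random
from typing import Dict, Optional


def pick_reference(length: int, lengths: Dict[str, int],
                   rng: random.Random, max_attempts: int = 1000) -> Optional[str]:
    """Mimic the rejection loop from the issue, but give up after max_attempts.

    Returns a sequence name whose length is strictly greater than `length`,
    or None if the attempts run out.
    """
    attempts = 0
    while attempts < max_attempts:  # hypothetical cap; the original has none
        if length > max(lengths.values()):
            # Hypothetical placeholder for the "read longer than every
            # sequence" branch; the thread does not show what it does.
            raise ValueError("read longer than every reference sequence")
        key = rng.choice(list(lengths))
        if length < lengths[key]:  # the strict check quoted in the issue
            return key
        attempts += 1
    # When length == max(lengths.values()), no key satisfies the strict `<`
    # and the `>` branch never fires, so this is the only possible outcome.
    return None
```

With `lengths = {"chr1": 5000, "chr2": 3000}`, `pick_reference(4000, lengths, random.Random(0))` returns `"chr1"`, while `pick_reference(5000, lengths, random.Random(0))` returns `None`: that is the case where the uncapped loop would spin forever.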

SaberHQ commented 1 year ago

Thanks for reporting this, @danpal96.

In the extract_reads function of the simulator.py script, the if clauses are inside a while True loop. If the condition length < seq_len[s][key] is met, the code proceeds; otherwise, if the length is greater than or equal to seq_len[s][key], it draws another key and checks the if clause again until it finds a suitable key.

I guess you are correct: when the length is exactly equal to the max length, no key can satisfy the strict comparison, so the code keeps generating random keys without ever finding a suitable one, which causes an infinite loop.

I am labelling this as a bug for now and my colleagues and I will take a look at it. @cheny19 @kmnip do you have any thoughts on this?

kmnip commented 1 year ago

Personally, I am not fond of the strategy of selecting random values within a while-true loop, as it is a potential cause of infinite loops.

For this part of the code, I think a better alternative is to extract the list of chromosomes that are longer than the read length and select a random chromosome from this list.
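A minimal sketch of that alternative, under stated assumptions: `pick_reference_filtered` and the `lengths` dict (standing in for `seq_len[s]`) are hypothetical names, and the original is assumed to draw keys uniformly. If it actually weights the draw (for example by sequence length), `rng.choices(candidates, weights=..., k=1)[0]` would replace `rng.choice`.

```python
import random
from typing import Dict, Optional


def pick_reference_filtered(length: int, lengths: Dict[str, int],
                            rng: random.Random) -> Optional[str]:
    """Choose uniformly among sequences strictly longer than `length`.

    Returns None when no sequence qualifies (including length == max),
    so the caller must handle that case instead of looping forever.
    """
    # Build the candidate list first, keeping the strict comparison from the
    # original check (length < seq_len[s][key]).
    candidates = [name for name, seq_length in lengths.items()
                  if seq_length > length]
    if not candidates:
        return None
    return rng.choice(candidates)
```

Because the candidates are filtered before sampling, the function always terminates, and the equality case becomes an explicit `None` that the caller can handle (for example by drawing a new read length) rather than a silent hang. Whether `>=` would be more appropriate than `>` depends on how the original code picks the read's start position, which the thread does not show.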