If simulated read is longer than selected chromosome, select from same species preferentially
Previous behaviour randomly selected a different sequence from the full pool of species
In the example of the Zymo mock model, this meant that S. aureus reads ended up under-represented, while Cryptococcus was over-represented
This was because the S. aureus reference contained 4 sequences - the main circular genome and 3 short plasmids
When a sequence from S. aureus was randomly selected, the plasmids were frequently chosen, and these were shorter than the requested read length
Because the Cryptococcus reference contains more sequences than any other reference, randomly choosing a replacement sequence from the entire pool of species/sequences meant that Cryptococcus was chosen more frequently
To retain the requested abundances as much as possible, preferentially choose the 'alternative' sequence from the same species
This is consistent with the version used in the meta-NanoSim paper
As a fall-back, if there are no appropriate sequences in the species' reference, choose another species
If this is required, a warning will be printed, advising the user to check the abundances after simulation finishes
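A minimal sketch of the selection order described above, for illustration only and not the actual NanoSim code
The function name pick_replacement, the references layout (species -> sequence name -> length) and the example lengths are all hypothetical

```python
import random
import warnings


def pick_replacement(species, read_length, references, rng=random):
    """Return (species, seq_name) for a sequence long enough to hold the read.

    references is an assumed layout: {species: {seq_name: length_in_bp}}.
    """
    # First choice: a sufficiently long sequence from the same species,
    # so the requested abundance of that species is preserved.
    same = [name for name, length in references[species].items()
            if length >= read_length]
    if same:
        return species, rng.choice(same)

    # Fall-back: any sufficiently long sequence from another species.
    others = [(sp, name)
              for sp, seqs in references.items() if sp != species
              for name, length in seqs.items() if length >= read_length]
    if not others:
        raise ValueError("no reference sequence is long enough for this read")
    warnings.warn(
        f"No sequence in {species} can hold a {read_length} bp read; "
        "choosing another species. Check abundances after simulation."
    )
    return rng.choice(others)


# Hypothetical lengths, chosen only to illustrate the two paths.
refs = {
    "S_aureus": {"chromosome": 2_000_000, "plasmid_1": 3_000},
    "Cryptococcus": {"chr_1": 2_500_000, "chr_2": 1_500_000},
}
# A 10 kb read does not fit the plasmid, so the chromosome is used.
print(pick_replacement("S_aureus", 10_000, refs))  # ('S_aureus', 'chromosome')
```

With these inputs, a read longer than every S_aureus sequence (for example 2_200_000 bp) would take the fall-back path and emit the warning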