Open RagnarGrootKoerkamp opened 2 years ago
With the first case, obviously it is not logical to set min and max length equal to each other. With your second case scenario, I suspect that the reference genome you are using is smaller than the read lengths you specified. May I ask whether you are using the pre-trained models or if you trained your own model?
With the first case, obviously it is not logical to set min and max length equal to each other.
Hmm OK, that wasn't obvious to me. I would like to generate some reads to test a pairwise aligner I'm working on, and for benchmarking it is nice to have reads of a specific length. I changed it to some interval around that length and it works now. Anyway, displaying a warning instead of just crashing would be nice ;)
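For illustration, here is a minimal Python sketch of both ideas: deriving an interval around a target read length, and warning about suspicious settings before the simulator runs. The names `length_window` and `check_lengths`, the 10% default margin, and the rule that the median should lie between min and max are assumptions made for this example; none of this is NanoSim's actual validation logic.

```python
import warnings


def length_window(target, margin_pct=10):
    """Return (min_len, max_len) around `target`.

    Hypothetical helper: the 10% default margin is an arbitrary choice,
    not a value recommended by NanoSim.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if margin_pct <= 0:
        raise ValueError("margin_pct must be positive")
    # Integer arithmetic keeps the result exact; always widen by at least 1.
    delta = max(1, target * margin_pct // 100)
    return max(1, target - delta), target + delta


def check_lengths(min_len, median_len, max_len):
    """Warn about length settings that look problematic.

    Returns the list of warning messages (empty if nothing was found).
    """
    problems = []
    if min_len >= max_len:
        problems.append(
            f"min_len ({min_len}) should be smaller than max_len ({max_len})"
        )
    if not (min_len <= median_len <= max_len):
        problems.append(
            f"median_len ({median_len}) lies outside [{min_len}, {max_len}]"
        )
    for message in problems:
        warnings.warn(message)
    return problems
```

With these helpers, `length_window(10000)` gives a window that could be passed as `--min_len` and `--max_len`, while `check_lengths(10000, 10000, 10000)` flags the equal min and max from the first case.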
With your second case scenario, I suspect that the reference genome you are using is smaller than the read lengths you specified.
Oh right, that may well be the case. I am using some human genome reference but I noticed my fasta file also has some shorter sequences in addition to the long chromosomes. Again, a warning message would be nice.
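If the suspicion about short reference sequences is right, a quick check of the FASTA file before simulating would show which records are affected. Below is a minimal Python sketch using only the standard library; `fasta_lengths` and `short_records` are hypothetical helper names, and whether NanoSim actually fails on such records is an assumption here, not something confirmed in this thread.

```python
def fasta_lengths(path):
    """Yield (name, length) for every record in a FASTA file."""
    name, length = None, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    yield name, length
                # Keep only the identifier up to the first whitespace.
                parts = line[1:].split()
                name, length = (parts[0] if parts else ""), 0
            elif name is not None:
                length += len(line)
    # Emit the final record.
    if name is not None:
        yield name, length


def short_records(path, max_len):
    """Return (name, length) for records shorter than `max_len`."""
    return [(n, size) for n, size in fasta_lengths(path) if size < max_len]
```

For the second case, `short_records("input/reference/human.fa", 1100000)` would list the sequences shorter than the requested maximum read length, which could then be filtered out of the reference or reported as a warning.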
May I ask whether you are using the pre-trained models or if you trained your own model?
I'm using pre-trained models, since I don't have direct access to reads.
My full NanoSim invocation is this, where {..} will be substituted by snakemake:
simulator.py genome \
--ref_g input/reference/human.fa \
--output input/simulated/human-x{wildcards.x}-n{wildcards.n} \
-dna_type linear \
--model_prefix ../../nanosim/pre-trained_models/human_NA12878_DNA_FAB49712_guppy/training \
--min_len {params.min} \
--median_len {wildcards.n} \
--max_len {params.max} \
--sd_len 1.05 \
--number {params.generate_x} \
--strandness 1 \
--seed 314151 \
--num_threads 6
I'm getting some index out of range errors, possibly because of setting the same value (or too close?) for -min and -max:
-min 10000 -max 10000
and
-min 900000 -max 1100000