I am finding that maligner returns no alignments when the maps are longer than about 500 kb.
Here is an example using your provided example as a starting point.
I get all alignments back when the LENGTH parameter for the fake map data is kept at or below 450 kb.
At ~500 kb, only some of the fake maps align.
By about 600 kb, none of the fake maps align.
(Note that the fake maps are just generated from random intervals on the reference.)
Ultimately, I'd like to compare multi-Mb contigs to multi-Mb maps, for example.
###################################################################
# DEFINE VARIABLES CONTROLLING INPUT FILES, OUTPUT FILES,
# AND SETTINGS
# inputs
RMAPS_FASTA=rmaps.fasta
REF_FASTA=ecoli_k12_ref.fasta
# genome file for BEDtools (faSize from Kent Tools)
G=k12_ref_ecoli.genome
faSize -detailed $REF_FASTA > $G
## obtain fake maps
N=10
LENGTH=500000 ## or 100-4000 kb
randomBed -g $G -l $LENGTH -n $N | slopBed -g $G -b 0 | fastaFromBed -fi $REF_FASTA -bed - | fasta_name_changer.py -f - --replace rmap -n > $RMAPS_FASTA
# outputs
RMAP_OUT_PFX=rmap.BamHI
REF_OUT_PFX=ref.BamHI
OUT_PFX=rmaps_to_asm
# settings
REC_SEQ=GGATCC # recognition sequence for BamHI
MIN_FRAG_SIZE=1000 # bp units
QUERY_MISS_PENALTY=3.0
REF_MISS_PENALTY=3.0
QUERY_MAX_MISSES=5
REF_MAX_MISSES=5
SD_RATE=0.05
MIN_SD=750
MAX_SCORE_PER_INNER_CHUNK=1.0
MAX_ALIGNMENTS_PER_QUERY=5
###################################################################
# PREPARE MALIGNER_DP INPUTS
# convert the rmaps fasta file to the Maligner maps format and smooth the maps file by merging consecutive fragments that are less than 1kb
make_insilico_map -o $RMAP_OUT_PFX $RMAPS_FASTA $REC_SEQ
smooth_maps_file -m $MIN_FRAG_SIZE ${RMAP_OUT_PFX}.maps > ${RMAP_OUT_PFX}.smoothed.maps
# convert the reference fasta file to the Maligner maps format and smooth the maps file by merging consecutive fragments that are less than 1kb
make_insilico_map -o $REF_OUT_PFX $REF_FASTA $REC_SEQ
smooth_maps_file -m $MIN_FRAG_SIZE ${REF_OUT_PFX}.maps > ${REF_OUT_PFX}.smoothed.maps
###################################################################
# RUN MALIGNER_DP
# Align the smoothed query rmaps file to the smoothed reference maps file with maligner_dp.
maligner_dp \
-q $QUERY_MISS_PENALTY \
-r $REF_MISS_PENALTY \
--query-max-misses $QUERY_MAX_MISSES \
--ref-max-misses $REF_MAX_MISSES \
--max-score-per-inner-chunk $MAX_SCORE_PER_INNER_CHUNK \
--sd-rate $SD_RATE \
--min-sd $MIN_SD \
--reference-is-circular \
--no-query-rescaling \
--max-alignments $MAX_ALIGNMENTS_PER_QUERY \
${RMAP_OUT_PFX}.smoothed.maps \
${REF_OUT_PFX}.smoothed.maps \
2>&1 1> ${OUT_PFX}.aln | tee ${OUT_PFX}.log
Is there a hard-coded size limit for maps?
Or is this an effect of the max-misses settings or something similar? I tried setting them higher, to no avail.
Nonetheless, it would be good to be able to set max misses as a rate rather than a fixed limit, so that the allowed misses are proportional to the query and reference lengths.
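To illustrate what I mean by a rate, here is a minimal sketch of a workaround that derives an integer limit from a rate before calling maligner_dp. Both helper functions are hypothetical and not part of maligner. The second one assumes the maps file is tab-delimited with the fragment count in the third column (name, length in bp, number of fragments, fragment sizes), which I have not confirmed. I also don't know whether a larger limit would actually fix the missing alignments; this only shows the arithmetic.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of maligner): convert a miss *rate* into an
# integer miss *limit* for a map with NUM_SITES sites.
# Usage: misses_from_rate RATE NUM_SITES [MIN_LIMIT]
misses_from_rate() {
    awk -v rate="$1" -v n="$2" -v lo="${3:-1}" 'BEGIN {
        m = int(rate * n + 0.5)   # round to the nearest integer
        if (m < lo) m = lo        # never drop below the minimum limit
        print m
    }'
}

# Hypothetical helper: largest fragment count among the maps in a file.
# ASSUMPTION: tab-delimited maps file with the fragment count in column 3.
max_frags_in_maps() {
    awk -F'\t' '$3 + 0 > max { max = $3 + 0 } END { print max + 0 }' "$1"
}

# Example use with the variables from the script above (commented out because
# it needs the smoothed maps files to exist):
# MISS_RATE=0.05   # hypothetical setting: allowed misses per site
# QUERY_MAX_MISSES=$(misses_from_rate $MISS_RATE "$(max_frags_in_maps ${RMAP_OUT_PFX}.smoothed.maps)")
# REF_MAX_MISSES=$(misses_from_rate $MISS_RATE "$(max_frags_in_maps ${REF_OUT_PFX}.smoothed.maps)")
```

With a rate of 0.05, a map with 100 fragments would get a limit of 5, and the minimum limit keeps very short maps from ending up with a limit of 0.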
I hope you can offer any advice on how to overcome this if you have the time.
Thanks in advance,
John Urban