Closed mptrsen closed 12 months ago
Hi Malte,
I have similar issues with JupiterPlot, yet on a slightly different scale. I want to compare two highly conserved, closely related species, and 18 chromosomes match perfectly just the smallest one shows no links at all. I also tried mummer after seeing your plot, and it fits perfectly. I run JupiterPlot again, just comparing the one chromosome of the two species, and it also shows perfect synteny. So it would be really nice to know why JupiterPlot sometimes does not link clearly homologous regions.
I think the key would be to look at the minimap2 alignments. To fix this issue I think I would need to see the resulting bam file created in the pipeline. There could also be an element of filtering involved: https://github.com/JustinChu/JupiterPlot/blob/45676f3ee3b2f2134941cf6276a51e3bdaa07e0b/jupiter#L80 If the genome is highly repetitive it is possible that I have been filtering out links. If it is filtering related it seems I should probably include an option to allow repetitive links or change the default behaviour to not filter on mapping quality at all. I don't recall if there was a clear reason why I chose to filter on mapping quality in retrospect.
@mptrsen Can you try the latest commit with the MAPQ parameter set to 0?
Hi @JustinChu, for my dataset I have checked the bed file and all the alignments I am missing in the plot are there but are missing in the links, karyotype, and SeqOrder files. Filtering by mapping Q seems not to be the issue. I could also send you my two assemblies if that helps. The genomes are standard mammal genomes and very conserved between the two species so I would assume this is not an issue of repeats.
@mptrsen Can you try the latest commit with the MAPQ parameter set to 0?
Same result, sorry. I'll upload the alignment SAM next week.
@sven1990 Sending me the assemblies would be very helpful. Also, let me know what syntenic region you expect to see on the plot that is only visible when you subset the chromosome.
Hi @JustinChu,
thanks for looking into it.
You can download the assemblies from my dropbox [(https://www.dropbox.com/sh/rvp92mdi9pcsw4h/AAAi3HpFwvykg7HwwUp4pP2da?dl=0)]
I expect to see syntenic regions between the largest 19 scaffolds (chromosomes) of both assemblies. However, the plot so far does not show synteny between scaffold_19 of VMU_Ajub_asm_1.0 and scaffold NC_057357.1 from GCF_016509475.1.
As I mentioned before, there are alignments in the bed file, but in the end, they are missing in the plot.
Thanks again!
Sven
I have updated the uploaded data with the recent SAM. Hope that helps.
(samtools flagstats reports a 100% mapping rate)
I ran Jupiter with:
jupiter name=Micpus ref=reference.fa fa=scaffolds.fa MAPQ=0
Hi @JustinChu,
I know you are likely busy with other tasks but did you maybe have the chance to look into my two assemblies?
I very much appreciate your help.
@sven1990 I'll try to have a fix or explanation by Friday.
@mptrsen I think there might be a conflict of temporary files causing issues in your output. I get different results when I run the data with the same parameters minBundleSize=1
and maxGap=9999999999
:
Having such a high max gap probably isn't very accurate to true synteny interpretations so here are results with minBundleSize=1
and maxGap=100000
(default):
I also ran it off your sam file and got the same figure so I suspect it is an issue with the *links*
files.
I'm not sure if this completely lines up with your MUMMER results so let me know if you see something strange.
The file conflict is something I think I can resolve the issue by making relevant parameters part of the temporary filenames so this happens less, though at the moment make sure you are removing and *links*
files after changing parameters relevant to links.
@sven1990 The issue in your case was the ng
parameter was filtering the scaffolds that you were interested in. I honestly should have recognized it sooner. Sorry, I took so long. Here are the results with ng=120
:
Perhaps I should change the default as assemblies are becoming far more contiguous these days. Maybe a more useful filtering parameter rather than ng
would be a minimum size like the parameter I use for the reference genome.
@JustinChu thank you a lot, really appreciate your help. I did not know you could set ng >100.
There seems to be an incongruence between JupiterPlot and a "regular" dot plot from a MUMMER (nucmer) alignment of two genomes of a chlorophyte algae species (21 Mbp). JupiterPlot shows no connections, whereas the dot plot shows large regions of similarity pretty much throughout the entire genome.
Please find the genomes and results here: https://uni-bonn.sciebo.de/s/LrubQxY319Lf8nn
I ran both JupiterPlot and MUMMER with default parameters:
To visualise the MUMMER alignment, I used Dot (https://dot.sandbox.bio/). It requires input reformatted by the
DotPrep.py
script included in the mummer_dotplot directory. It was run as:This produces the files Micpus.coords and Micpus.coords.idx that are also in the uploaded results, and can be used in the Dot viewer to reproduce the dot plot above.
I tried setting
minBundleSize=1
andmaxGap=9999999999
with the effect that it renders three small connections:Do I need to set other parameters? Or how can I make JupiterPlot's output show the obvious similarity between the two genomes that is visible in the dot plot?