Open darrenabbey opened 7 years ago
Commenting out or removing lines [19-20; 114-127; 210-225; 247-261; 285-300] from script "Ymap_root/scripts_genomes/genome.install_6.sh" should effect this change.
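For anyone who wants to try this without editing by hand, here is a rough sketch of commenting out those ranges. The helper name `comment_out_ranges` is made up for illustration, and the line numbers are only the ones quoted above, so they will be wrong for any other revision of the script. Check the result against a backup before using it.

```python
# Hypothetical helper, not part of Ymap: comment out 1-indexed, inclusive
# line ranges in a shell script held as a list of lines.
def comment_out_ranges(lines, ranges, prefix="# "):
    out = list(lines)  # work on a copy, leave the input untouched
    for start, end in ranges:
        # Clamp to the file length in case the script is shorter than expected.
        for i in range(start - 1, min(end, len(out))):
            # Skip lines that are already comments.
            if not out[i].lstrip().startswith("#"):
                out[i] = prefix + out[i]
    return out

# Ranges quoted in this thread; valid only for the script revision discussed here.
RANGES = [(19, 20), (114, 127), (210, 225), (247, 261), (285, 300)]
```

To apply it, read the script with `readlines()`, pass the list through `comment_out_ranges(lines, RANGES)`, and write the result to a new file so the original stays intact.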
@darrenabbey , do you think it can be useful? For example, Phytophthora infestans is quite repetitive, and I wonder if we can do something with those regions (maybe ignore them in analysis?).
There is the potential for it to be useful. In some genomes repetitiveness analysis can help reveal where centromeres are, for example.
In the C. albicans genome, repetitiveness correlates strongly with GC bias. Not an exact correlation, but pretty strong.
That said, the analysis isn't used in the construction of any figure types at this time. It might be worthwhile to discuss adding such a figure type.
The reason I included repetitiveness analysis was that it appeared to correspond to a prominent CNV noise signal in a lot of C. albicans datasets. That noise was better matched by GC bias, which has a better rationale behind it as well. Thus, the option of GC-correction was set up, but not repetitiveness-correction.
The whole-chromosome repetitiveness plots I found most useful included a simple trace of the analysis, smoothed to limit the visual noise. When viewing much smaller regions, less smoothing was needed. For consistency with everything else, chromosome cartoon outlines would be used.
I remember plotting the smoothed trace such that the median height value was placed near the bottom quarter of the figure, with the y-range being sufficient to capture the max heights in the trace.
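A minimal sketch of that layout, under some assumptions: the smoothing method isn't stated here, so a centered moving average stands in for it, and "near the bottom quarter" is read as the median sitting at 25% of the axis height. The function names `smooth` and `y_limits` are hypothetical, not from Ymap.

```python
from statistics import median

def smooth(values, window):
    """Centered moving average (an assumed smoothing method).
    The window shrinks at the ends of the trace."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def y_limits(trace, median_frac=0.25):
    """Return (y_min, y_max) so the trace maximum is at the top of the axis
    and the median sits at median_frac of the axis height."""
    y_max = max(trace)
    med = median(trace)
    if y_max == med:
        # Flat trace: pick an arbitrary span of 4 that keeps the median
        # a quarter of the way up.
        return med - 1.0, med + 3.0
    # Solve (med - y_min) / (y_max - y_min) = median_frac for y_min.
    y_min = (med - median_frac * y_max) / (1.0 - median_frac)
    return y_min, y_max
```

A larger `window` would suit whole-chromosome views and a smaller one zoomed-in regions. The limits returned by `y_limits` can then be passed to whatever plotting code draws the trace.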
Providing units to the y-axis would be problematic. Perhaps it could be described as a "repetitiveness index" to avoid needing a specific unit.
I think it was in Cryptococcus where I localized the centromeres by examining repetitiveness traces because I was having a hard time finding coordinates for them in the databases.
I do have a much faster version of code for doing the repetitiveness processing, using a bit of a different algorithmic approach. I've been working on this offline with respect to Ymap. The final output files are bit-identical to what is in Ymap right now, so it shouldn't take too long to integrate it.
Genome repetitiveness calculation during genome install is unnecessary. The calculation was for investigational purposes early on in the process of writing YMAP. It was subsequently found to not be a useful metric for analysis.
Removing this calculation will save significant time when installing a new genome.