Henry-Ding closed this issue 3 months ago.
We assumed that the centromere structure can differ even within one species, so CentroMiner processes each chromosome separately. If your species has a consistent centromere structure across all chromosomes, you may find that every candidate presents the same top monomer.
If you do want to collect only the most similar sequences as tandem repeats, you can join all chromosomes together and set an extremely high max gap parameter. This will report the whole genome as a single candidate and show the richest tandem repeats. You can then BLAST them against the genome, or use other tools such as stringdecomposer (https://github.com/ablab/stringdecomposer), to locate the centromere.
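A minimal Python sketch of the "join all chromosomes together" step, assuming that joining means concatenating every record into one FASTA record so that CentroMiner sees a single sequence. The function names and the optional `spacer` of N bases are hypothetical and not part of CentroMiner; the max gap value is still set in CentroMiner as usual.

```python
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def join_chromosomes(src: TextIO, dst: TextIO, name: str = "genome",
                     spacer: str = "", width: int = 60) -> int:
    """Write every record of src as ONE FASTA record named `name` to dst.

    `spacer` (hypothetical option, e.g. "N" * 100) is inserted between
    chromosomes. Returns the length of the joined sequence.
    """
    dst.write(f">{name}\n")
    carry = ""  # bases left over after the last full-width line
    total = 0
    for i, (_header, seq) in enumerate(read_fasta(src)):
        piece = carry + (spacer if i else "") + seq
        total += len(piece) - len(carry)
        full = len(piece) - len(piece) % width
        for start in range(0, full, width):
            dst.write(piece[start:start + width] + "\n")
        carry = piece[full:]
    if carry:
        dst.write(carry + "\n")
    return total
```

For example, `with open("genome.fa") as src, open("genome.joined.fa", "w") as dst: join_chromosomes(src, dst)` (file names hypothetical). Only one chromosome is held in memory at a time, and output lines are re-wrapped to a fixed width because some FASTA indexers reject records with uneven line lengths.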
Hi,
Thank you for your prompt reply.
The centromere structure of chromosomes within the same species should be similar. Did CentroMiner take this similarity into account when producing its results? When identifying centromeres, should we compare the monomer sequences found on different chromosomes to find the most similar ones?
I combined all my chromosomes but got the following error:
Is the input too big to produce results? The .fa file is larger than 2 GB.
Looking forward to your reply.
Best wishes,
Henry
In fact, the centromere structure can differ substantially within the same species, at least in kiwifruit, which we mainly study. CentroMiner doesn't take this similarity into consideration. We mainly judge it manually, because we haven't yet developed a satisfactory algorithm to do this automatically.
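CentroMiner itself does not compare monomers across chromosomes, and the maintainers judge this manually. As an aid to that manual comparison, here is a rough Python sketch that scores every pair of top monomers, assuming you have copied them from the CentroMiner results into a dict keyed by chromosome name. Because a tandem monomer has no fixed start position or strand, each pair is scored against every rotation and the reverse complement. The score is difflib's `SequenceMatcher` ratio, a quick proxy rather than alignment identity, so confirm promising pairs with a proper aligner.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def monomer_similarity(a: str, b: str) -> float:
    """Best SequenceMatcher ratio of `a` against every rotation of `b` and of
    its reverse complement, since a tandem monomer has no fixed start or strand."""
    a = a.upper()
    best = 0.0
    for cand in (b.upper(), revcomp(b.upper())):
        for shift in range(len(cand)):
            rotated = cand[shift:] + cand[:shift]
            # autojunk=False: the default heuristic would treat every base of a
            # >=200 bp sequence as junk, because each base is far above 1% of it.
            ratio = SequenceMatcher(None, a, rotated, autojunk=False).ratio()
            best = max(best, ratio)
    return best


def compare_monomers(monomers: Dict[str, str]) -> List[Tuple[str, str, float]]:
    """Score every pair of chromosomes' monomers, most similar pair first."""
    names = sorted(monomers)
    rows = [
        (x, y, monomer_similarity(monomers[x], monomers[y]))
        for i, x in enumerate(names)
        for y in names[i + 1:]
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Trying every rotation makes each pair cost roughly the monomer length times one `SequenceMatcher` call, which is fine for a handful of chromosomes but slow for dozens.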
As for this error, I'm not sure what happened. Have you checked whether the ./tmp/trfdat/ directory contains any files? I think the program should be looking for ./tmp/trfdat/{prefix}.{chr}.fasta............, but it seems the {prefix} is missing here. Alternatively, you can try running trf directly as trf {your fasta} 2 7 7 80 10 50 200 -d -h and see what the trf program reports. (If it succeeds, put the resulting file in that directory under that name and run again with the same parameters; CentroMiner will read the existing .dat file directly.)
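The troubleshooting steps above can be scripted roughly as follows. This is a sketch, not part of CentroMiner: the helper names are hypothetical, and it assumes `trf` is on your PATH and writes its .dat file into its working directory, named after the input file plus the seven numeric parameters (the sketch runs it from the FASTA's own directory). It deliberately stops at producing the .dat file, because the exact name CentroMiner expects inside ./tmp/trfdat/ is only partly quoted in the thread; rename and move the file as the maintainer describes.

```python
import os
import subprocess
from typing import List

# Parameters quoted in the thread, in TRF's positional order:
# match, mismatch, delta (indel), PM, PI, minscore, maxperiod.
TRF_PARAMS = ["2", "7", "7", "80", "10", "50", "200"]


def list_trfdat(directory: str = "./tmp/trfdat") -> List[str]:
    """Return the files already present in CentroMiner's TRF cache dir."""
    if not os.path.isdir(directory):
        return []
    return sorted(os.listdir(directory))


def trf_command(fasta: str) -> List[str]:
    """Build the trf call suggested in the thread (-d: write .dat, -h: no HTML)."""
    return ["trf", fasta, *TRF_PARAMS, "-d", "-h"]


def trf_dat_name(fasta: str) -> str:
    """Assumed name of the .dat file TRF writes for this input."""
    return os.path.basename(fasta) + "." + ".".join(TRF_PARAMS) + ".dat"


def run_trf(fasta: str) -> str:
    """Run trf next to the FASTA file and return the path of its .dat output."""
    workdir, name = os.path.split(os.path.abspath(fasta))
    # Some TRF releases may exit non-zero even on success, so look for the
    # output file instead of relying on check=True.
    subprocess.run(trf_command(name), cwd=workdir)
    dat = os.path.join(workdir, trf_dat_name(name))
    if not os.path.exists(dat):
        raise RuntimeError(f"trf did not produce {dat}; inspect its console output")
    return dat
```

For example, `print(list_trfdat())` shows whether the cache directory is empty, and `run_trf("genome.fa")` (file name hypothetical) returns the path of the new .dat file or raises if TRF produced none.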
Hi,
I found a problem while using CentroMiner. When I run this tool, the result with a single chromosome split out as input is the same as the result with the whole genome as input. Can I use the whole genome to find the most similar sequence or sequences as tandem repeats?
Looking forward to your reply.
Best wishes,
Henry