question about CentroMiner

aaranyue / quarTeT

A telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification


question about CentroMiner #7

Closed Henry-Ding closed 3 months ago

Henry-Ding commented 1 year ago

Hi, I ran into a problem while using CentroMiner. When I run the tool, the result for a single chromosome split out as input is the same as the result with the whole genome as input. Can I use the whole genome to find the most similar sequence or sequences as tandem repeats? Looking forward to your reply. Best wishes, Henry

Echoring commented 1 year ago

We assumed that centromere structure can differ even within one species, so CentroMiner processes each chromosome separately. If your species has a consistent centromere structure across all chromosomes, you may find that each candidate presents the same top monomer. If you really want to collect only the most similar sequences as tandem repeats, you can join all chromosomes together and set an extremely high max gap parameter; this will report the whole genome as a single candidate and show its most abundant tandem repeats. You can then BLAST them against the genome, or use other tools such as stringdecomposer (https://github.com/ablab/stringdecomposer), to locate the centromere.
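One reading of "join all chromosomes together" is to merge every FASTA record into a single sequence before running CentroMiner. The sketch below illustrates that reading in plain Python. The function names, the `joined` record name, and the optional run of `N` characters between chromosomes are assumptions made for illustration; none of them come from quarTeT.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def join_records(records: List[Tuple[str, str]], joined_name: str = "joined",
                 spacer_len: int = 0, width: int = 60) -> str:
    """Merge all records into one FASTA record.

    spacer_len is a hypothetical option: it inserts a run of N between
    chromosomes so the joins stay visible. Zero means plain concatenation.
    """
    sequence = ("N" * spacer_len).join(seq for _, seq in records)
    body = "\n".join(sequence[i:i + width] for i in range(0, len(sequence), width))
    return f">{joined_name}\n{body}\n"
```

A possible use is `join_records(list(read_fasta(open("genome.fa"))))`, written out to a new file. Note that this holds the whole genome in memory, which may matter for inputs of several gigabytes.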

Henry-Ding commented 1 year ago

Hi, thank you for your prompt reply. The centromere structure of chromosomes within the same species should be similar. Does CentroMiner take this similarity into account when producing its results? When identifying centromeres, should we compare the monomer sequences found on different chromosomes to find the most similar ones? I combined all my chromosomes but got an error, as follows. Is the input too big to get results? The fa file is larger than 2G. Looking forward to your reply. Best wishes, Henry

Echoring commented 1 year ago

In fact, centromere structure can differ greatly within the same species, at least in the kiwifruits we mainly study. CentroMiner does not take this similarity into consideration. We mainly judge it manually, because we have not yet developed a satisfactory algorithm to do this automatically.

As for the error, I'm not sure what happened. Have you checked whether the ./tmp/trfdat/ directory contains any files? I think the program should try to find ./tmp/trfdat/{prefix}.{chr}.fasta............, and it seems the {prefix} is missing here. Alternatively, you can try running trf directly as trf {your fasta} 2 7 7 80 10 50 200 -d -h and see what the trf program reports. (If it succeeds, put the resulting file in that directory under that name and try again with the same parameters; CentroMiner will read the existing dat file directly.)
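The two checks suggested above (look inside ./tmp/trfdat/, then run trf by hand) can be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, and the output file name is based on my understanding that TRF writes its data file into the current directory as the input file name followed by the seven parameters and `.dat`. The exact name CentroMiner expects in ./tmp/trfdat/ is not spelled out in this thread, so it should be taken from the error message rather than from this code.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Tuple

# Parameters quoted in the reply above: 2 7 7 80 10 50 200
TRF_PARAMS = ["2", "7", "7", "80", "10", "50", "200"]


def trf_command(fasta: str) -> List[str]:
    """Build the trf command line quoted in the thread."""
    return ["trf", fasta, *TRF_PARAMS, "-d", "-h"]


def expected_dat_name(fasta: str) -> str:
    """Assumed TRF output name: <input file name>.<params joined by dots>.dat"""
    return Path(fasta).name + "." + ".".join(TRF_PARAMS) + ".dat"


def list_trfdat(tmp_dir: str = "./tmp/trfdat") -> List[str]:
    """Return the file names in the trfdat directory, or [] if it is missing."""
    directory = Path(tmp_dir)
    return sorted(p.name for p in directory.iterdir()) if directory.is_dir() else []


def run_trf(fasta: str) -> Tuple[int, str]:
    """Run trf and return (exit code, combined stdout and stderr)."""
    if shutil.which("trf") is None:
        raise FileNotFoundError("trf was not found on PATH")
    result = subprocess.run(trf_command(fasta), capture_output=True, text=True)
    # Some TRF builds exit non-zero even on success, so the presence of the
    # .dat file is a more reliable signal than the exit code.
    return result.returncode, result.stdout + result.stderr
```

If `list_trfdat()` returns an empty list, trf most likely never produced output, and the message returned by `run_trf` is the next thing to read.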