INTABiotechMJ / MITE-Tracker

MITE Tracker: An accurate approach to identify miniature inverted-repeat transposable elements in large genomes

can not find MITE in my test genome? #5

Closed YuntaoTan closed 5 years ago

YuntaoTan commented 5 years ago

Hi, I am trying MITE-Tracker, and it runs faster than MITE-Hunter. I split my genome (738 contigs, ~300 Mbp) into 38 chunk files. I can find candidate MITEs in each chunk, but I cannot find the final result, as the listing below shows:

-rw-r--r-- 1 tanyt BGenome      2453 May  6 16:33 036/families.fasta
-rw-r--r-- 1 tanyt BGenome       822 May  6 16:33 036/families_nr.fasta
-rw-r--r-- 1 tanyt BGenome         0 May  6 16:23 037/all.fasta
-rw-r--r-- 1 tanyt BGenome   2624345 May  6 16:19 037/candidates.fasta
-rw-r--r-- 1 tanyt BGenome         0 May  6 16:23 037/families.fasta
-rw-r--r-- 1 tanyt BGenome         0 May  6 16:23 037/families_nr.fasta
-rw-r--r-- 1 tanyt BGenome         0 May  6 16:56 038/all.fasta
-rw-r--r-- 1 tanyt BGenome    904702 May  6 16:55 038/candidates.fasta
-rw-r--r-- 1 tanyt BGenome         0 May  6 16:56 038/families.fasta
-rw-r--r-- 1 tanyt BGenome         0 May  6 16:56 038/families_nr.fasta
-rw-r--r-- 1 tanyt BGenome         0 May  7 11:06 atest/all.fasta
-rw-r--r-- 1 tanyt BGenome 170023347 May  7 11:06 atest/candidates.fasta
-rw-r--r-- 1 tanyt BGenome         0 May  7  2019 atest/families.fasta
-rw-r--r-- 1 tanyt BGenome         0 May  7  2019 atest/families_nr.fasta

The final output files are empty:

-rw-r--r-- 1 tanyt BGenome         0 May  7  2019 atest/families.fasta
-rw-r--r-- 1 tanyt BGenome         0 May  7  2019 atest/families_nr.fasta

Can you help me? Is the --min_copy_number parameter the key? I set it to 4, as you did. Does that value correspond to the ploidy of your test wheat (tetraploid)? My species is diploid; should I change it to 2?

juancresc commented 5 years ago

Hello, can you send me the out.log file? The min_copy_number parameter is the minimum number of elements a family must contain to remain valid. For a small genome I'd suggest 2 or 3.
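To illustrate what such a threshold does, here is a minimal Python sketch. It is not MITE-Tracker's actual code: the function name filter_families, the candidate ids, and the cluster labels are all hypothetical. It only shows that a family survives when it has at least min_copy_number members, independent of ploidy.

```python
from collections import defaultdict


def filter_families(cluster_of, min_copy_number):
    """Keep only clusters with at least `min_copy_number` members.

    `cluster_of` maps a candidate id to the id of the cluster it was
    assigned to (hypothetical input format, for illustration only).
    """
    families = defaultdict(list)
    for candidate_id, cluster_id in cluster_of.items():
        families[cluster_id].append(candidate_id)
    # A family is valid only if it contains enough copies.
    return {
        cluster_id: members
        for cluster_id, members in families.items()
        if len(members) >= min_copy_number
    }


# Hypothetical assignments: cluster A has 3 copies, B has 1, C has 2.
assignments = {
    "cand_1": "A",
    "cand_2": "A",
    "cand_3": "A",
    "cand_4": "B",
    "cand_5": "C",
    "cand_6": "C",
}

for threshold in (2, 3, 4):
    kept = filter_families(assignments, threshold)
    print(threshold, sorted(kept))
```

With these made-up assignments, lowering the threshold keeps more families and raising it keeps fewer. If every cluster is a singleton, no threshold of 2 or more can keep anything.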

YuntaoTan commented 5 years ago

Hi @juancrescente, thanks for your reply. My out.log follows.

2019-05-06 17:25:27,801 Clustering
2019-05-06 17:25:27,801 /export/personal1/tanyt/Pipeline/Repeat/pipeline3/MITE/MITE-Tracker/vsearch-2.7.1/bin/vsearch --cluster_fast results/atest/candidates.fasta --threads 51 --strand both --clusters results/atest/temp/clust --iddef 1 --id 0.88
2019-05-06 18:46:02,392 Clustering done
2019-05-06 18:46:02,393 Filtering clusters
2019-05-06 18:46:15,366 Initial clusters: 240464
2019-05-06 21:44:43,660 Clusters: 0
2019-05-06 21:44:44,340 15609.018525 secs
2019-05-07 09:33:06,034 Clustering
2019-05-07 09:33:06,035 /export/personal1/tanyt/Pipeline/Repeat/pipeline3/MITE/MITE-Tracker/vsearch-2.7.1/bin/vsearch --cluster_fast results/atest/candidates.fasta --threads 51 --strand both --clusters results/atest/temp/clust --iddef 1 --id 0.88
2019-05-07 15:27:15,842 Clustering done
2019-05-07 15:27:15,843 Filtering clusters
2019-05-07 15:27:29,181 Initial clusters: 240464
2019-05-07 21:10:57,389 Clusters: 0
2019-05-07 21:10:59,967 41924.152439 secs

The result is 0 clusters (2019-05-07 21:10:57,389 Clusters: 0). I also tried setting --min_copy_number to 2, but still nothing was found. My species is a plant, and MITE-Hunter finds many MITEs in it. Another possible issue: VSEARCH writes a very large number of small files. In my case there are 240464 initial clusters, so I got 240464 files, which makes I/O very busy and slow. Would you consider other clustering tools such as cd-hit?
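One way to check how large the clusters are without creating one file per cluster: vsearch can also write its clustering result to a single tab-separated file via the --uc option. This is only a sketch under that assumption. The log above shows MITE-Tracker calling vsearch with --clusters, not --uc, so the UC file would have to come from a separate manual run, and the sample records below are made up. In the UC format, S records are centroids, H records are hits assigned to a centroid, and C records are per-cluster summaries.

```python
from collections import Counter


def cluster_sizes(uc_lines):
    """Count members per cluster from UC-format lines.

    Each S (centroid) or H (hit) record contributes one member to the
    cluster whose number is in the second column. C records are
    summaries and are skipped to avoid double counting.
    """
    sizes = Counter()
    for line in uc_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record_type, cluster_id = fields[0], fields[1]
        if record_type in ("S", "H"):
            sizes[cluster_id] += 1
    return sizes


def size_histogram(sizes):
    """Map cluster size -> number of clusters with that size."""
    return Counter(sizes.values())


# Made-up UC records: cluster 0 has two members, cluster 1 has one.
sample = [
    "S\t0\t120\t*\t*\t*\t*\t*\tcand_1\t*",
    "H\t0\t118\t91.5\t+\t0\t0\t118M\tcand_2\tcand_1",
    "S\t1\t200\t*\t*\t*\t*\t*\tcand_3\t*",
    "C\t0\t2\t*\t*\t*\t*\t*\tcand_1\t*",
    "C\t1\t1\t*\t*\t*\t*\t*\tcand_3\t*",
]

sizes = cluster_sizes(sample)
histogram = size_histogram(sizes)
print(dict(sizes))
print(dict(histogram))
```

For a real file, the lines could be read with open("clusters.uc") (file name hypothetical). A histogram dominated by size 1 would mean the candidates rarely cluster together at the chosen identity threshold.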

juancresc commented 5 years ago

We do not use cd-hit because of its execution time. It seems the clusters are not similar to each other. Maybe you can send me your sequences so I can take a look? Write to me at juan.crescente at gmail.com if you want.