problems interpreting output

LisaBlazek commented 4 years ago

Hello! I very much like your program; it was easy to install and runs very fast! One issue I have is that I'm not sure how to interpret my findings, and I have some additional questions:

I ran the program on the RepeatModeler output, where I have known and unknown repeats. Some of the known repeats were assigned the same class by DeepTE, but some were not, which I found very strange. For example, I have an LTR/Gypsy element and DeepTE tells me it is a ClassII_DNA_TcMar_MITE, but it would be way too long to be a MITE. Does that mean DeepTE's output suggests the LTR/Gypsy element was falsely assigned?
The same thing happens with an LTR/Gypsy assigned as ClassII_DNA_Mutator_nMITE. I would like to know what the difference between nMITE and MITE is. Does the n stand for nested?
You have stated that the program can classify seven orders and 11-24 superfamilies. Is there some sort of list, or could you kindly provide one?
I'm not so sure when to use -fam. Is it advisable to use input with a mix of known and unknown sequences, or should I provide them separately? If I use known sequences, would this then be a check of whether the sequences were assigned correctly?

I'm looking forward to further work with DeepTE! Kind regards, Lisa

yanhaidong1 commented 4 years ago

Hi Lisa,

Thanks for using this tool. I appreciate your feedback.

DeepTE uses a tree-structured prediction: it first assigns each unknown input TE to ClassI, ClassII, or ClassIII, and then continues to classify it into the subfamilies under that top-level class. I think that in your case the LTR/Gypsy was assigned to ClassII in the first step, probably because it lacks some important domain that an LTR/Gypsy should have. There are two ways you can check. First, you can use another DeepTE function (DeepTE_domain.py) to check whether the LTR/Gypsy has its key domains. Second, you can check the predicted probabilities of the classes or families the LTR/Gypsy could belong to (DeepTE can give you this information: example data/working_dir/store_temp_opt_dir/*_probability_results.txt).

Also, no tool makes perfect predictions, so we cannot say that the LTR/Gypsy element was falsely assigned by the other tool. DeepTE may give a wrong prediction if two TEs are similar but actually belong to different families. One good thing about DeepTE, as mentioned above, is that you can check the domains and the probabilities of the classes this TE could belong to, which may give you some ideas; for example, this LTR/Gypsy may have lost some important sequence during evolution, making it resemble other families.
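
To illustrate why a first-step assignment carries through to the final label, here is a minimal sketch of a two-level (tree-structured) classifier. Everything in it is hypothetical: `classify_tree` is not a DeepTE function, and the probabilities are made-up numbers for illustration, not DeepTE output.

```python
def argmax_label(probs):
    """Return the label with the highest probability, plus that probability."""
    label = max(probs, key=probs.get)
    return label, probs[label]


def classify_tree(class_probs, subclass_probs):
    """Pick the top-level class first, then choose only among its children."""
    top, top_p = argmax_label(class_probs)
    child, child_p = argmax_label(subclass_probs[top])
    return f"{top}_{child}", top_p, child_p


# Made-up numbers for illustration only (not real DeepTE probabilities).
class_probs = {"ClassI": 0.42, "ClassII": 0.47, "ClassIII": 0.11}
subclass_probs = {
    "ClassI": {"LTR_Gypsy": 0.90, "LTR_Copia": 0.10},
    "ClassII": {"DNA_TcMar_MITE": 0.55, "DNA_Mutator_nMITE": 0.45},
    "ClassIII": {"Helitron": 1.0},
}

label, top_p, child_p = classify_tree(class_probs, subclass_probs)
print(label, top_p, child_p)
```

In this toy case the element would have been a confident LTR_Gypsy under ClassI, but the first step narrowly favoured ClassII, so only ClassII children were ever considered. A narrow margin like this at the first step is the kind of thing the probability results file can help you spot.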

The major difference is that MITEs, unlike nMITEs, do not encode any transposase, and MITEs are usually shorter than 500 bp. Some references may give you more ideas about them (https://www.nature.com/articles/nrg2165; https://academic.oup.com/nar/article/38/22/e199/1048972)
The list of families is taken from this study, https://www.nature.com/articles/nrg2165, which covers most TE families.
When you have unknown TEs, you do not need to set ‘-fam’. When you know that the unknown TEs belong to ClassI but are not sure whether they are LTR, nLTR, LINE, or SINE, you can use ‘-fam ClassI’ to classify the unknown TEs into these four families. You can provide known and unknown sequences together or separately; it does not matter for the prediction. Also, you can check whether sequences were assigned correctly by using known TEs as input. Keep in mind that you can use the two checks from the first point of this reply for further verification.
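
As a rough sketch of the "use known TEs as a check" idea, the snippet below compares RepeatModeler labels with DeepTE predictions at the top class level. It rests on several assumptions that you should verify against your own files: that sequence names keep the RepeatModeler `name#Class/Family` form, that the DeepTE result is a tab-separated list of sequence name and predicted label, and that the `RM_TO_TOP` mapping (my own, not from DeepTE) is appropriate. The function names and example lines are hypothetical.

```python
# Assumed mapping from RepeatModeler orders to top-level classes (hypothetical).
RM_TO_TOP = {
    "LTR": "ClassI",
    "LINE": "ClassI",
    "SINE": "ClassI",
    "DNA": "ClassII",
    "RC": "ClassIII",
}


def known_top_class(name):
    """'rnd-1_family-10#LTR/Gypsy' -> 'ClassI'; None if unknown or unmapped."""
    _, _, label = name.partition("#")
    order = label.split("/")[0]
    return RM_TO_TOP.get(order)


def predicted_top_class(label):
    """'ClassII_DNA_TcMar_MITE' -> 'ClassII'."""
    return label.split("_")[0]


def find_disagreements(prediction_lines):
    """Return (name, predicted_label) pairs whose top-level classes differ."""
    disagreements = []
    for line in prediction_lines:
        line = line.strip()
        if not line:
            continue
        name, predicted = line.split("\t")[:2]
        known = known_top_class(name)
        if known is not None and known != predicted_top_class(predicted):
            disagreements.append((name, predicted))
    return disagreements


# Hypothetical example lines, not real results.
example_lines = [
    "rnd-1_family-10#LTR/Gypsy\tClassII_DNA_TcMar_MITE",
    "rnd-1_family-22#DNA/hAT\tClassII_DNA_hAT_nMITE",
    "rnd-1_family-31#Unknown\tClassI_LTR_Copia",
]
for name, predicted in find_disagreements(example_lines):
    print(f"{name} was predicted as {predicted}")
```

Sequences flagged this way are good candidates for the domain and probability checks described in the first point; sequences labelled Unknown by RepeatModeler are skipped because there is nothing to compare against.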

If you have any questions, please let us know. Thanks!

Best wishes, Haidong

LisaBlazek commented 4 years ago

Hello Haidong! Thank you very much for your fast and detailed answer! Things are much clearer now! If I have another question, I will write again, but for now I'm good! Again, thank you very much! Kind regards, Lisa

LiLabAtVT / DeepTE

problems interpreting output #6