Open sahoo-rk opened 8 months ago
Hi Haidong,
What would you suggest in this case?
Song
On Tue, Nov 21, 2023, 1:05 AM Ranjit Kumar Sahoo @.***> wrote:
Hello: Thanks for developing this interesting tool. While trying to re-classify the TE library of an insect species, I observed some discrepancies between the curated/de novo classifications and those from DeepTE. Below is a snapshot of such occurrences. In Case 1, as the library was derived from the curated databases, dfam/sinebase, I expected DeepTE to identify the same classification. Instead, DeepTE gives a completely different or a higher-level classification. The scenario is similar in Case 2 for the de novo predictions. How should I proceed in such instances? Please suggest.
Case 1 (library entry, then DeepTE classification):
sinebase#SINE/Unknown ClassI
dfam#LINE/Unknown ClassI
dfam#LINE/Unknown ClassI_LTR_Gypsy
Case 2 (library entry, then DeepTE classification):
TE_00003106_INT#LTR/Gypsy ClassI
TE_00004093_INT#LTR/Copia ClassI
TE_00003851_LTR#LTR/Gypsy unknown
NB: DeepTE was run with the supplied metazoan model, and classification of the TE library of 10K sequences completed in only 6 minutes. Best,
See Haidong's reply below (he is the first author of the paper and has since moved to a different position):
DeepTE may not classify every sequence correctly, owing to incomplete training data or some degree of overfitting. For example, the training data I originally used may not cover the curated databases you mentioned, so the models could not learn the patterns in those databases. Would you mind testing how often these cases occur? I guess most of the curated classifications would be captured. A second option is to use the 'training_example_dir' in the DeepTE GitHub repository to train a new model on the new curated databases, which may help solve this issue.
Best wishes, Haidong
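One way to test how often these cases occur is to tally, for every sequence, how the DeepTE label relates to the label already in the library. The Python sketch below is only an illustration and not part of DeepTE. The function names, the `ORDER_TO_CLASS` mapping, and the rule that an underscore-separated DeepTE label with `nLTR` dropped lines up with a `Class/Order/Superfamily` path are all assumptions. Names are compared literally, so superfamilies spelled differently in the two naming schemes would be counted as conflicts.

```python
from collections import Counter

# Assumed mapping from the order in a library label to a top-level class.
ORDER_TO_CLASS = {"LTR": "ClassI", "LINE": "ClassI", "SINE": "ClassI", "DNA": "ClassII"}


def expected_path(name):
    """Turn 'dfam#LINE/Unknown' into ['ClassI', 'LINE'] (Unknown levels dropped)."""
    label = name.split("#", 1)[1]
    parts = [p for p in label.split("/") if p.lower() != "unknown"]
    if not parts:
        return []
    cls = ORDER_TO_CLASS.get(parts[0])
    return ([cls] if cls else []) + parts


def deepte_path(label):
    """Turn 'ClassI_LTR_Gypsy' into ['ClassI', 'LTR', 'Gypsy']; None for 'unknown'."""
    if label.lower() == "unknown":
        return None
    return [t for t in label.split("_") if t != "nLTR"]


def compare(name, deepte_label):
    """Describe how the DeepTE label relates to the library label."""
    exp, got = expected_path(name), deepte_path(deepte_label)
    if got is None:
        return "unclassified"
    if got == exp:
        return "same"
    if exp[:len(got)] == got:
        return "higher_level"   # DeepTE stops at a parent of the library label
    if got[:len(exp)] == exp:
        return "more_specific"  # DeepTE refines the library label
    return "conflict"


# The six examples reported in this issue.
pairs = [
    ("sinebase#SINE/Unknown", "ClassI"),
    ("dfam#LINE/Unknown", "ClassI"),
    ("dfam#LINE/Unknown", "ClassI_LTR_Gypsy"),
    ("TE_00003106_INT#LTR/Gypsy", "ClassI"),
    ("TE_00004093_INT#LTR/Copia", "ClassI"),
    ("TE_00003851_LTR#LTR/Gypsy", "unknown"),
]
counts = Counter(compare(name, label) for name, label in pairs)
print(counts)
```

Under these assumptions, four of the six reported examples count as higher-level calls, one as a conflict, and one as unclassified. Running the same tally over the whole 10K-sequence library would show whether true conflicts are rare.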
-- Associate Professor in Plant Genomics and Bioinformatics School of Plant and Environmental Sciences Virginia Polytechnic Institute and State University