LiLabAtVT / DeepTE

Neural network classification of TE
BSD 3-Clause "New" or "Revised" License

Percentage of classified sequences #4

Closed: mylena-s closed this issue 4 years ago

mylena-s commented 4 years ago

Hello! I'm running DeepTE on a server to classify repeat sequences from a non-model species. I obtained them with RepeatModeler2, which classified them as "Unknown" or "LTR/Unknown". When I run DeepTE as specified in the README, every sequence is assigned to one TE class/family or another, no matter which parameters or input I use. This is an example of one of the commands I've tried:

/pathto/minicondaenv/bin/python /path/to/DeepTE/DeepTE.py -d working_dirLTR -o output_dirLTR -i Sp_LTRUnknown.fasta -sp M -m_dir /path/to/DeepTE/Metazoans_model > deepteUnknownLTR.txt 2>&1

I know my question might sound strange, but I expected some sequences to remain unclassified, since this is a non-model organism and it is common to have a large number of unclassified sequences. I do not know whether this is a normal result of the program or whether there is a problem with my environment configuration.

Also, I found some errors in the log file, although the program continued running. I believe some of the errors result from running DeepTE through a script submitted as a PBS job: deepte.log https://github.com/LiLabAtVT/DeepTE/files/4720034/deepte.log

Thanks in advance!

songliVT commented 4 years ago

Hello,

Thank you for your interest in our package. For your questions:

(1) DeepTE is a supervised machine learning method, so it can only predict the categories that were used for training. We did not include a category called "unknown", so we cannot classify TEs as unknown. However, we can modify the package to report a score such that the lower the score, the less likely it is that the TE belongs to the predicted category.

(2) The error messages seem fine; they are just warnings about CPU usage.
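The score idea in point (1) can be illustrated with a minimal sketch. This is not DeepTE's code or output format: the function names, the class labels, and the 0.6 threshold are all hypothetical. It only shows the general technique of turning a classifier's raw outputs into probabilities and reporting "unknown" when the top probability is low.

```python
import math

def softmax(logits):
    """Convert raw classifier outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_reject(logits, labels, threshold=0.6):
    """Return (label, score); the label is "unknown" if the score is below threshold.

    `threshold` is an assumed value chosen for illustration only.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    score = probs[best]
    label = labels[best] if score >= threshold else "unknown"
    return label, score

# Hypothetical labels and outputs, not taken from any trained model.
labels = ["ClassI", "ClassII", "ClassIII"]
print(classify_with_reject([4.0, 0.5, 0.2], labels))  # one class clearly dominates
print(classify_with_reject([1.0, 0.9, 0.8], labels))  # near-uniform, so rejected
```

A threshold like this trades coverage for precision: raising it leaves more sequences as "unknown", which may be closer to what one expects for a non-model organism.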

Song

-- Assistant Professor in Plant Genomics and Bioinformatics School of Plant and Environmental Sciences Virginia Polytechnic Institute and State University

mylena-s commented 4 years ago

Thanks for the help and for the quick answer! I will follow the updates!

Cheers, Mylena