DerKevinRiehl / transposon_annotation_reasonaTE

Transposon annotation tool "reasonaTE" (part of TransposonUltimate)
GNU General Public License v3.0

Not complete results #12

Closed: roperete closed this issue 2 years ago

roperete commented 2 years ago

Hello,

When I run reasonaTE with the provided test sequence or with my own sequences, I always get the results below. I expect ltrPred to be incomplete because I have not installed it yet, but the remaining tools should be installed correctly (I installed them using conda and mamba).

Any idea what could be wrong? Any help would be much appreciated.

(transposon_annotation_tools_env) alvaro@alive:~:reasonaTE -mode checkAnnotations -projectFolder workspace -projectName testProject
Checking helitronScanner ... completed
Checking ltrHarvest ... completed
Checking ltrPred ... not completed
Checking mitefind ... completed
Checking mitetracker ... not completed
Checking must ... not completed
Checking repeatmodel ... not completed
Checking repMasker ... not completed
Checking sinefind ... completed
Checking sinescan ... not completed
Checking tirvish ... completed
Checking transposonPSI ... not completed
Checking NCBICDD1000 ... not completed

Kind regards,

Álvaro

roperete commented 2 years ago

This is what I get when running reasonaTE with -tool all:

alvaro@mutant32:~:conda activate transposon_annotation_tools_env
(transposon_annotation_tools_env) alvaro@mutant32:~:reasonaTE -mode annotate -projectFolder workspace -projectName testProject -tool all
7 ['scanHead', '-g', '/home/alvaro/workspace/testProject/sequence.fasta', '-bs', '0', '-o', '/home/alvaro/workspace/testProject/helitronScanner/scanHead.txt'] @@@
scanHead in >seq1 [Mon Apr 25 16:24:12 CEST 2022]
[1:793080] [Mon Apr 25 16:24:12 CEST 2022]
scanHead in >seq2 [Mon Apr 25 16:24:15 CEST 2022]
[1:504300] [Mon Apr 25 16:24:15 CEST 2022]

All finished. [Mon Apr 25 16:24:17 CEST 2022]

scanTail in >seq1 [Mon Apr 25 16:24:17 CEST 2022]
[1:793080] [Mon Apr 25 16:24:17 CEST 2022]
scanTail in >seq2 [Mon Apr 25 16:24:21 CEST 2022]
[1:504300] [Mon Apr 25 16:24:21 CEST 2022]

All finished. [Mon Apr 25 16:24:23 CEST 2022]

7 ['scanHead', '-g', '/home/alvaro/workspace/testProject/sequence_rc.fasta', '-bs', '0', '-o', '/home/alvaro/workspace/testProject/helitronScanner_rc/scanHead.txt'] @@@
scanHead in >seq1 [Mon Apr 25 16:24:24 CEST 2022]
[1:793080] [Mon Apr 25 16:24:24 CEST 2022]
scanHead in >seq2 [Mon Apr 25 16:24:28 CEST 2022]
[1:504300] [Mon Apr 25 16:24:28 CEST 2022]

All finished. [Mon Apr 25 16:24:29 CEST 2022]

scanTail in >seq1 [Mon Apr 25 16:24:30 CEST 2022]
[1:793080] [Mon Apr 25 16:24:30 CEST 2022]
scanTail in >seq2 [Mon Apr 25 16:24:34 CEST 2022]
[1:504300] [Mon Apr 25 16:24:34 CEST 2022]

All finished. [Mon Apr 25 16:24:36 CEST 2022]

sh: 1: gt: not found
sh: 1: gt: not found
/home/alvaro/anaconda3/envs/transposon_annotation_tools_env/bin/transposon_annotation_tools_mitefinderii/miteFinder_linux_x64 -pattern_scoring /home/alvaro/anaconda3/envs/transposon_annotation_tools_env/bin/transposon_annotation_tools_mitefinderii/pattern_scoring.txt -input /home/alvaro/workspace/testProject/sequence.fasta -output /home/alvaro/workspace/testProject/mitefind/result.txt

Sequence 0: 0

Sequence 1: 0

##############

The program cost 2 seconds totally to search for MITEs.

/home/alvaro/anaconda3/envs/transposon_annotation_tools_env/bin/transposon_annotation_tools_mitefinderii/miteFinder_linux_x64 -pattern_scoring /home/alvaro/anaconda3/envs/transposon_annotation_tools_env/bin/transposon_annotation_tools_mitefinderii/pattern_scoring.txt -input /home/alvaro/workspace/testProject/sequence_rc.fasta -output /home/alvaro/workspace/testProject/mitefind_rc/result.txt

Sequence 0: 0

Sequence 1: 0

##############

The program cost 2 seconds totally to search for MITEs.

sh: 1: mitetracker: not found
sh: 1: mitetracker: not found
Traceback (most recent call last):
  File "/home/alvaro/anaconda3/envs/transposon_annotation_tools_env/share/TransposonAnnotator_reasonaTE/TransposonAnnotator.py", line 86, in <module>
    runAnnotation(arg1, arg2, arg3, arg4)
  File "/home/alvaro/anaconda3/envs/transposon_annotation_tools_env/share/TransposonAnnotator_reasonaTE/AnnotationCommander.py", line 150, in runAnnotation
    runMust(projectFolderPath, "")
  File "/home/alvaro/anaconda3/envs/transposon_annotation_tools_env/share/TransposonAnnotator_reasonaTE/AnnotationCommander.py", line 32, in runMust
    os.mkdir(os.path.join(projectFolderPath,"must","temp"))
OSError: [Errno 17] File exists: '/home/alvaro/workspace/testProject/must/temp'
(transposon_annotation_tools_env) alvaro@mutant32:~:reasonaTE -mode checkAnnotations -projectFolder workspace -projectName testProject
Checking helitronScanner ... completed
Checking ltrHarvest ... completed
Checking ltrPred ... not completed
Checking mitefind ... completed
Checking mitetracker ... not completed
Checking must ... not completed
Checking repeatmodel ... not completed
Checking repMasker ... not completed
Checking sinefind ... completed
Checking sinescan ... not completed
Checking tirvish ... completed
Checking transposonPSI ... not completed
Checking NCBICDD1000 ... not completed

DerKevinRiehl commented 2 years ago

Dear Álvaro, first of all thank you very much for your interest in our software.

To help you better, could you run reasonaTE for individual tools and share the console output with us? For example:

reasonaTE -mode annotate -projectFolder workspace -projectName testProject -tool mitetracker
reasonaTE -mode annotate -projectFolder workspace -projectName testProject -tool must
reasonaTE -mode annotate -projectFolder workspace -projectName testProject -tool sinescan
reasonaTE -mode annotate -projectFolder workspace -projectName testProject -tool transposonPSI
reasonaTE -mode annotate -projectFolder workspace -projectName testProject -tool NCBICDD1000
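To collect the per-tool output requested above in one go, a small Python sketch like the following could loop over the tools and save each console log to its own file. It only relies on the `reasonaTE` flags shown in the commands above; the tool list, the `reasonaTE_logs` folder, and the function names `build_command` and `run_tools` are illustrative choices, not part of reasonaTE.

```python
import subprocess
from pathlib import Path

# Tools that reported "not completed" (ltrPred excluded, as it was not installed).
TOOLS = ["mitetracker", "must", "sinescan", "transposonPSI", "NCBICDD1000"]


def build_command(tool, project_folder="workspace", project_name="testProject"):
    """Return the reasonaTE command line that annotates with a single tool."""
    return [
        "reasonaTE", "-mode", "annotate",
        "-projectFolder", project_folder,
        "-projectName", project_name,
        "-tool", tool,
    ]


def run_tools(tools=TOOLS, log_dir="reasonaTE_logs"):
    """Run each tool separately and save its combined stdout/stderr to <log_dir>/<tool>.log."""
    Path(log_dir).mkdir(exist_ok=True)
    for tool in tools:
        result = subprocess.run(
            build_command(tool),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error messages in the same log
            text=True,
        )
        log_path = Path(log_dir) / f"{tool}.log"
        log_path.write_text(result.stdout)
        print(f"{tool}: exit code {result.returncode}, log in {log_path}")
```

Calling `run_tools()` from the directory that contains `workspace` would produce one log file per tool (for example `reasonaTE_logs/must.log`) that can be attached to the issue.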

Could you also please answer the following questions?

I look forward to your answers so that we can best help and support you in using reasonaTE. Best, Kevin Riehl

roperete commented 2 years ago

Dear Kevin, thanks for your response and your willingness to help.

reasonaTE Test Console Output.txt

roperete commented 2 years ago

I think it may be relevant that the following error is shown during the installation of the failing tools (proteinncbicdd1000, transposonpsicli, mitetracker, mustv2, sinescan): Installation Glibc error.txt

DerKevinRiehl commented 2 years ago

Dear Álvaro, thanks for the detailed answer.

Indeed, I agree: the issue seems to be that the software could not be installed properly on your Linux system (Ubuntu).

One of the challenges when working with Conda and Mamba is that the underlying package management systems and their servers change constantly, so they probably do not behave the same way for you as they did for me.

Could you please create a new conda environment, install only the transposon annotation tools (with conda if possible), and run the tools on your genomes separately in another folder? If this works, I will explain how to copy their results into your reasonaTE project folder so you can proceed.

Best and good luck, Kevin

roperete commented 2 years ago

Dear Kevin,

I circumvented all the issues by uninstalling and reinstalling everything. For anyone who runs into similar problems:

I added bioconda, conda-forge and derkevinriehl as default conda channels (conda config --add channels new_channel) instead of specifying the channel in the install command. I also used the plain conda installation option.

However, RepeatModeler did not work correctly, so I had to install it from the original source and run it separately.

Therefore, I would be really grateful if you could explain how to copy the RepeatModeler2 results into my reasonaTE folder so I can proceed.

Thanks a bunch!

DerKevinRiehl commented 2 years ago

Dear Álvaro, great to hear back from you :-)!

I am happy to hear that reinstalling did the trick. Even though we did our best when building the packages, the conda package servers are constantly changing, which sometimes causes these kinds of issues.

To your question: in general, you can use the folder structure of reasonaTE's sample project as a guide. The folder "repeatmodel" should contain the following files (the output of RepeatModeler): https://github.com/DerKevinRiehl/transposon_annotation_reasonaTE/tree/main/workspace/testProject/repeatmodel

As you can see in the reasonaTE code (line 1213): https://github.com/DerKevinRiehl/transposon_annotation_reasonaTE/blob/cc04a2db30c98f21981eb2d90710887be726bfcc/Code/AnnotationParser.py#L1208 the only RepeatModeler output file that reasonaTE uses is "sequence_index-families.stk".

So all you need to do is copy your *-families.stk file to the folder "repeatmodel" in your project folder.
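Copying the file by hand is enough; as a scripted alternative, the sketch below places a RepeatModeler Stockholm file into the `repeatmodel` subfolder of a project, following the `<projectFolder>/<projectName>/<tool>` layout visible in the console paths earlier in the thread (e.g. `/home/alvaro/workspace/testProject/must/temp`). Renaming the file to `sequence_index-families.stk` is an assumption based on the sample project's file name, and `copy_families_file` is a hypothetical helper, not part of reasonaTE.

```python
import shutil
from pathlib import Path


def copy_families_file(families_stk, project_folder="workspace",
                       project_name="testProject",
                       target_name="sequence_index-families.stk"):
    """Copy a RepeatModeler *-families.stk file into <project_folder>/<project_name>/repeatmodel/.

    target_name mirrors the sample project's file name (an assumption).
    """
    target_dir = Path(project_folder) / project_name / "repeatmodel"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    target = target_dir / target_name
    shutil.copyfile(families_stk, target)
    return target
```

For example, `copy_families_file("families.stk")` would write `workspace/testProject/repeatmodel/sequence_index-families.stk`, after which the check step can be run.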

Then you can check whether reasonaTE finds the annotations:

reasonaTE -mode checkAnnotations -projectFolder workspace -projectName testProject

Afterwards, you can proceed with the parsing step:

reasonaTE -mode parseAnnotations -projectFolder workspace -projectName testProject
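These two steps could be chained in a small Python helper, sketched below under the assumption that `checkAnnotations` prints one `Checking <tool> ... completed` or `Checking <tool> ... not completed` entry per tool, as in the console output earlier in this thread. `parse_check_output` and `check_then_parse` are hypothetical names, not part of reasonaTE.

```python
import re
import subprocess

# Matches entries such as "Checking mitefind ... completed" or
# "Checking must ... not completed", even when several share one line.
CHECK_PATTERN = re.compile(r"Checking (\S+) \.\.\. (not completed|completed)")


def parse_check_output(text):
    """Map each tool name to True (completed) or False (not completed)."""
    return {tool: status == "completed" for tool, status in CHECK_PATTERN.findall(text)}


def reasonate_command(mode, project_folder, project_name):
    """Build a reasonaTE command line using the flags shown in this thread."""
    return ["reasonaTE", "-mode", mode,
            "-projectFolder", project_folder, "-projectName", project_name]


def check_then_parse(project_folder="workspace", project_name="testProject"):
    """Run checkAnnotations, report tools without results, then run parseAnnotations."""
    check = subprocess.run(
        reasonate_command("checkAnnotations", project_folder, project_name),
        stdout=subprocess.PIPE, text=True)
    status = parse_check_output(check.stdout)
    missing = [tool for tool, done in status.items() if not done]
    print("Not completed:", ", ".join(missing) or "none")
    subprocess.run(
        reasonate_command("parseAnnotations", project_folder, project_name),
        check=True)  # raise if parsing fails
    return status
```

Whether to parse when some tools are still missing is a judgment call; this sketch only reports them and proceeds.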

Please let me know if you face any more issues; I am happy to help, and I am sure you are almost there :-) with successfully using our tool.

Best, Kevin

roperete commented 2 years ago

Dear Kevin,

Thanks for your quick reply and help.

I have indeed found the file (simply called families.stk) among the other RepeatModeler result files. After copying it to the folder, checkAnnotations reports the annotation as completed, but parsing gives the following error:

Parse repeatModeler...
Traceback (most recent call last):
  File "/home/alvaro/anaconda3/envs/transposon_annotation_tools_env/share/TransposonAnnotator_reasonaTE/TransposonAnnotator.py", line 114, in <module>
    parseAvailableResults(projectFolderPath)
  File "/home/alvaro/anaconda3/envs/transposon_annotation_tools_env/share/TransposonAnnotator_reasonaTE/AnnotationParser.py", line 1346, in parseAvailableResults
    parseRepeatModeler(pathResDir, fastaFile, targetGFFFile, targetGFFrepe, targetFastaFile)
  File "/home/alvaro/anaconda3/envs/transposon_annotation_tools_env/share/TransposonAnnotator_reasonaTE/AnnotationParser.py", line 1255, in parseRepeatModeler
    seqTypeLabelA = seqType.split(";")[1]
IndexError: list index out of range

I ran RepeatModeler2 with -LTRStruct enabled. Should I avoid that if I intend to run the results through the pipeline?

Thanks :)

DerKevinRiehl commented 2 years ago

Dear Álvaro, it seems you hit exactly the same problem as someone before. The issue is that RepeatModeler sometimes produces empty lines in the Stockholm file, even though this violates the file format standard. I wrote a small script that you can use to "clean" your Stockholm files, as described in this thread: https://github.com/DerKevinRiehl/TransposonUltimate/issues/3#issuecomment-1117307262

Please let me know if this did the trick. Best, Kevin

roperete commented 2 years ago

Dear Kevin,

The script produces an empty .stk file as output. It seems to delete everything.

Best, Alvaro

DerKevinRiehl commented 2 years ago

Dear Álvaro, could you please share your original STK file? It seems your version of RepeatModeler produces yet another output format than the one I dealt with in the GitHub issue mentioned before.

I will have a look at your STK file and adapt the script.

Thanks, Best regards, Kevin

roperete commented 2 years ago

Dear Kevin,

I attach the families file here. However, the problem might lie not in your program but in my RepeatModeler installation.

This file is not the *-families.stk file that RepeatModeler normally outputs. For a reason I don't know, RepeatModeler is not producing those files at all.

The attached file (the one I was attempting to use) is the families.stk file inside the RM_ folder that RepeatModeler generates to hold all the temporary files from the run. families.zip

Thanks, Kind regards, Álvaro

DerKevinRiehl commented 2 years ago

Dear Álvaro, after checking the error message again, I found the reason for this behavior: the Stockholm file does not contain the repeat type. Normally RepeatModeler writes "Unknown", but in your case it simply did not add any information to the file.
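The traceback ends in `seqType.split(";")[1]`: when the repeat-type string contains no `;`, `split` returns a single-element list (`"Unknown".split(";")` gives `["Unknown"]`, and `"".split(";")` gives `[""]`), so index 1 raises `IndexError`. Assuming, without having checked reasonaTE's source, that `seqType` comes from each record's `#=GF TP` line, the sketch below lists the records of an STK file that would trigger this error. `families_missing_type` is a hypothetical helper name.

```python
def families_missing_type(stk_text):
    """Return the 1-based index of every Stockholm record that has no "#=GF TP"
    line, or whose TP value lacks the ";" separator the parser splits on."""
    missing = []
    record, has_type = 0, False
    for line in stk_text.splitlines():
        if line.startswith("# STOCKHOLM"):      # a new record begins
            record += 1
            has_type = False
        elif line.startswith("#=GF TP"):        # repeat-type annotation
            value = line[len("#=GF TP"):].strip()
            has_type = ";" in value
        elif line.strip() == "//":              # the record ends
            if not has_type:
                missing.append(record)
    return missing
```

Passing the contents of the file, e.g. `families_missing_type(open("families.stk").read())`, would show which records need a repeat type before parsing.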

Therefore I wrote a new corrector, which you can find below. Please use this one and tell me if it does the trick. I have attached the updated corrector.py as well as your updated STK file, families.stk.

Run it as follows:

python corrector.py FROM_FILE.stk TO_FILE.stk

Here is the code of the corrector:

# Author: Kevin Riehl for Transposon Ultimate Problems with RepeatModeler Outputs (C) 2022

# This code loads annotation outputs from RepeatModeler in Stockholm format that lack the repeat type and adds it,
# as these missing values cause errors in the downstream pipeline of reasonaTE

# Usage: python corrector.py FROM_FILE.stk TO_FILE.stk

import sys

# get arguments
arguments = sys.argv
if len(arguments) == 3:
    from_file = arguments[1]
    to_file = arguments[2]

    # copy the file line by line, erasing empty lines and adding a
    # placeholder repeat type after every "# STOCKHOLM" header line
    with open(from_file, "r") as f1, open(to_file, "w") as f2:
        for line in f1:
            if line.strip() == "":
                continue  # skip empty lines, which break the downstream parser
            if line.startswith("# STOCKHOLM"):
                line = line + "#=GF TP    Unknown;Unknown\n"
            f2.write(line)

else:
    print("ERROR! Two arguments are required: from_file and to_file.")
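The corrector above appends the placeholder type after every `# STOCKHOLM` header, including records that already carry a `#=GF TP` line. If a file mixes typed and untyped records, a variant that fills in only the missing types may be preferable. The sketch below does that, assuming each record starts with a `# STOCKHOLM` header and ends with `//`, as in standard Stockholm files; `correct_records` and `correct_file` are hypothetical names, and whether a duplicate TP line actually causes problems in reasonaTE is not established in this thread.

```python
PLACEHOLDER = "#=GF TP    Unknown;Unknown\n"


def correct_records(lines):
    """Drop empty lines and add PLACEHOLDER only to records without a "#=GF TP" line."""
    corrected, record = [], []
    for line in lines:
        if not line.strip():
            continue  # erase empty lines, as the corrector above does
        if not line.endswith("\n"):
            line += "\n"  # normalise a missing final newline
        record.append(line)
        if line.strip() == "//":  # "//" closes a Stockholm record
            has_type = any(entry.startswith("#=GF TP") for entry in record)
            if not has_type and record[0].startswith("# STOCKHOLM"):
                record.insert(1, PLACEHOLDER)  # directly after the header line
            corrected.extend(record)
            record = []
    corrected.extend(record)  # keep any trailing lines after the last "//"
    return corrected


def correct_file(from_file, to_file):
    """Read a Stockholm file, correct it, and write the result to to_file."""
    with open(from_file) as src:
        corrected = correct_records(src)
    with open(to_file, "w") as dst:
        dst.writelines(corrected)
```

For example, `correct_file("families.stk", "families_fixed.stk")` would write a copy in which only the untyped records receive the placeholder.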

Please let me know if this did the trick for you. Best regards, Kevin

roperete commented 2 years ago

Dear Kevin, the updated corrector does indeed do the trick. In the meantime, I figured out why I was not obtaining the right outputs. Everything now works with RepeatModeler and RepeatMasker.

Thanks a lot for your interest and help. Very much appreciated!!!