Closed fantin-mesny closed 2 years ago
Dear Fantin, thank you very much for your interest in our software.
I suspect the output files of RepeatModeler have some issues.
Could you do me a favor and share the file "-families.stk" with me please?
As you can see in the example file of the repo, RepeatModeler should output the file in Stockholm format. https://github.com/DerKevinRiehl/transposon_annotation_reasonaTE/blob/main/workspace/testProject/repeatmodel/sequence_index-families.stk
What I can do is check the problematic files myself and write some code to fix the lines that give you trouble.
To do so, please share the file "-families.stk" from your problematic genome with me.
Best regards, Kevin
Dear Kevin,
Please find attached a problematic repeatmodel output, causing the parsing error mentioned in my previous message.
Many thanks for your help!
Best wishes, Fantin
Dear Fantin, thanks for your answer!
I figured out the problem: for some reason, your RepeatModeler returns a Stockholm file containing empty lines (e.g. line 4878 in the stk file you sent). Please remove the empty lines from your stk files with my little program, as explained below. Hint: make a backup copy of your files before applying the script, just to be safe.
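To see which lines of a given file are affected, a quick check along the following lines may help. This is only a minimal sketch: the function name find_empty_lines is an illustrative assumption and not part of reasonaTE or RepeatModeler, and it treats a line as empty only if it contains nothing but a line break.

```python
def find_empty_lines(path):
    """Return the 1-based line numbers of all empty lines in a text file."""
    with open(path, "r") as handle:
        return [
            number
            for number, line in enumerate(handle, start=1)
            if line.rstrip("\n") == ""
        ]

# Hypothetical usage (FROM_FILE.stk stands for your own file):
# print(find_empty_lines("FROM_FILE.stk"))
```

An empty list means the file contains no empty lines and should not need cleaning.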
Explanation about script:
What I did: I wrote a small script that you can use to remove the empty lines from your stk files. You can run it like this:
python corrector.py FROM_FILE.stk TO_FILE.stk
Please find my script corrector.py attached. corrector.zip
Otherwise (if you are experienced with Python), just save the following code to a file "corrector.py":
# Author: Kevin Riehl for Transposon Ultimate Problems with RepeatModeler Outputs (C) 2022
# This code loads annotation outputs from RepeatModeler in Stockholm format and erases empty lines,
# as these cause errors in the downstream pipeline of reasonaTE
# Usage: python corrector.py FROM_FILE.stk TO_FILE.stk
import sys

# get arguments
arguments = sys.argv
print(arguments)
if len(arguments) == 3:
    from_file = arguments[1]
    to_file = arguments[2]
    # read the input line by line and copy only the non-empty lines
    with open(from_file, "r") as f1, open(to_file, "w") as f2:
        for line in f1:
            if line.rstrip("\n") != "":
                f2.write(line)
else:
    print("ERROR! Expected exactly two arguments: from_file and to_file!")
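For a project with many genomes, the same cleaning step could be applied to every affected file in one go, with a backup copy kept as suggested in the hint above. The following is only a sketch under assumptions: the function names clean_stk and clean_all, the ".bak" suffix, and the "*-families.stk" search pattern are illustrative choices and not part of reasonaTE.

```python
import shutil
from pathlib import Path

def clean_stk(path):
    """Remove empty lines from one file in place, keeping a '.bak' copy.

    Returns the number of empty lines that were removed.
    """
    path = Path(path)
    backup = path.with_name(path.name + ".bak")  # assumed backup naming
    shutil.copy2(path, backup)
    removed = 0
    # Read from the backup and rewrite the original without empty lines.
    with open(backup, "r") as src, open(path, "w") as dst:
        for line in src:
            if line.rstrip("\n") == "":
                removed += 1
            else:
                dst.write(line)
    return removed

def clean_all(root, pattern="*-families.stk"):
    """Clean every file below 'root' whose name matches 'pattern'."""
    return {str(p): clean_stk(p) for p in sorted(Path(root).rglob(pattern))}

# Hypothetical usage: clean_all("workspace") returns a dict mapping each
# cleaned file to the number of empty lines removed from it.
```

Because the original is copied before being rewritten, a file can be restored by renaming its ".bak" copy back.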
Please let me know if this did the trick for you. Best regards, Kevin
Dear Kevin,
Many thanks for your help! Removing the empty lines in the Stockholm files fixed the parsing problem.
Maybe you should implement this script in the reasonaTE programme.
Best regards. Fantin
Dear Fantin, thank you very much for your feedback! I am happy we could help you with that issue.
We will consider including this in our next release. However, we are also wondering why RepeatModeler behaves differently here, as it should not produce empty lines.
Best regards, Kevin Riehl
Dear Kevin,
Many thanks for developing TransposonUltimate.
I have used reasonaTE to run all the annotation tools on 70+ fungal genomes, and I am now proceeding to the parsing step.
For some genomes, it works without any problem. However, quite a few genomes hit the same issue when parsing the RepeatModeler output. Please see below
I tried rerunning reasonaTE with the tool repeatmodel for the genomes showing this error during parsing, but this did not solve the issue. Below is the content of the "repeatmodel" directory:
Please let me know if you think I am doing something wrong.
Best wishes, Fantin