Open jebrosen opened 4 years ago
Ahh, sorry I didn't check this issue before. I did try to modify some files to trick the program, including moving both $recoverDir/consensi.fa and $recoverDir/round-6 to a different directory and restarting the program. It still said the directory contains a successful run. So there might be something wrong with this functionality?
I believe moving round-6 somewhere else would make it look like the run had successfully finished after 5 rounds.
In addition to or instead of resuming, the clustering steps could also be copied or moved into a second program or command-line option. This would allow for combining a plain RepeatModeler run (without -LTRStruct) and an LTRPipeline run after the fact.
Since #80 references this thread but the answer is actually in #80, I am pasting the solution from https://github.com/Dfam-consortium/RepeatModeler/issues/80#issuecomment-617265216 here:
For a workaround for now, try renaming only round-6/consensi.fa to round-6/consensi.fa.bak. This will make it appear like round 6 failed, and it will resume from there.
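A minimal sketch of that workaround, assuming the run directory contains a round-6 subdirectory with a consensi.fa file as described above. The function name hide_round_consensi is made up for illustration and is not part of RepeatModeler.

```shell
# Hypothetical helper (not part of RepeatModeler): rename a round's
# consensi.fa so that the round looks unfinished to -recoverDir.
hide_round_consensi() {
    local run_dir="$1"      # the directory you would pass to -recoverDir
    local round="${2:-6}"   # round number; defaults to 6
    local target="$run_dir/round-$round/consensi.fa"

    if [ ! -f "$target" ]; then
        echo "no such file: $target" >&2
        return 1
    fi
    # Rename rather than delete, so the change can be undone later.
    mv -- "$target" "$target.bak"
}
```

After calling it on the run directory (for example, hide_round_consensi /path/to/run_dir 6), RepeatModeler would be rerun with -recoverDir as usual.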
Hi, I met the same problem, and renaming to round-6/consensi.fa.bak didn't work. I installed RepeatModeler 2.0.1 and Ninja v1.2.2 manually; the other dependencies (TRF 4.09, RECON, RepeatScout 1.0.6, RepeatMasker 4.0.9, GenomeTools 1.6.1, LTR_Retriever, MAFFT 7.310, CD-HIT 4.8.1) were installed by conda. The error message is: "the /home/panpan/RepeatMasker_out/RM_12646.ThuMay211413112020/ run did not get passed round-1". Renaming only round-6/consensi.fa to round-6/consensi.fa.bak didn't help. How can I run LTRPipeline alone with the existing output from rounds 1-5?
@njaupan
Ninja v1.2.2
That is not the same NINJA that RepeatModeler depends on. The latest version is here: https://github.com/TravisWheelerLab/NINJA/releases/tag/0.97-cluster_only. That might be why LTRPipeline failed.
But I am not sure why it won't resume. Based on your description, it may be a similar issue to https://github.com/Dfam-consortium/RepeatModeler/issues/79#issuecomment-617278885 - can you check whether round 2 found any families?
Hi, issue #79 may indeed explain it: round 2 did not produce any families. I have now configured the 0.97-cluster_only Ninja and filled round-2/consensi.fa with a blank line. If -recoverDir still does not work, I will report back.
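To check which rounds found families, something like the following sketch may help. It assumes each round-N directory is expected to hold a consensi.fa in FASTA format, with one '>' header per family; the function name report_rounds is hypothetical and not part of RepeatModeler.

```shell
# Hypothetical helper: for each round-* directory under a run directory,
# print how many FASTA headers its consensi.fa has, or note that it is missing.
report_rounds() {
    local run_dir="$1" d f n
    for d in "$run_dir"/round-*; do
        [ -d "$d" ] || continue
        f="$d/consensi.fa"
        if [ -f "$f" ]; then
            # grep -c prints 0 and exits non-zero when nothing matches.
            n=$(grep -c '^>' "$f" || true)
            echo "$(basename "$d"): $n families"
        else
            echo "$(basename "$d"): consensi.fa missing"
        fi
    done
}
```

A round reported as missing or as having 0 families is a candidate for the situation discussed in #79.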
Hi,
I think the origin of the "run did not get passed round-1" error from -recoverDir could be from not having nseg in your PATH, so there is not a proper run of RepeatScout (I think due to its dependency in RepeatScout's filter-stage-1.prl script).
Without nseg in my PATH I get an empty sampleDB-1.fa.rscons.filtered and I did not have consensi-refined.fa or consensi.fa, so RepeatModeler does not think round 1 has been completed, and you cannot use -recoverDir because of that, even if all the RECON rounds run fine. For me, it was solved by installing nseg and having it in my PATH.
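A quick pre-flight check along those lines, as a sketch only. Both function names are hypothetical; the path to sampleDB-1.fa.rscons.filtered is passed in explicitly because its location inside the run directory is not stated here.

```shell
# Hypothetical check (not part of RepeatModeler): is nseg on the PATH?
check_nseg() {
    if command -v nseg >/dev/null 2>&1; then
        echo "nseg found: $(command -v nseg)"
    else
        echo "nseg not found in PATH" >&2
        return 1
    fi
}

# Hypothetical check for the symptom described above: succeeds only if the
# given file (e.g. sampleDB-1.fa.rscons.filtered) exists and is non-empty.
rscons_filtered_ok() {
    [ -s "$1" ]
}
```

Running check_nseg before starting RepeatModeler, and rscons_filtered_ok on the filtered file after round 1, should show whether this is the cause.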
I have run RepeatModeler (with -LTRStruct) to completion a few times and it works great. However, I have sometimes had my jobs time out and therefore crash in the LTRPipeline.
To force the programme to continue, I added a -LTRcont flag to the RepeatModeler script, which is basically a replica of -recoverDir. It is the same up until checking what the last successful round is; it then goes on to assign a value to numModels (using grep '>' consensi.fa | wc -l), and does none of the back-up steps. I then added:
last if ( $options{'LTRcont'} );
at the beginning of the 'main loop' that runs RepeatScout and Recon, and it seems to continue the rest of the pipeline from the LTRStruct stage as desired. The results also seem to be fine. I am wondering if anything immediately stands out as problematic?
Thanks tkjmk, but none of the files in round-1 are empty in my case. After configuring the 0.97-cluster_only Ninja and filling round-2/consensi.fa with a blank line, I met a new problem:
LTRPipeline : Error - could not open /home/panpan/TEanno_sensi/RepeatModeler_out/RM_23147.FriMay220914012020/LTR_17862.SunMay240545322020/clusters.dat! at /home/panpan/RepeatModeler/LTRPipeline line 325
It also indicates:
Missing /home/panpan/anaconda3/envs/Repeatmodeler/bin/Libraries/RepeatMasker.lib.nsq!
But I configured the RepeatMasker binary there and simply ran
makeblastdb -dbtype nucl -in RepeatMasker.lib
to generate the RepeatMasker.lib.nsq file.
I also found another RepeatMasker.lib.nsq in /home/panpan/anaconda3/envs/RM_env/share/RepeatMasker/Libraries/RepeatMasker.lib.nsq. Which path exactly is the RepeatMasker binary?
Right now, there are three output files in LTR_17862.SunMay240545322020: LtrRetriever-redundant-results.fa, mafft-alignment.fa and raw-struct-results.txt. Did LTRPipeline finish? I also attached the log file. log.txt
Hey,
Which path is exactly RepeatMasker binary?
If you installed RepeatMasker using conda in the RM_env environment, which it seems you have, the binary should be in /home/panpan/anaconda3/envs/RM_env/bin, but RepeatMasker.lib should be in /home/panpan/anaconda3/envs/RM_env/share/RepeatMasker/Libraries/, from my experience.
I copied that Libraries/ directory to my equivalent of /home/panpan/anaconda3/envs/RM_env/bin/ and that worked for me.
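If it is unclear where the library files ended up, a sketch like this lists every RepeatMasker.lib* file under a given environment prefix. The function name find_rm_libs is hypothetical, and the prefix is whatever your conda environment directory happens to be.

```shell
# Hypothetical helper: list RepeatMasker.lib and its BLAST index files
# (RepeatMasker.lib.nsq etc.) anywhere under an environment prefix.
find_rm_libs() {
    local env_prefix="$1"
    find "$env_prefix" -type f -name 'RepeatMasker.lib*' 2>/dev/null
}
```

For the paths in this thread that would be find_rm_libs /home/panpan/anaconda3/envs/RM_env; the directory containing the hits is the Libraries/ directory referred to above.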
You could alternatively just run RepeatClassifier after RepeatModeler completes by doing something like:
RepeatClassifier -consensi consensi.fa -stockholm families.stk -repeatmasker_dir <path_to_repeatmasker_with_librariesdir>
and that path in your case would be /home/panpan/anaconda3/envs/RM_env/share/RepeatMasker/Libraries/
However, that RepeatClassifier error is separate from the LTR error. I believe the repo owners could answer this better, but here is what I would do.
i met new problem that "LTRPipeline
The clusters.dat error looks like there was a problem with the Ninja step in the LTRpipeline.
Maybe it's still a problem with the configuration/install of RepeatModeler with your new install of the correct Ninja version. I would try running RepeatModeler with the -ninja_dir <path_to_ninja> configuration override. That way you could ensure that the software is using the 0.97-cluster_only Ninja.
Did LTRPipeline finish?
The Ninja step is one of the last steps of the LTRPipeline, so it had almost run to completion.
Many thanks tkjmk for your quick reply! I also think this 'RepeatMasker binary issue' is a RepeatClassifier error, which may be separate from the LTR error. However, I am not sure how the three outputs from LtrRetriever (LtrRetriever-redundant-results.fa, mafft-alignment.fa and raw-struct-results.txt) work with the existing consensi.fa. Does consensi.fa include the LtrRetriever output?
Hi @njaupan, it looks like you have a few different issues here.
/home/panpan/anaconda3/envs/RM_env/bin - have you provided RepeatModeler with the path to the bin directory for the "RepeatMasker path"? This is not sufficient - it should be the .../share/RepeatMasker directory. This could be the reason for the RepeatClassifier errors.
Hi, I uninstalled everything and then configured all paths; it is working, thanks.
I'd like to suggest including this improvement (if there are no issues with it) to the program because I've had the same issue more than a couple of times. Thanks @tkjmk!
I support this notion.
Additionally, it would be great to include a RepeatModeler flag for $ltrSeqLimit modification in LTRPipeline, which could help to address issues like #96.
If all rounds of RepeatScout + RECON succeed but LTRPipeline fails, then using the -recoverDir option will claim that the directory contains a successful run. Since merging the results of the two methods can't easily be done separately, it would be valuable to be able to use -recoverDir to skip to the LTRPipeline + clustering steps.