recoverDir cannot recover from a failed run of the LTRPipeline

jebrosen commented 4 years ago

If all rounds of RepeatScout + RECON succeed but LTRPipeline fails, then using the -recoverDir option will claim that the directory contains a successful run. Since merging the results of the two methods can't easily be done separately, it would be valuable to be able to use -recoverDir to skip to the LTRPipeline + clustering steps.

oushujun commented 4 years ago

Ahh, sorry, I didn't check this issue earlier. I did try modifying some files to trick the program, including moving both $recoverDir/consensi.fa and $recoverDir/round-6 to a different directory and restarting the program. It still said the directory contains a successful run. Could something be wrong with this functionality?

jebrosen commented 4 years ago

I believe moving round-6 somewhere else would make it look like the run had successfully finished after 5 rounds.

jebrosen commented 4 years ago

In addition to or instead of resuming, the clustering steps could also be copied or moved into a second program or command-line option. This would allow combining a plain RepeatModeler run (without -LTRStruct) with an LTRPipeline run after the fact.

oushujun commented 4 years ago

Since #80 references this thread but the answer is actually in #80, I am pasting the solution from https://github.com/Dfam-consortium/RepeatModeler/issues/80#issuecomment-617265216 here:

For a workaround for now, try renaming only round-6/consensi.fa to round-6/consensi.fa.bak. This will make it appear like round 6 failed, and it will resume from there.
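A minimal shell sketch of that workaround, assuming the round-N/consensi.fa layout described in this thread. The function name hide_round_result is hypothetical and not part of RepeatModeler; it renames the file instead of deleting it, so the step can be undone.

```shell
# Hypothetical helper (not part of RepeatModeler): hide one round's
# consensi.fa so that -recoverDir treats that round as unfinished.
# Assumes the round-N/consensi.fa layout described in this thread.
hide_round_result() {
    rm_dir=$1   # RepeatModeler working directory (the RM_* directory)
    round=$2    # round to mark as failed, e.g. 6
    src="$rm_dir/round-$round/consensi.fa"
    if [ ! -f "$src" ]; then
        echo "no $src to hide" >&2
        return 1
    fi
    # Rename rather than delete, so the step can be undone with mv.
    mv "$src" "$src.bak"
}
```

For example, run hide_round_result path/to/RM_dir 6, then re-run RepeatModeler with the same options as the original run plus -recoverDir path/to/RM_dir.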

njaupan commented 4 years ago

Hi, I ran into the same problem, and renaming round-6/consensi.fa to round-6/consensi.fa.bak didn't help. I installed RepeatModeler 2.0.1 and Ninja v1.2.2 manually; the other dependencies (TRF 4.09, RECON, RepeatScout 1.0.6, RepeatMasker 4.0.9, GenomeTools 1.6.1, LTR_Retriever, MAFFT 7.310, CD-HIT 4.8.1) were installed with conda. The error message is: "the /home/panpan/RepeatMasker_out/RM_12646.ThuMay211413112020/ run did not get passed round-1". How can I run LTRPipeline alone using the existing output from rounds 1-5?

jebrosen commented 4 years ago

@njaupan

Ninja v1.2.2

That is not the same NINJA that RepeatModeler depends on. The latest version is here: https://github.com/TravisWheelerLab/NINJA/releases/tag/0.97-cluster_only. That might be why LTRPipeline failed.

But I am not sure why it won't resume. Based on your description, it may be a similar issue to https://github.com/Dfam-consortium/RepeatModeler/issues/79#issuecomment-617278885 - can you check whether round 2 found any families?
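One way to answer that question is to count the FASTA headers in each round's consensi.fa. The sketch below assumes that layout (round-N/consensi.fa inside the RepeatModeler working directory); round_families is a hypothetical helper, and a count of 0 corresponds to a round that found no families.

```shell
# Hypothetical helper: print "<round> <family count>" for every round-N
# directory, or "<round> missing" when round-N/consensi.fa does not exist.
# Families are counted as FASTA header lines (lines starting with '>').
round_families() {
    rm_dir=$1
    for d in "$rm_dir"/round-*; do
        [ -d "$d" ] || continue
        name=${d##*/}
        if [ -f "$d/consensi.fa" ]; then
            # grep -c prints 0 (and exits 1) when nothing matches.
            n=$(grep -c '^>' "$d/consensi.fa" || true)
            echo "$name $n"
        else
            echo "$name missing"
        fi
    done | sort -t- -k2,2n   # numeric order, so round-10 sorts after round-9
}
```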

njaupan commented 4 years ago

Hi, it's true that #79 may explain it: round 2 did not produce any families. I have now configured the 0.97-cluster_only NINJA and filled round-2/consensi.fa with a blank line. If -recoverDir still does not work, I will report back.

tkjmk commented 4 years ago

Hi,

I think the "run did not get passed round-1" error from -recoverDir could be caused by nseg not being in your PATH, so RepeatScout does not run properly (I think because RepeatScout's filter-stage-1.prl script depends on it).

Without nseg in my PATH, I get an empty sampleDB-1.fa.rscons.filtered and no consensi-refined.fa or consensi.fa, so RepeatModeler does not consider round 1 complete and -recoverDir cannot resume, even if all the RECON rounds ran fine. For me, installing nseg and putting it in my PATH solved it.
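The two checks described above can be scripted roughly as follows. This is a sketch under assumptions: check_round1_prereqs is a hypothetical name, and it assumes sampleDB-1.fa.rscons.filtered is written inside round-1/ of the RepeatModeler working directory.

```shell
# Hypothetical helper: check the two round-1 symptoms described above.
# Assumes sampleDB-1.fa.rscons.filtered lives in round-1/ of the
# RepeatModeler working directory. Returns 1 if either check fails.
check_round1_prereqs() {
    rm_dir=$1
    rc=0
    # RepeatScout's filter-stage-1.prl needs nseg on the PATH.
    if ! command -v nseg >/dev/null 2>&1; then
        echo "nseg not found in PATH" >&2
        rc=1
    fi
    # In the case reported above, an empty filtered file went together
    # with round 1 having no consensi.fa.
    filtered="$rm_dir/round-1/sampleDB-1.fa.rscons.filtered"
    if [ ! -s "$filtered" ]; then
        echo "$filtered is missing or empty" >&2
        rc=1
    fi
    return $rc
}
```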

I have run RepeatModeler (with -LTRStruct) to completion a few times and it works great. However, some of my jobs have timed out and therefore crashed during the LTRPipeline.

To force the programme to continue, I added an -LTRcont flag to the RepeatModeler script that is essentially a replica of -recoverDir. It behaves the same up to the point where it determines the last successful round; it then assigns a value to numModels (using grep '>' consensi.fa | wc -l) and skips all of the backup steps. I then added

last if ( $options{'LTRcont'} );

at the beginning of the main loop that runs RepeatScout and RECON. With that, the rest of the pipeline seems to continue from the LTRStruct stage as desired, and the results also look fine. Does anything immediately stand out as problematic?
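The bookkeeping that -LTRcont does before skipping the RepeatScout/RECON loop can be approximated in shell as below. Note that -LTRcont is tkjmk's local modification, not an official RepeatModeler option, and last_complete_round and num_models are hypothetical helpers. The sketch assumes a round counts as complete when round-N/consensi.fa exists, which may not match RepeatModeler's own test exactly.

```shell
# Hypothetical helpers approximating the -LTRcont bookkeeping (a local
# change, not an official option). Assumption: a round counts as complete
# when round-N/consensi.fa exists.
last_complete_round() {
    rm_dir=$1
    last=0
    for d in "$rm_dir"/round-*; do
        [ -f "$d/consensi.fa" ] || continue
        n=${d##*/round-}
        # Skip names like round-6.bak whose suffix is not a plain number.
        case $n in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$n" -gt "$last" ]; then
            last=$n
        fi
    done
    echo "$last"    # 0 means no round has a consensi.fa
}

# Number of models in a consensi.fa. Counts only header lines; otherwise
# the same as: grep '>' consensi.fa | wc -l
num_models() {
    grep -c '^>' "$1" || true
}
```

Anchoring the pattern as '^>' avoids counting a '>' that might appear elsewhere on a line; for well-formed FASTA the result is the same as the unanchored pipeline.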

njaupan commented 4 years ago

Thanks tkjmk, but none of the files in round 1 are empty in my case. After configuring the 0.97-cluster_only NINJA and filling round-2/consensi.fa with a blank line, I ran into a new problem in LTRPipeline:

Error - could not open /home/panpan/TEanno_sensi/RepeatModeler_out/RM_23147.FriMay220914012020/LTR_17862.SunMay240545322020/clusters.dat! at /home/panpan/RepeatModeler/LTRPipeline line 325

and the log also reports: Missing /home/panpan/anaconda3/envs/Repeatmodeler/bin/Libraries/RepeatMasker.lib.nsq!

But I configured the RepeatMasker binary there and simply ran makeblastdb -dbtype nucl -in RepeatMasker.lib to generate the RepeatMasker.lib.nsq file. I also found another RepeatMasker.lib.nsq at /home/panpan/anaconda3/envs/RM_env/share/RepeatMasker/Libraries/RepeatMasker.lib.nsq. Which path is the RepeatMasker binary, exactly?

Right now, there are three outputs in LTR_17862.SunMay240545322020: LtrRetriever-redundant-results.fa, mafft-alignment.fa and raw-struct-results.txt. Did LTRPipeline finish? I also attached the log file: log.txt

tkjmk commented 4 years ago

Hey,

Which path is the RepeatMasker binary, exactly?

If you installed RepeatMasker with conda in RM_env, as it seems you have, then in my experience the binary should be in /home/panpan/anaconda3/envs/RM_env/bin, but RepeatMasker.lib should be in /home/panpan/anaconda3/envs/RM_env/share/RepeatMasker/Libraries/. I copied that Libraries/ directory into my equivalent of /home/panpan/anaconda3/envs/RM_env/bin/, and that worked for me.

Alternatively, you could run RepeatClassifier after RepeatModeler completes, with something like: RepeatClassifier -consensi consensi.fa -stockholm families.stk -repeatmasker_dir <path_to_repeatmasker_with_librariesdir>. In your case that path would be /home/panpan/anaconda3/envs/RM_env/share/RepeatMasker, the directory that contains Libraries/.

However, that RepeatClassifier error is separate from the LTR error. I believe the repo owners could answer this better, but here is what I would do.

I ran into a new problem in LTRPipeline

The clusters.dat error suggests a problem with the NINJA step in LTRPipeline. Perhaps RepeatModeler is still not configured to use your new install of the correct NINJA version. I would try running RepeatModeler with the -ninja_dir <path_to_ninja> configuration override, so that you can be sure it uses the 0.97-cluster_only NINJA.

Did LTRPipeline finish?

The NINJA step is one of the last steps of LTRPipeline, so it had almost run to completion.
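To see how far LTRPipeline got, one can list which of the files mentioned in this thread are present in the LTR_* directory. A sketch with the hypothetical helper ltr_outputs: it reports a file as missing when it is absent or empty, and the four names are only those that appear in this thread, not a complete list of LTRPipeline outputs.

```shell
# Hypothetical helper: report which LTRPipeline files named in this thread
# exist (non-empty) in an LTR_* working directory. Not a complete list of
# LTRPipeline outputs; clusters.dat is the file the failed run could not open.
ltr_outputs() {
    ltr_dir=$1
    for f in raw-struct-results.txt LtrRetriever-redundant-results.fa \
             mafft-alignment.fa clusters.dat; do
        if [ -s "$ltr_dir/$f" ]; then
            echo "present $f"
        else
            echo "missing $f"    # absent or empty
        fi
    done
}
```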

njaupan commented 4 years ago

Many thanks tkjmk for your quick reply! I also think this 'RepeatMasker binary issue' is a RepeatClassifier error that may be separate from the LTR error. However, I am not sure how the three outputs from LtrRetriever (LtrRetriever-redundant-results.fa, mafft-alignment.fa and raw-struct-results.txt) relate to the existing consensi.fa. Does consensi.fa include the LtrRetriever output?

jebrosen commented 4 years ago

Hi @njaupan, it looks like you have a few different issues here.

Regarding /home/panpan/anaconda3/envs/RM_env/bin: have you given RepeatModeler the path to the bin directory as the "RepeatMasker path"? That is not sufficient; it should be the .../share/RepeatMasker directory. This could be the reason for the RepeatClassifier errors.
It looks like RepeatModeler is still not configured with the correct path for NINJA.
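For the RepeatMasker path point, a quick sanity check is sketched below. It is hypothetical (check_repeatmasker_dir is not a RepeatModeler tool) and checks only two things taken from this thread: that the path is not a bin/ directory, and that it contains Libraries/RepeatMasker.lib.nsq, the file RepeatClassifier reported missing.

```shell
# Hypothetical helper: sanity-check a candidate "RepeatMasker path".
# Only checks what this thread mentions; it does not validate the install.
check_repeatmasker_dir() {
    dir=${1%/}     # drop one trailing slash
    rc=0
    case $dir in
        */bin)
            echo "$dir is a bin/ directory; use .../share/RepeatMasker instead" >&2
            rc=1 ;;
    esac
    if [ ! -f "$dir/Libraries/RepeatMasker.lib.nsq" ]; then
        echo "missing $dir/Libraries/RepeatMasker.lib.nsq" >&2
        rc=1
    fi
    return $rc
}
```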

njaupan commented 4 years ago

Hi, I uninstalled everything and reconfigured all the paths. It is working now, thanks!

Astahlke commented 2 years ago

I'd like to suggest adding this improvement to the program (if there are no issues with it), because I've had the same issue more than a couple of times. Thanks @tkjmk!

gc-content commented 6 days ago

I support this notion.

Additionally, it would be great to have a RepeatModeler flag for modifying $ltrSeqLimit in LTRPipeline, which could help address issues like #96.

Dfam-consortium/RepeatModeler issue #65