Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

LTR_retriever failed to generate a file #230

Open cyycyj opened 6 months ago

cyycyj commented 6 months ago

Dear Robert, Jeb and Francisco,

Describe the issue

I am working on a plant genome (~500 Mb, het = 1.29%). As the title says, LTR_retriever failed to generate a file when I ran RepeatModeler. I am not sure whether this is the same issue as the one you pinned: https://github.com/Dfam-consortium/RepeatModeler/issues/202.

Reproduction steps

Please find the files below. I had to convert them to .txt files to satisfy GitHub's attachment rules. They are:

01.repeatmodeler.sh.txt: the Slurm script I submitted.
01.repeatmodeler1.out.txt & 01.repeatmodeler1.err.txt: standard output and standard error from Slurm.
LTR_retriever.log: the LTR_retriever log mentioned in 01.repeatmodeler1.err.txt.
LTR.identifier.pl.txt: the script that raised the error, as mentioned in LTR_retriever.log.

Log output

01.repeatmodeler.sh.txt 01.repeatmodeler1.out.txt 01.repeatmodeler1.err.txt LTR_retriever.log LTR.identifier.pl.txt

Environment (please include as much of the following information as you can find out):

manual installation from repeatmasker.org

RepeatModeler-2.0.5

RepeatMasker-4.1.6, dfam38_full.0.h5+dfam38_full.5.h5.gz+RepBaseRepeatMaskerEdition-20181026.tar.gz. You can find detailed information at https://github.com/rmhubley/RepeatMasker/issues/238

Linux cln01 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

rmhubley commented 6 months ago

Thanks for this wonderfully detailed bug report. You identified the cause of the error:

Invalid value for shared scalar at /data/miniconda3/envs/repeat/share/LTR_retriever/bin/LTR.identifier.pl line 114, line 10083.

This is a problem in the LTR_retriever program itself, related to data passing in a multithreaded run. Please also report it at https://github.com/oushujun/LTR_retriever so the authors are aware of the issue. I would highly recommend avoiding conda when using RepeatModeler/RepeatMasker (and their dependencies), as we have had nothing but problems with bad recipes and mismatched dependencies. It is a bit strange that in your RepeatModeler log output (*.out.txt), LTR_retriever is the only dependency for which RepeatModeler couldn't ascertain its version.

Search Engine = rmblast 2.14.1+
Threads = 128
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.6, RepeatMasker 4.1.6
LTR Structural Analysis: Enabled ( GenomeTools 1.6.5, LTR_Retriever ,
                                   Ninja 0.97-cluster_only, MAFFT 7.520,
                                   CD-HIT 4.8.1 )

Love the 128 threads by the way.....heavy metal!
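
The missing LTR_Retriever version in the header above can also be checked mechanically before committing to a long run. This is a minimal sketch, assuming a POSIX shell and a log that contains the "LTR Structural Analysis" header line in the format shown here; the function name `check_ltr_retriever` is hypothetical and not part of RepeatModeler.

```shell
# Report whether a RepeatModeler log shows a version string after "LTR_Retriever".
# Assumes the header format shown in this thread, e.g.
#   LTR Structural Analysis: Enabled ( GenomeTools 1.6.5, LTR_Retriever v2.9.5,
# Usage: check_ltr_retriever LOGFILE
check_ltr_retriever() {
    # Capture whatever sits between "LTR_Retriever" and the next comma.
    _version=$(sed -n 's/.*LTR_Retriever *\([^ ,]*\),.*/\1/p' "$1")
    if [ -n "$_version" ]; then
        echo "LTR_Retriever version detected: $_version"
    else
        # Also reached when the header line is absent from the log.
        echo "LTR_Retriever version missing"
    fi
}
```

Running it on the *.out.txt file attached to this issue would be expected to report a missing version, which hints at an installation problem rather than a problem with the genome.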

Another thing you can try is to run LTR_retriever on its own to see if you can reproduce this without having to go through the trouble of running inside of RepeatModeler. Simply run:

% /data/miniconda3/envs/repeat/share/LTR_retriever/bin/LTR_retriever -repeatmasker /data/biosoft/RepeatMasker -blastplus /data/miniconda3/envs/repeat/bin -cdhit_path /data/miniconda3/envs/repeat/bin -trf_path /data/miniconda3/envs/repeat/bin/trf -genome seq.fa -inharvest /data/genome_assembly/genome/P-2/10.repeat/primary/01.repeat/01.RepeatModeler/RM_202283.ThuDec71609352023/LTR_217736.FriDec80530182023/raw-struct-results.txt -noanno -threads 128

This is what you would need to do to report this to the LTR_retriever group, and maybe even provide them with that raw-struct-results.txt file.
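
When repeating the standalone run, it can help to capture the output and check for the same error message quoted earlier. This is a minimal sketch, assuming a POSIX shell; the function names `has_shared_scalar_error` and `reproduce` and any log file name are hypothetical, not part of LTR_retriever or RepeatModeler.

```shell
# Return success if the log contains the Perl error quoted in this thread.
has_shared_scalar_error() {
    grep -q 'Invalid value for shared scalar' "$1"
}

# Run a command, capture stdout and stderr in a log file, and report
# whether the error message appeared.
# Usage: reproduce LOGFILE COMMAND [ARGS...]
reproduce() {
    _log=$1
    shift
    _status=0
    # Keep going even if the command fails, so the log can be inspected.
    "$@" >"$_log" 2>&1 || _status=$?
    if has_shared_scalar_error "$_log"; then
        echo "reproduced (exit status $_status)"
    else
        echo "not reproduced (exit status $_status)"
    fi
}
```

For example, `reproduce ltr_standalone.log` followed by the full LTR_retriever command above. Rerunning with a smaller `-threads` value might show whether the failure depends on the thread count, although that is an untested assumption here.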

For this RepeatModeler run, you should know that the bulk of the results are still intact even if this step fails. It looks like RepeatModeler found 2,117 families (LTRs included). Depending on your use case, you should be able to use this as a starting point for library curation, genome masking, etc., and then rerun just the LTR structural finding and merge the results at a later time. To run just that portion of the analysis you would do:

%  <RepeatModeler Directory>/LTRPipeline -threads 128  P-2.primary.fa
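
The thread does not say how the later merge should be done. One naive approach, offered only as a sketch and not as the maintainers' procedure, is to concatenate the two FASTA libraries and check for clashing sequence IDs. The function name `merge_libraries` and the file names in the usage note are hypothetical, and plain concatenation does not remove redundant families.

```shell
# Concatenate two FASTA libraries into OUT and print any sequence IDs
# that occur more than once in the merged file.
# Assumes both inputs end with a newline.
# Usage: merge_libraries FIRST.fa SECOND.fa OUT.fa
merge_libraries() {
    cat "$1" "$2" > "$3"
    # The ID is the first whitespace-delimited token after '>'.
    grep '^>' "$3" | sed 's/^>//' | awk '{ print $1 }' | sort | uniq -d
}
```

A call such as `merge_libraries repeatmodeler_families.fa ltr_families.fa combined.fa` prints nothing when all IDs are unique. Any ID it does print would need renaming before the combined library is used.
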
cyycyj commented 6 months ago

Dear Robert,

Thank you for your detailed and quick reply! I found that this issue may stem from a bug in the conda distribution of LTR_retriever, since others have reported a similar issue before (https://github.com/oushujun/LTR_retriever/issues/159). I have reinstalled it manually, and this time RepeatModeler seems to find LTR_retriever's version correctly.

RepeatModeler Version 2.0.5
===========================
Using output directory = /data/genome_assembly/genome/P-2/10.repeat/primary/01.repeat/01.RepeatModeler/RM_114078.SatDec91134272023
Search Engine = rmblast 2.14.1+
Threads = 128
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.6, RepeatMasker 4.1.6
LTR Structural Analysis: Enabled ( GenomeTools 1.6.5, LTR_Retriever v2.9.5,
                                   Ninja 0.97-cluster_only, MAFFT 7.520,
                                   CD-HIT 4.8.1 )

I have also reported this issue to LTR_retriever (https://github.com/oushujun/LTR_retriever/issues/160), and I am working on reproducing it as you suggested above. I will update you once there is something new.

Also, HPC users like me do not have root access for installing software or packages, so using conda to build dependencies can be a good choice.

By the way, I also think 128 threads is heavy metal, but I love Britpop like Coldplay, lol

cyycyj commented 6 months ago

Oops! Something new happened: the Slurm job terminated in round 5.

...
FATAL ERROR: RepeatModeler giving up. One or more
batches failed!  Unfortunately this type of error
cannot be recovered from. Please submit the following
details to the feedback page at the repeatmasker
website:

       http://www.repeatmasker.org

RepeatModeler Version: 2.0.5
Search Engine: rmblast [ 2.14.1+ ]
Command Line: /data/biosoft/RepeatModeler-2.0.5/RepeatModeler -database /data/genome_assembly/genome/P-2/10.repeat/primary/01.repeat/00.BuildDatabase/P-2.primary -threads 128 -LTRStruct
Batch Number: 2981
Disk Space:
Filesystem          1K-blocks           Used     Available Use% Mounted on
/wfbdnxy       16447657790160 10574408071604 5873249718556  65% /data

System Memory:
Further details about this problem may be found in
the directory: /data/genome_assembly/genome/P-2/10.repeat/primary/01.repeat/01.RepeatModeler/RM_114078.SatDec91134272023

I think it may be a memory issue. Would you mind giving me an email address so that I can share the original round 5 folder with you?
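
The "System Memory:" field in the report above is empty, so a rough check of available memory against the thread count may help judge whether memory is a plausible cause. This is a sketch, assuming a Linux node where /proc/meminfo contains a MemAvailable line; the function name `mem_per_thread_mib` is hypothetical, and it only inspects the node, not any limit Slurm places on the job.

```shell
# Print available memory per thread in MiB (rounded down).
# Usage: mem_per_thread_mib MEMINFO_FILE THREADS
mem_per_thread_mib() {
    # MemAvailable is reported in kB; convert to MiB, then divide by threads.
    awk -v threads="$2" \
        '$1 == "MemAvailable:" { printf "%d\n", $2 / 1024 / threads }' "$1"
}
```

For example, `mem_per_thread_mib /proc/meminfo 128` on the compute node. On clusters with job accounting enabled, `sacct -j <jobid> --format=JobID,State,MaxRSS,ReqMem` may show the job's peak memory use, where `<jobid>` is a placeholder for the failed job's ID.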

simone-says commented 2 months ago

I'm having an issue with the clustering step of the LTRPipeline. When I re-ran it with an older version, using a RepeatModeler Singularity container instead of the TE Tools container, it worked. How can I merge the RepeatModeler and LTRPipeline results before running RepeatMasker?