Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

Error in Clustering Step of LTR Pipeline #241

Open simone-says opened 3 months ago

simone-says commented 3 months ago

Describe the issue

I'm trying to run the LTR Pipeline on its own to add to some libraries and then re-mask some genomes. So far, the error below only happens with one genome. I use the -LTRStruct flag for RepeatModeler runs with no issues, so I'm not sure what the problem is here.

Reproduction steps

My exact command is:

srun apptainer exec --bind=/projects:/projects /common/contrib/containers/tetools-v1.88.sif LTRPipeline ${species_name}.genome.fa -threads 40

I don't know how to reproduce this exactly. I ran it twice, and it failed with the same error message both times. This is the genome that triggers the error: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_025583915.1/

Log output

Running LtrHarvest...     : 01:05:25 (hh:mm:ss) Elapsed Time
JANCLY010000001 is not in /projects/tollis_lab/busco_phylo/squamates/ref/omes/anolisSagrei.genome.fa.2bit
Running Ltr_retriever...  : 00:46:13 (hh:mm:ss) Elapsed Time
Aligning instances...     : 00:09:39 (hh:mm:ss) Elapsed Time
Clustering...LTRPipeline: Error - could not cluster MAFFT results.
             : 00:00:01 (hh:mm:ss) Elapsed Time
LTRPipeline : Error - could not open /projects/tollis_lab/busco_phylo/squamates/ref/omes/LTR_3321156.MonMar251237332024/clusters.dat! at /opt/RepeatModeler/LTRPipeline line 333.
srun: error: cn8: task 0: Exited with exit code 2

Environment

Using the TETools Apptainer container on a Slurm HPC cluster.

rmhubley commented 2 weeks ago

Sorry for the delay. If you still have these files, could you check that "JANCLY010000001" is in fact a sequence in your input file:

% fgrep JANCLY010000001 ${species_name}.genome.fa

and that it also occurs in the twobit file:

% twoBitInfo /projects/tollis_lab/busco_phylo/squamates/ref/omes/anolisSagrei.genome.fa.2bit stdout | grep JANCLY010000001

When I download the assembly from the link you provided (GCF_025583915.1_AnoSag2.1_genomic.fna.gz) I do not see a sequence named JANCLY010000001. Did you alter the assembly in any way, or get it from a different source?
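
If it helps, the two single-ID checks above can be widened into one pass that lists every sequence name in the FASTA that the 2bit file lacks. This is only a minimal sketch, not part of LTRPipeline: `fasta_ids` and `missing_ids` are hypothetical helper names, `names.tsv` is a placeholder file name, and it assumes each sequence name is the first whitespace-delimited token of its FASTA header and that `twoBitInfo` writes one `name<TAB>size` line per sequence.

```shell
# Print the sequence ID (first token after '>') of every FASTA record in $1.
fasta_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Print the FASTA IDs from $1 that do not appear in the name list $2.
# $2 holds one name per line; anything after a tab (e.g. the size column
# written by twoBitInfo) is ignored.
missing_ids() {
    fasta_ids "$1" | awk -v names="$2" '
        BEGIN {
            while ((getline line < names) > 0) {
                split(line, f, "\t")
                seen[f[1]] = 1
            }
        }
        !($1 in seen)
    '
}

# Assumed usage (paths are placeholders):
#   twoBitInfo anolisSagrei.genome.fa.2bit stdout > names.tsv
#   missing_ids anolisSagrei.genome.fa names.tsv
```

Empty output would mean every FASTA record has a matching name in the 2bit file. Any IDs printed are candidates for the mismatch reported in the log.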