Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

Refiner did not return a consensus error #249

Closed: athenasyarifa closed this issue 1 month ago

athenasyarifa commented 1 month ago

Describe the issue

I am running RepeatModeler v2.0.5 and see warnings like the following:

RepeatModeler Round # 1
Searching for Repeats
 -- Sampling from the database...
Gathering up to 40000000 bp
Final Sample Size = 40040308 bp ( 40034033 non ambiguous )
Num Contigs Represented = 67
Sequence extraction : 00:01:02 (hh:mm:ss) Elapsed Time
 -- Running RepeatScout on the sequences...
RepeatScout: Running build_lmer_table ( l = 14 )..
RepeatScout: Running RepeatScout.. : 663 raw families identified
RepeatScout: Running filtering stage.. 585 families remaining
RepeatScout: 00:10:12 (hh:mm:ss) Elapsed Time
Large Satellite Filtering.. : 7 found in 00:00:16 (hh:mm:ss) Elapsed Time
Collecting repeat instances...: 00:05:32 (hh:mm:ss) Elapsed Time
WARNING: Retrying job ( 2 ) [ 255 ]...
WARNING: Retrying job ( 0 ) [ 255 ]...
WARNING: Retrying job ( 1 ) [ 255 ]...
WARNING: Retrying job ( 3 ) [ 255 ]...
Refinement: 00:00:00 (hh:mm:ss) Elapsed Time
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-73.fa.
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-45.fa.
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-141.fa.
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-164.fa.
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-139.fa.
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-95.fa.
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-175.fa.
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-69.fa.
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-32.fa.
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-128.fa.
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-10.fa.
  WARNING: Refiner did not return a consensus for /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-52.fa.
...
Family Refinement: 00:00:01 (hh:mm:ss) Elapsed Time
Round Time: 00:17:06 (hh:mm:ss) Elapsed Time : 0 families discovered.

The run has continued for several rounds, but consensi.fa in the main RM folder remains empty, although the consensi.fa file inside the round-1 folder contains sequences.
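A quick way to confirm this symptom is to count the FASTA records in the top-level consensi.fa and in each round's consensi.fa. The bash sketch below is a hypothetical helper, not part of RepeatModeler; it assumes only the layout described above (a top-level consensi.fa plus round-N/consensi.fa inside the RM_* working directory) and that FASTA headers start with `>`.

```shell
# Hypothetical helper: count FASTA records (lines starting with ">") in a file.
# Prints 0 for a missing or empty file instead of failing.
count_fasta_records() {
  local fa="$1"
  if [ -s "$fa" ]; then
    # grep -c exits with status 1 when the count is 0; "|| true" keeps that harmless.
    grep -c '^>' "$fa" || true
  else
    echo 0
  fi
}

# Hypothetical helper: print "<count><TAB><file>" for the top-level consensi.fa
# and every round-*/consensi.fa inside a RepeatModeler RM_* directory.
report_consensi() {
  local rmdir="$1" f
  for f in "$rmdir"/consensi.fa "$rmdir"/round-*/consensi.fa; do
    # Skip unmatched glob patterns and files that do not exist yet.
    [ -e "$f" ] || continue
    printf '%s\t%s\n' "$(count_fasta_records "$f")" "$f"
  done
}
```

For example, `report_consensi RM_25614.FriJul121538512024` (run from the folder containing it) would print one line per file; a 0 for the top-level file next to non-zero counts for the rounds matches the behaviour reported here.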

Reproduction steps

My genome is available here, but I am using a slightly revised version of it. I ran RepeatModeler with the following commands:

${RMOD}/BuildDatabase -name poeMon1 poeMon1.fa
${RMOD}/RepeatModeler -LTRStruct -threads 4 -database poeMon1 2>&1 | tee 00_repeatmodeler.log

Log output

I have attached the log files for the RepeatModeler run and for the first round: rmod.log, repeatscout.log, makeblastdb.log, filter-stage-1.log

Note that the repeatscout log is empty.

Environment:

I installed RepeatModeler v2.0.5 according to the website instructions, with the following dependencies: TRF 4.09, RECON, RepeatScout 1.0.6, RepeatMasker 4.1.6. I am running RepeatModeler in a cluster environment. I also downloaded the Dfam database and copied it into the famdb folder:

wget https://www.dfam.org/releases/Dfam_3.8/families/FamDB/dfam38_full.0.h5.gz
gunzip dfam38_full.0.h5.gz

Any help to solve this would be appreciated! Thanks!

Best, Rifa

rmhubley commented 1 month ago

Could you provide a link to the revised version of the assembly you are using? If not, it would help to see an example of one of the families that Refiner could not process, e.g.: /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-73.fa

Whitney110 commented 1 month ago

Describe the issue

I'm having the same problem as Athena Syarifa. These are the RepeatModeler version and settings I'm using:

RepeatModeler Version 2.0.5
===========================
Using output directory = /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024
Search Engine = rmblast 2.14.1+
Threads = 32
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.5, RepeatMasker 4.1.2
LTR Structural Analysis: Enabled ( GenomeTools 1.6.2, LTR_Retriever v2.9.9,
                                   Ninja 0.95-cluster_only, MAFFT 7.525,
                                   CD-HIT 4.8.1 )
Random Number Seed: 1720350706

I get warnings like the following (the next few rounds show the same warnings):

RepeatModeler Round # 2
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 10000000 bp
   - Sequence extraction : 00:00:05 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:00:00 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
       Sample Size 10002018 bp
       Num Contigs Represented = 24
       Non ambiguous bp:
             Initial: 10002018 bp
             After Masking: 10002018 bp
             Masked: 0.00 % 
 -- Input Database Coverage: 10002018 bp out of 894375991 bp ( 1.12 % )
Sampling Time: 00:00:06 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
  - Total Comparisons = 31125
        0% completed,  00:12:24 (hh:mm:ss) est. time remaining.
        1% completed,  00:10:16 (hh:mm:ss) est. time remaining.
        2% completed,  00:8:10 (hh:mm:ss) est. time remaining.
        3% completed,  00:7:36 (hh:mm:ss) est. time remaining.
       ...
       99% completed,  00:0:00 (hh:mm:ss) est. time remaining.
       99% completed,  00:0:00 (hh:mm:ss) est. time remaining.
      100% completed,  00:0:00 (hh:mm:ss) est. time remaining.
Comparison Time: 00:08:16 (hh:mm:ss) Elapsed Time, 1088722 HSPs Collected
  - RECON: Running imagespread..
RECON Elapsed: 00:00:03 (hh:mm:ss) Elapsed Time
  - RECON: Running initial definition of elements ( eledef )..
RECON Elapsed: 00:00:42 (hh:mm:ss) Elapsed Time
  - RECON: Running re-definition of elements ( eleredef )..
RECON Elapsed: 00:26:13 (hh:mm:ss) Elapsed Time
  - RECON: Running re-definition of edges ( edgeredef )..
RECON Elapsed: 00:03:47 (hh:mm:ss) Elapsed Time
  - RECON: Running family definition ( famdef )..
RECON Elapsed: 00:00:09 (hh:mm:ss) Elapsed Time
  - Obtaining element sequences
Number of families returned by RECON: 1855
Processing families with greater than 15 elements
Instance Gathering: 00:00:01 (hh:mm:ss) Elapsed Time
Refining 26 families
WARNING: Retrying job ( 0 ) [ 255 ]...
WARNING: Retrying job ( 1 ) [ 255 ]...
WARNING: Retrying job ( 2 ) [ 255 ]...
WARNING: Retrying job ( 3 ) [ 255 ]...
WARNING: Retrying job ( 5 ) [ 255 ]...
WARNING: Retrying job ( 7 ) [ 255 ]...
WARNING: Retrying job ( 4 ) [ 255 ]...
WARNING: Retrying job ( 6 ) [ 255 ]...
WARNING: Retrying job ( 8 ) [ 255 ]...
WARNING: Retrying job ( 9 ) [ 255 ]...
WARNING: Retrying job ( 10 ) [ 255 ]...
WARNING: Retrying job ( 11 ) [ 255 ]...
WARNING: Retrying job ( 12 ) [ 255 ]...
WARNING: Retrying job ( 13 ) [ 255 ]...
WARNING: Retrying job ( 14 ) [ 255 ]...
WARNING: Retrying job ( 15 ) [ 255 ]...
WARNING: Retrying job ( 16 ) [ 255 ]...
WARNING: Retrying job ( 17 ) [ 255 ]...
WARNING: Retrying job ( 18 ) [ 255 ]...
WARNING: Retrying job ( 19 ) [ 255 ]...
WARNING: Retrying job ( 20 ) [ 255 ]...
WARNING: Retrying job ( 21 ) [ 255 ]...
WARNING: Retrying job ( 22 ) [ 255 ]...
WARNING: Retrying job ( 23 ) [ 255 ]...
WARNING: Retrying job ( 24 ) [ 255 ]...
WARNING: Retrying job ( 25 ) [ 255 ]...
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-40.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-136.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-20.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-256.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-127.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-264.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-25.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-9.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-429.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-197.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-934.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-39.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-71.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-118.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-172.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-75.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-76.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-488.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-22.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-690.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-16.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-102.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-114.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-51.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-11.fa.
  WARNING: Refiner did not return a consensus for /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-828.fa.
Family Refinement: 00:00:01 (hh:mm:ss) Elapsed Time
Round Time: 00:39:35 (hh:mm:ss) Elapsed Time : 0 families discovered.

I ran RepeatModeler with the following commands:

BuildDatabase -name Hhn_Chr2_DB -engine ncbi Hhn_Chr_v2.fasta
RepeatModeler -threads 32 -database Hhn_Chr2_DB -engine ncbi -LTRStruct

Have you found a solution yet? Thanks!

Best, Whitney

athenasyarifa commented 1 month ago

Hi @rmhubley thanks for getting back to me!

I might be able to send you the link to the genome by email, since it's not public yet. Could you give me your email address? Otherwise, I have attached the files for some of the families Refiner couldn't process: families.tar.gz

Best, Rifa

rmhubley commented 1 month ago

@athenasyarifa, I tried running a few of those families through my copy of Refiner, and they work fine. I suspect a configuration issue in both of your RepeatModeler installations.

Could you both try running Refiner by hand to see whether you get an error message?

For @athenasyarifa:

% <path_to_repeatmodeler>/Refiner /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-73.fa

And for @Whitney110:

% <path_to_repeatmodeler>/Refiner /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-40.fa
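To repeat this check for every family that failed, rather than one at a time, the failing paths can be pulled out of a saved log. This is a sketch that assumes the console output was saved to a file (for example with `tee`, as in the first command shown in this issue) and that the warnings keep the format shown in this thread; `failed_families` and `rerun_refiner` are hypothetical helper names, and the first argument of `rerun_refiner` stands for <path_to_repeatmodeler>.

```shell
# Hypothetical helper: read a saved RepeatModeler log on stdin and print the path of
# every family named in a "Refiner did not return a consensus for <path>." warning.
failed_families() {
  # Capture everything after "consensus for " up to the final period, allowing
  # trailing whitespace such as a carriage return.
  sed -n 's/.*Refiner did not return a consensus for \(.*\)\.[[:space:]]*$/\1/p'
}

# Hypothetical helper: re-run Refiner by hand on each failed family so that any
# error message is shown. $1 = RepeatModeler directory, $2 = saved log file.
rerun_refiner() {
  local rmod="$1" log="$2" fa
  failed_families < "$log" | while IFS= read -r fa; do
    echo "== $fa"
    # </dev/null stops Refiner from consuming the list of paths on stdin.
    "$rmod/Refiner" "$fa" < /dev/null || echo "Refiner exited with status $? for $fa" >&2
  done
}
```

For example: `rerun_refiner "$RMOD" 00_repeatmodeler.log`.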

Whitney110 commented 1 month ago

Hi @rmhubley, thanks for your reply! I ran Refiner by hand using the command you provided, and it seems to work fine:

[luguohui03@login RepeatModeler]$ Refiner /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-40.fa
  - numRounds = 8
  - Consensus Length = 6804 ( orig = 6845 )
  - Avg Kimura Divergence = 0.00
  - Unaligned sequences = 0 ( orig = 0 )
  Build Consensus: 0:0:21 Elapsed Time

What configuration issue do you suspect? Thanks!

Best, Whitney

athenasyarifa commented 1 month ago

Hi @rmhubley and Whitney,

I got the following error message when I ran Refiner by hand as you suggested:

NCBIBlastSearchEngine::search: Error...compressed subject database (/dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-73.fa) does not exist!
 at /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/tools/RepeatModeler-2.0.5/Refiner line 422.

Is it a database or makeblastdb problem, then? I didn't find anything wrong in makeblastdb.log, though. Let me know what you think!

Thanks! Rifa

rmhubley commented 1 month ago

@Whitney110 - That is the expected output of Refiner. Could you now check that you have a new file, /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-40.fa.refiner_cons, and that it contains a single FASTA sequence?

@athenasyarifa - Hmm. Could you check that the original file exists and is not empty? /data3/home/luguohui03/hhp/Repeat/Hhn_repeat/RM_65159.SunJul71911502024/round-2/family-40.fa
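The two checks requested above can be scripted. This is a minimal sketch that relies only on the `<family>.fa.refiner_cons` output name mentioned in this thread and on FASTA headers starting with `>`; the function names are hypothetical.

```shell
# Hypothetical helper: succeed only if the family FASTA exists and is non-empty.
check_family_input() {
  [ -s "$1" ]
}

# Hypothetical helper: succeed only if "<family>.refiner_cons" (the file Refiner
# writes next to its input, per this thread) exists and holds exactly one record.
check_refiner_output() {
  local cons="$1.refiner_cons" n
  [ -s "$cons" ] || return 1
  # grep -c prints 0 and exits 1 when there are no headers; "|| true" keeps going.
  n=$(grep -c '^>' "$cons" || true)
  [ "$n" -eq 1 ]
}
```

For example: `check_family_input round-2/family-40.fa && check_refiner_output round-2/family-40.fa && echo OK`.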

athenasyarifa commented 1 month ago

Hi @rmhubley, yes, the file is not empty; see below:

cat /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_25614.FriJul121538512024/round-1/family-73.fa
>gi|10306 gi|44:5974102-5974650
GGATCTTTGCCTTTTCTGCTTTCCATGGGAGGTGGGAGGGAAGTAAGTGGCAGCACGATTTCAGCAGGAGCAGGAAATTGGGGAATGCCATTCCTGAAGCCCAGCCCATGGAAACCGAGCATCCCAGCTGGTGCCAGGCCTGGTTGCCATGGCAGCAGCCTTGGGAGCGGGTCCCTGCCTGGGGGCTGTGGGAACCTCTTCCCTCTGGTGCCCAGGGACAGGAGTGGAGGGAGCGGCTGGAGCTGAGGCGGGCAGGTTTAGGCTGGATGTGAGGAAAAGGTTTTTCCCGCGAGGCTGCTGGGGCACTGAACAGGCTCCCCAGGGAAGGCTCCCAGCTCCAGGGCTGGCTGAGCTCCAGCAGCGTTTGGCCAGCGCTGCCAGGCCCAGGCTGGGATTGTTGGGGTGTCCTGTGCAGGGCCAGCAGTTGGACTGGAGGATCCCCATGGCTCCCTCCCAACTCAGCCAATTCTGTGGCTCTGTGATCCCATGAGCCTGGGGATGGGACTGGGAATGGTTGCCATGGCAACGGACCCTGGCTCCAGGCCTGAG
>gi|12125 gi|44:2044700-2044152
CTCAGGCCTGGAGCCAGAGCCCGTTGCCATGGCAACCATTTGCAGTCCCATCCCCAGGCTCATGGGATCACAGAGCCACAGAATTGGCTGAGTTGGGAGGGACCCATGGGGATCCTCCAGTCCAACTGCTGGCCCTGCACAGGACACCCCAACAATCCCAGCCTGGGCCTGGCAGCGCTGGCCAAACGCTGCTGGAGCTCAGCCAGCCCTGGAGCTGGGAGCCTTCCCTGGGGAGCCTGTTCAGTGCCCCAGCAGCCTCGCGGGAAAAACCTTTTCCTCACATCCAGCCTAAACCTGCCCGCCTCAGCTCCAGCCGCTCCCTCCACTCCTGTCCCTGGGCACCAGAGGGAAGAGGTTCCTGCAGCCCCCAGCCAGGGACCCGCTCCCAAGGCTGCTGCCATGGCAACCAGGCCTGGCACCAGCTGGGACGCTCGGTTTCCACGGGCTGGGCTTCAGGAATGGCATTCCCCAATTTCCTGCTCCGGCTGAAATTGTGCTGCCACTCGCTTCCCTCCCACCTCCCATGGAAAGCAGAAAAGGCAAAGATCC
>gi|511 gi|44:7911544-7912092
GGATCTTTGCCTTTTCTGCTTTCCATAGGAGGTGGGAGGGAAGCGAGTGGCAGCGTGGTTTCAGCAGGAGCAGGAAATTGATGAATCCCATTCCTAAAGCCCAGCCCGTGGAAACCGAGCATCCCAGCTGGTGCCAGGCCTGGTTGCCATGGCAGCAGCCTTGGGAGCGGGTCCCTGGCTGGGGGCTGTGGGAACCTCTTCCCTCTGGTGCCCAGGGACAGGAGTGGAGGGAGCGGCTGGAGCTGAGGCGGGCAGGTTTAGGCTGGATGTGAGGAAAAGGTTTTTCCCGTGAGGCTGCTGGGGCACTGAACAGGCTCCCCAGGGAAGGCTCCCAGCTCCAGGGCTGGCTGAGCTCCAGCAGCGTTTGGCCAGCGCTGCCAGGCCCAGGCTGGGATTGTTGGGGTGTCCTGTGCAGGGCCAGCAGTTGGACTGGAGGATCCCCATGGCTCCCTCCCAACTCAGCCAATTCTGTGGCTCTGTGATCCCATGAGCCTGGGGATGGGACTGGAAATGGTTGCCATGGCAACGGGCTCTGGCTCCAGGCCTGAG
>gi|9296 gi|1:332725-332178
CTCAGGCCTGAAGCCAGACCCCGTTGCCATGGCAACCATTTGCAGTTCCCATTCCCAGGTTCATGGGATCAGAGCCACAGAATTGGCTGAGTTGGGAGGGACCCATGGGGATCCTCCAGTCCAACTGCTGGCCCTGCACAGGACACCCCAACAATCCCAGCCTGGGCCTGGCAGCGCTGGCCAAACGCTGCTGGAGCTTAGACAGCCCTGGAGCTGGGAGCCTTCCCTGGGGAGCCTGTTCAGTGCCCCAGGAACCTCAGGGGAAAAACCTTTTCCTCACATCCAGCCTAAACCTGCCCGCCTCAGCTCCAGCCGCTCCCTCCACTCCTGTCCCTGGACACCAGAGGGAAGAGGTTCCCACAGCCCCCAGCCAGGGACCCTCTCCCAAGGCTGCTGCCATGGCAACGAGGCCTGGCACCAGCTGGGATGCTCGGTTTCCACAGGCTGGGCTTCAGGAATGGCATTCCCCAATTTCCTGCTCCCACTAAAATTGCACCACTGCTCTCTTCCCTCCCACCTCCCATGGAAAGCACAAAAGGCAAAGATCC

Whitney110 commented 1 month ago

@rmhubley Yes, there is a new file at that path, and it contains a single FASTA sequence.

[luguohui03@login round-2]$ less family-40.fa.refiner_cons
>family ( Final Multiple Alignment Size = 19 , Avg Kimura = 0.00372093023255814 )
ATCTTCGTTTCAAATCGCAGCTACGTTGCTGTCGGTCTTCGAGGAAGNTTGNCTTTGGCTTGGAGAGGGCCGTAGNGGCCGGATCTTCGCCGGAGAGCAAGGAGGANGGCTGGTGGAGGNTNTCGGCGCGGGATGGAGTAAGTTTTTGGCTTGGCACGATCGTGCCAAGCCTTTTNTATGCGGCTGTCGGCGACGCGACATGCCCATGCCAANTGGCACGGGCGGGGCGTGTCGGGTTCTGTCCCGCTCTACCCCCGCCCCCCTCGTTTTTTTTTCAGAGCGGAGTTTGAGCTAGTTCGCTCGTCTGGGTCGCCGATTTTTGTGTTCCTGAAATTAAAAAGTGAATGTAAGTGAAAAGAATGGAAATTGCGAAAAAGAAATTATGAAAAACTAAATCTTGGGTTGCCTCCCAAGAAGCGCTTATTTTAGGTCGTAAGCTCGACCCCTGTTATTTCTATCCTGGNTCAAAACTCGTGCTCCTTNCTCATCTNAGAGGAGTACATATGTGANGAGCTAAGGTGTGAGAAAGATTATTCTCGTCTCTTANCTTTATGAGTTCTTCTATGGAGTACCGATGCGNTTCTTTTTCTGTACTTGGTTATCCTGCAGCAGAATTCTTTGACAGATGGTGAGTTTAAAGAGTTAGAAAAATTAAATTNGATTCTCTCATCTTCNACTTCTAAAGATAATCTCTGATTTTTNACGTCAATAATGGTTCCGGTAGTAGCTAAAAAGGATCNTCCTAGGATAATAGGGTTTTCAGNATTTTCTCCCATATCCAGGACAAGAAAATCAGTGGGTATTACAAANTTTTCAATTGCTACCGGGATGTCTTCTAATAGGCCTAAGGGTTGCCTTAGTGATTCATCNGCGAGTCTTAAGGTTACGTCCGTGGGCTTTAATCCACTCGGTCTCATATATTTCATAAAATGAAGATGGTANGAGATTTACGCTAGACCCTAAGTCGCATAGAGCTATATCGAAATCAAANTCATAGATTGTACAGGGAATTGAAAAGTCACCTGGGTCTTCAAGTTTAAAAGCATATTTGTCTTGAATTACCANGTTNCTTTCTCCTTCTATAGTGATGGCTGCACAATCCTTGANTTCCTTTTTNTGCTCCTCAAGTCGCTCATCAACCTCTGCTTCGACCATNCTAAAGAAGGTTGTGGGGGCTTCAGGTTTACTTCCACTGCGATGTGCTGTTATAGGTACGGCGTTGCATCGTTCCACNGGATTCGTTTCGGCTTCTCTAGGGAATCTCCCTGGTGTTCTTGAGGATGAACTAGCTATTTGCGCAACTTGAGTTTCCAACATTTTAGCATGGTTAGCTAAGTTATCGATCTTACTTACTAGTTGTTGGAGTGCTTCCGTCGTTTCTTTATGTTGTCGGTTTGATTCCCTCATATGTTGGCGGAACTCATGGTTATTTTGAGCAGTCTGNGATTCAATGAAATTTTCCATCAATAACTTTCTAATTTTGGCTATTTGATCGTGTTGAGAACCTNAAGATTGTTCCGGAGTGTTTTGNTTAACTTGGTTCTGATTGTTCCGGTACGAAAANTTTGGATGGTTCCTCCAGCTTGGATTGTAGGTGTTGGAGTAAGGATTGTTCATAGGTCGCTGGTTGAAATTGTTGANCGCGTCGACTTGTTCGACGGATGTTCCCGAAGATGGGGTAAGTCCTATTTGACAATTTTGAGTTGAGTGACCTGAAACGCCACATATCTCACAAGAAATATTAGTAGAATTGATAGCATTNACATTAAAGTGTTCAAATTTGTGAGATAATGCATCTAATTTTGCAGCGAGTAGATCTATAGCGCTTACCTCATACTTACCTGGGGTGCGAGTAGAGTCCCTTCTTCCGTTGCCCACTGGTGGTGGTTGAGTGCGACACTTTCGATAATATCGAANGCTTCGTCCAACGTTTTGTTCATNAGGGCTCCACCTGCAGCTGAGTCTAGAGAAACTTTNGTAGGATAGGAGATACCATTATA
GAAGGTGTGGATGATNAGCCATCTCTCNAATCCGTGATGNGGGCATTGCTGCAATAACCCTTTAAATCTATCCCAAGCTTCAAAAAGTGATTCTCCATCTTTCTGCGTGAAGTTCGTGATTTGATGGCGGAGATTTGCGGTCTTGCTCGGGGGAAAATATTTGTCCAGAAATTTTTGTTCTAGTTGATCCCACGTTGTGATGCAATCCGCGGGAAGAGAATGCAACCATGCTCTTGCTTTGTCCCGTAGGGAGAAAGGGAATAATCGTAGTCTAATCGCATCTGCAGAGGCTCCATTAAACTTTAAAGTTTCACAAAATTCTAAAAATACCTCTAGATGTAGATTAGGATCCTCTAAAGNTNTCCCTCCGAATTGGTGTTGCTGAATCATGGATAGTAGTGCGGGTTTCANTTCGAAATTGTTTGCTTCGATTGGAGGTCTTCGNATGCTCGACCTTAGCCCCCTTGTACTTGGTGTTGCATAGTCNTTCAAAGGTTTGTTTTCCTCTTGTGNTGCCATAGCGGAAGATTGTGCTTCCTTAAGTCGTTTTTGGAGTCTTCGTCTAGCAAGGAGAGTTCTTTCAATTTCAGGATCTAGCTCTGTTAATTCTCCCTGTCTAGTTGATCTATGCATAAAACGAGAAGNGCAACTCCAANGGAAGTGANTATAGTAATGATAATGAAAGTATAGAAGATAAAATNAAAGTAAAGAAAAGAAGATAAATNAAGATCTAGATTAACCTAATTGCCAAATAAACTGATATTGACGCAGTCCCCGGCAACGACGCCAAAAACTTGATGTGATGTCGCAAACCGCAAGCGCACGGTCGTCGTCAAGTAATAAAAATATCGATCCCACAGAGACTNTGTCAAGTACCGGATGNTTGCAAGGAAAGATTATCTAGAAGAATTAATTGGTTGAGTGGTGATTGAATAAGGAAGACAGACGATCGAGAAACGATTATAGGACTAGTTTANGAAACAATCTTAGGACTTCGGTTTCGCTACNATGTCTAATGTCTNGGCAGNTTCTCNAATATCGATATGTATTCNTGAAGGAAACCTATCTAAGGTAATCGTCAGCCTCTCTCGAGATCTAACGATATATTTACCTAACNCGGCTCCTACTTTCACGGTATTCGCACTTAGAATGACCTTGATCGCAATATGGGAACCTGTCACGAGAACCCCAGCGGATCAANTCAAGACCTAATACCTATTACCTATCATAAGACCCAAGATTAGATAAATTATTATGTTCTAGGATAGGTTATGAAATCCTCGAGAGGTCCTAATCACTTTCGGGGCGTTCGTGTATAGATGACCTTGNTCGCAATACGGGAACTTGTCACGAGAACCCCGGCGGATCATCTAGACCTTAATCGTTACTTGTAGTTAAATTCTCACGATTAGTTCTACGAATCCGGGATCATAGAAAACCCCAAAAGCATGCGAATAGGTGATCAACAAACTCGCAAGATAATCCATANGCGCCGTAAAACCAATTCAAGAAGTTCACAATCGATAAATAAAAGAGTTGTAACAATCCAAGACAAATAGACANAGAGCTACTCCCTAATCCTAAGATCGAGGAGAATTACTCCATAGTACGGAGGGAAGATNGCAATGAATTACAGAAAGACATTCGACAAATCTAAAAGAGAANAAGAGNAAAAACTAGTTTTGTCGACGTCGTNGAAGACTTCCTCCAGCTTGGAGCTTCTTCTCCACGGCTCCGAGACTTGCCCTAGGATGCTCCTTGATGNCTCCCTCCTAATCCCTAGGATCTCCCGAATGCCTTTCTCCTTAGCCCTAGGGCCTTTTTATAGACTTTTCTGGGAGAGAAAATCGGGAAATTAGTTGCGAGAAATTCGGAATAGCCTTGAGAAATCCACGACTTGGTCTATCCGTGGCATGGCACGACCGGCGTGCTAGGCGTTGTGGAGACCTGGCACGCCCCGAANTCGGGCGTGTTGGAGGAGCTGGGCACGAGCGTGCCGT
TGGCACGNCCTGGGCATGTTGGCCTCTGGTTCGCTCCTTTTCCGTGCGTTCCTGCAAGAAAAACTCACAACCGCAGTTATCTGNACTAAAAAGGCAANATGAANGNAAATGCTATGTTTTATGCATGATATGTGAACAAAATACTAGAAAAACATGGTAAATAAACCCTAAAATATGCGTATGATTCACACTCATCACACCCCCAAACTTAAACTTTTGCTTGTCCTCAAGCAAAGAAAAGAAAGAAACTATGAGTGAGAATCATGATAACTTGTTAGAAACCNATCCTAAGGTATACTTAGTCTTATCGATTCGTTCTCGATGGCTTAGCTTGAAAGACCCTTGTAGTGCATGTGAATTTTGACCTTAAAGTTTTGGACATCAATCCGAGTCAAATATCCCTTCTACCGCGTTCCTCGCNTNTCCGGCGTAAAGACACTTACCTGCCCTAATATGCTTGGATCATTCTCTCACTACCATACTTAGTCTCAAAGGAGCACTCATTCAAGGAGAAAGAAAAGTAGGAAATAAACATCGAGTATTTTCCCCAGTAACTAGTTAGTCTCAAAGGGGCACCCTTATGGGTGTTTCCCCAGTAGCTACTTGGTCTCAAAGGGGTACTTTAATGATTTTCATCCTTTCTTTTATTAAATATTTTTTTTTCTGAAATCTTTCTTTCTTTCCCCTTTTNTTTTTTTTTCTTTTTNTTCTTTTTTTTTTTAAACTCCTGAGATGAGTGTTTCTCTAGTAACTATAGTATTATGAAAGTCCCCTATGGAAATTTGAGTACTAGTCCGGTATGGGGACGAACTCTTTTTGGATACCTCTACCCTATAGGTTGTGTAAAAAGTGAAACAAGGTTCTTATCAAGTAGGCCAAAAGGATCTTTACCGATCTAACTAAGTTCAGTTATATTACTTTGGATAGAGAGATAACAAGTTATCATATATTACTAGTNTTTATGCGAGGTAATATGCGNAGAAAATANGCTAACAATGCGAGGAAATATGCGNAGGAAATACGCTAATAATCGAGGAAATATGCGNAGAAAATATGCTAACAATGCAAGGAAATATGCGAAGAAAATATGCAAAATTATGCNAAATAGGAAAAATAAACTAAGCTACCCCCCCTCAAACTTAAAATATGCATTGTCCTCAATGTATAGAAGGGTAATAATAAAGTAAACTACTCAGGAGGAGGAGGTGGATAGAAATGACCTTGCGAAAGGTGCCAATGATACATGGCCTCGATCATCTNCCTCTGAGTGCTAAGTTCGGCCCGCTGCTCCCTTAGGTCTATGTGNANNGNATCCAGGGAGGACTGAATCCNAGCGTATGNATCATCAGGAGTGCTGGTACCTGCANAAGAAGAGCCGAAAGNAAAAGAATGCCTAGGAGGGTCGGCCCTATGGNGNGGTACTGTANNGGACTCTANGGGGTCAGGTACTGTAGTGGACTCTAAGGGATCAGGCATAGGAACCACATCAGTCATGAGNCNGTTGGCTCGGTTCCTTACTGTGGTGNGGTCAGGGTTTGGAAGTGGNAACGGAAAATGATCCTTTAAAACGAAACAATANCTTCCTTCATCTCTCGTGATCATCTTCATAGCCATGCAAGCATCGAGATCAATCTTGGGGCTANTTTGAATCGACTCTAATACGTCAAGATTATACCCCAAGGCCATTGCTATTCGTGTTATTAGACCTCCAACTAAGATTGATCCTGTCGANCTTTTNCCTATTTTTACTAAGTGTCTAATAAAATGTGANCTAGTATCGATATTTATATTATTAANCATGGCCCATAAAATAATTAATTCACTTTGCCTAACTACACCTTCGCTATCTCCTCGACCAAAAATCGTGTTAGCCATNACCCTATGGAGATATCTAAAAATTAGGTTAGGGAGATTTGAAGATTTAGACCTAGAAGGGTCATAGTTTTTGTACCCTAGCAATTTCTCCCCAAAATACAAAATCTTGAAGTCCCCTAGGTATT
CTCCTTTCTCCTCCGCATGGTAAATCGTAAACAGAATTAAATTCTTCGAGTGTCCATGAATATTCATTATTAAACATTCTAAAAGTAATCTTACCTATNTCACAGTTTTTTCCGGATANGATCTCGGCTTCACAGAACTTAGGAATTCTAGGGTTATTTTTGGGTACGTAGGGTAATTTATATTCATAAGTCCAGTCCATCCNACCCTATTAATCATCCAATTAATATCATCCCTAATTCCTAAGGTTTCGATGGTGTAGTTATCAATATATCTAGTGCTAATCATTTTTCTTTTCGACCAAGTTATCAAATCTAGCCCTATGGTTANCATCGCGAAATATAATACCATAGANATTNTCGTTACCCTCGTCACGNGCCCTACGCCTNGGATTAGAAGCGGCGGCGGAAGCGGCGGTGGTATTCTTCCTTAATCTCTTGAACATTTTGATGTTTGAACTCTGATTTTCGCTTGGGTAGGTTGGAAGTANGGAGANNTCTCACCNAAGGGTGATCTCTGCAANTTCTNGAAAATTCGAATCTCTTGAACATTTTATGGAAGAAACTTGAAGCTANAACTTTTCGGAATGGCGTTCTCGAACTCGGAATTTGCTTGGAGAGCCTGGATCACCTTGGAGAAGGCTCGATCTCGAGTTGGAAGAAGAGATTCGGCCGGAAATTGGAAGGAAGGGCTGGCTGAAACAAGATCGAGCATGAAATCTCTGGATCTTNGNCTCAATGCGCTAGGTTGCCGTCGGTCTTCGAGGAAGTTACTTTGGCTCGGAGAGGGCCGTAGTGGCTGGATCT

athenasyarifa commented 1 month ago

Hi @rmhubley, yesterday I realized that I had not given execute permission (chmod +x) to Refiner and the other tools in the RepeatModeler folder. I may also have run the command you suggested from a different folder, which might be why it didn't work.
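Missing execute permission is easy to detect before launching a long run. The sketch below lists files in a directory that begin with a `#!` line (scripts meant to be run directly) but are not executable by the current user, and can apply `chmod +x` to them. It is a generic, hypothetical helper, not something shipped with RepeatModeler; it does not recurse, so it would need to be run separately on any subdirectories that contain scripts. Files without a shebang (for example, Perl module files) are left untouched.

```shell
# Hypothetical helper: list regular files directly inside a directory that start
# with "#!" (i.e. scripts) but are not executable by the current user.
find_nonexec_scripts() {
  local dir="$1" f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    [ -x "$f" ] && continue
    # head -c 2 reads only the first two bytes to test for a shebang.
    if [ "$(head -c 2 "$f")" = '#!' ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Hypothetical helper: run plain "chmod +x" on every file listed above.
fix_nonexec_scripts() {
  local f
  find_nonexec_scripts "$1" | while IFS= read -r f; do
    chmod +x "$f"
  done
}
```

For example, `find_nonexec_scripts <path_to_repeatmodeler>` to review what would change, then `fix_nonexec_scripts <path_to_repeatmodeler>`.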

Now that I have run chmod +x on Refiner and the other tools in the RepeatModeler folder and restarted RepeatModeler, it seems to be working! This is how the log file looks so far:

RepeatModeler Version 2.0.5
===========================
Using output directory = /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/RM_43518.TueJul231505232024
Search Engine = rmblast 2.14.1+
Threads = 4
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.6, RepeatMasker 4.1.6
LTR Structural Analysis: Enabled ( GenomeTools /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/poeMon1, LTR_Retriever v2.9.0,
                                   Ninja 0.95-cluster_only, MAFFT 7.453,
                                   CD-HIT 4.8.1 )
Random Number Seed: 1721739911
Database = /dss/dsslegfs01/pr53da/pr53da-dss-0026/projects/2023__Pmon_pop_gen/0__repeatmasking/poeMon1
  - Sequences = 191
  - Bases = 1223830793
  - N50 = 79541754
  - Contig Histogram:
  Size(bp)                                                        Count
  -----------------------------------------------------------------------
  145699346-156106300 |                                                   [ 1 ]
  135292393-145699346 |                                                   [  ]
  124885440-135292393 |                                                   [  ]
  114478486-124885439 |                                                   [ 2 ]
  104071533-114478486 |                                                   [  ]
  93664580-104071533  |                                                   [  ]
  83257626-93664579   |                                                   [  ]
  72850673-83257626   |                                                   [ 3 ]
  62443720-72850673   |                                                   [ 1 ]
  52036766-62443719   |                                                   [  ]
  41629813-52036766   |                                                   [  ]
  31222860-41629813   |                                                   [ 3 ]
  20815906-31222859   |*                                                  [ 6 ]
  10408953-20815906   |**                                                 [ 8 ]
  2000-10408953       |************************************************** [ 167 ]

Storage Throughput = good ( 896.76 MB/s )

RepeatModeler Round # 1
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 40000000 bp
   - Final Sample Size = 40021120 bp ( 40007946 non ambiguous )
   - Num Contigs Represented = 69
   - Sequence extraction : 00:01:09 (hh:mm:ss) Elapsed Time
 -- Running RepeatScout on the sequences...
   - RepeatScout: 00:10:14 (hh:mm:ss) Elapsed Time
Round Time: 01:15:37 (hh:mm:ss) Elapsed Time : 174 families discovered.

RepeatModeler Round # 2
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 10000000 bp
   - Sequence extraction : 00:00:19 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
   - TRFMask time 00:00:00 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
       5328 repeats masked totaling 1111212 bp(s).
   - TE Masking time 00:00:49 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
       Sample Size 10005342 bp
       Num Contigs Represented = 45
       Non ambiguous bp:
             Initial: 10002342 bp
             After Masking: 8891146 bp
             Masked: 11.11 % 
 -- Input Database Coverage: 10005342 bp out of 1223830793 bp ( 0.82 % )
Sampling Time: 00:01:17 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...
  - Total Comparisons = 31125

I'm so sorry about this 😆; it was such a simple mistake. I imagine you spent time helping me for nothing 😢

@Whitney110 Could this also be the solution for you? Good luck!

Best, Rifa

Whitney110 commented 1 month ago

Hi @athenasyarifa, thank you for the reminder! I checked, and my RepeatModeler folder also had a problem with execute permissions. I ran chmod +x on all the tools in the RepeatModeler folder, and now it works fine.

@rmhubley @athenasyarifa Thank you again!

Best, Whitney

athenasyarifa commented 1 month ago

@Whitney110 Glad to know it's working now! I will close this issue.