Dfam-consortium / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Missing outputs #142

Closed OliverPStuart closed 2 years ago

OliverPStuart commented 2 years ago

Hi there,

I've run RepeatMasker on a large .fasta file with a custom input library (also a .fasta) and most of the outputs are missing. This is the code I ran:

perl ~/RepeatMasker/RepeatMasker -lib D_australis-families_filtered.fa ${REF_DIR}/LHISI_Scaffold_Assembly.fasta

The only output I have is LHISI_Scaffold_Assembly.fasta.cat.gz in ${REF_DIR}. There are no log files or other indications that RepeatMasker ran. Where should I expect them to be written to?

Is it possible to recreate the other output files from the .cat.gz file? The entire run took roughly three days (~3.6 Gb input) and I would prefer not to do it again.

OliverPStuart commented 2 years ago

I reran the code on a smaller dataset and the problem occurs when ./RepeatMasker/ProcessRepeats is called. No other information is given and the output directory is removed at this point for some reason.

jebrosen commented 2 years ago

Hi, you could try running ProcessRepeats manually. Most of the expected program output files, aside from the .cat.gz file, are generated by ProcessRepeats. For example:

~/RepeatMasker/ProcessRepeats -lib D_australis-families_filtered.fa assembly.fasta.cat.gz

Hopefully this either works or helps explain what went wrong.
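For anyone in the same situation, here is a minimal sketch of the manual re-run suggested above, wrapped in a small shell function so the .cat.gz file is checked before ProcessRepeats is called. The function name `rerun_process_repeats` and the `PROCESS_REPEATS` and `DRY_RUN` variables are hypothetical helpers, not part of RepeatMasker. The only ProcessRepeats option used is the `-lib` flag from the command above, and the default install path is assumed from this thread.

```shell
#!/usr/bin/env bash
# Sketch only: re-run the post-processing step on an existing .cat.gz file.
# PROCESS_REPEATS and DRY_RUN are hypothetical convenience variables.
# The default path assumes RepeatMasker lives in ~/RepeatMasker, as in this thread.
PROCESS_REPEATS="${PROCESS_REPEATS:-$HOME/RepeatMasker/ProcessRepeats}"

rerun_process_repeats() {
    local lib="$1"        # custom repeat library (.fa) used in the original run
    local cat_file="$2"   # the <input>.cat.gz file left behind by RepeatMasker

    # Refuse to continue if the alignment file is missing or empty.
    if [ ! -s "$cat_file" ]; then
        echo "error: $cat_file is missing or empty" >&2
        return 1
    fi

    # With DRY_RUN set, only print the command that would be executed.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$PROCESS_REPEATS -lib $lib $cat_file"
        return 0
    fi

    "$PROCESS_REPEATS" -lib "$lib" "$cat_file"
}
```

After sourcing this, calling `DRY_RUN=1 rerun_process_repeats D_australis-families_filtered.fa assembly.fasta.cat.gz` prints the command without running it. Unset `DRY_RUN` to actually run ProcessRepeats.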

OliverPStuart commented 2 years ago

That seems to work but doesn't explain what went wrong. In any case, I've switched to the conda installation and that works well out of the box. Thanks for your help.