Open ykkim0127 opened 1 month ago
Also, when I run RepeatProteinMask, it prints the following error even though the libraries exist in the directory, as shown above.
Identifying Simple and Low Complexity Repeats...(masking turned off)
- Tandem Repeats: 718131 Masking Repeat Proteins... NCBIBlastXSearchEngine::search: Error...compressed subject database (Libraries//RepeatPeps.lib) does not exist! at /mambaforge/bin/RepeatProteinMask line 371.
These are separate issues:
The generation of "general libraries" by RepeatMasker is a standard step. This library contains artifact sequences used by the contamination checks. It is generated by default and doesn't have anything to do with the search for TE sequences. In your case you are using a custom library and the program confirms that it's being used here:
Using Custom Repeat Library: ps-families.fa
I also do not recommend using the "-nolow" option unless you have a specific reason to do so. It will increase your false positives.
The message reported by RepeatProteinMask is definitely a bug, but there is a simple workaround. Simply supply "-engine ncbi" as an additional parameter and the program will correctly identify the databases in Libraries.
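As a rough sketch of the two suggestions above: the first command is the RepeatMasker call from the original post with only "-nolow" removed, and the second adds "-engine ncbi" to RepeatProteinMask. The input name "assembly.masked" and the variable names are placeholders I'm assuming, so adjust them to your setup. The commands are only printed here; uncomment the marked lines to actually run them.

```shell
# RepeatMasker call from the original post, with -nolow dropped so that
# low-complexity/simple repeats are still masked.
set -- RepeatMasker -pa 30 -norna -no_is -gff -dir masked -lib ps-families.fa assembly
repeatmasker_cmd="$*"
echo "$repeatmasker_cmd"
# Uncomment to run:
# "$@" > repeatmasker.log 2>&1

# RepeatProteinMask with the search engine named explicitly (the workaround).
# "assembly.masked" is an assumed input file name.
set -- RepeatProteinMask -engine ncbi assembly.masked
proteinmask_cmd="$*"
echo "$proteinmask_cmd"
# Uncomment to run:
# "$@" > proteinmasker.log 2>&1
```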
Thanks for the quick feedback! I understand why the general directory was created. I have one more question about RepeatProteinMask. To resolve a double installation, I removed the conda-installed RepeatMasker and ran the locally installed one. However, when I rerun the tool on the same input as before with the -engine option added, the output reports "0" tandem repeats, which seems incorrect.
Here is the log file:
Identifying Simple and Low Complexity Repeats...(masking turned off)
- Tandem Repeats: 0 Masking Repeat Proteins...
And the command I used is:
./RepeatProteinMask -engine ncbi -trf_prgm ../TRF-4.09.1/build/src/trf -pvalue 0.01 -noLowSimple assembly.masked -libdir ./Libraries > proteinmasker.log 2>&1
I'm not sure why the tool is failing to detect tandem repeats.
It's because you again used an option to disable simple repeat detection ("-noLowSimple").
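For reference, a sketch of the same command with only "-noLowSimple" dropped, so that simple and low-complexity repeat detection stays on. The paths and file names are copied from the command above and are assumed to be valid on your machine. The command is printed as a dry run; uncomment the last line to run it.

```shell
# Same invocation as before, minus -noLowSimple (the option that disabled
# simple repeat detection and led to "Tandem Repeats: 0").
set -- ./RepeatProteinMask -engine ncbi \
  -trf_prgm ../TRF-4.09.1/build/src/trf \
  -pvalue 0.01 \
  assembly.masked \
  -libdir ./Libraries

# Dry run: print the command so it can be checked first.
echo "$*"

# Uncomment to run and capture the log as before:
# "$@" > proteinmasker.log 2>&1
```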
Hi. I installed RepeatMasker version 4.1.5 and completed the configure step with the Dfam libraries. However, when I run RepeatMasker with a custom database from RepeatModeler, it creates "general libraries" in the Libraries directory. Does this mean RepeatMasker ignores the Dfam libraries and only searches for repeats in the general libraries? Or is this normal?
RepeatMasker -pa 30 -nolow -norna -no_is -gff -dir masked -lib ps-families.fa assembly > repeatmasker.log 2>&1
The RepeatMasker log file:
and I have these files in the Libraries directory: