This could perhaps be merged with the DeNovoRepeatLib pipeline (see #33).
The purpose of DeNovoRepeatLib is to build a de novo repeat library for a genome.
There are two approaches. Should we use only the standard one, or run both solutions in parallel? We could provide an option to choose.
Solution 1 (standard):
Input: a genome FASTA file; an existing library (e.g. Dfam or RepBase) to classify the de novo repeats (assign family names); and a protein database (Swiss-Prot, eukaryote/prokaryote) to remove potential proteins from the repeats.
Output: a repeat library FASTA file.
For the detailed approach, see the wiki of the annotation cluster repo here, and a more condensed description in this post on Biostars.
TransposonPSI is now in bioconda.
protexcluder is available in the nanjiang conda channel; it should be moved into bioconda.
Be careful with the BLAST version (protexcluder needs particular ones).
Solution 2: use EDTA, which is available in conda and consequently as a BioContainer.
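To make the "option to choose" idea concrete, here is a minimal sketch of how the choice between the two solutions could be exposed on the command line. Everything in it is an assumption for discussion, not existing pipeline code: the option names (`--method`, `--genome`, `--repeat-lib`, `--protein-db`) and the helper `select_solutions` are hypothetical, and the sketch only decides which solutions to run; it does not call any of the tools.

```python
import argparse

# Hypothetical values for a hypothetical --method option.
METHODS = ("standard", "edta", "both")


def build_parser():
    """Build a command-line parser for the (hypothetical) pipeline options."""
    parser = argparse.ArgumentParser(
        description="Choose how the de novo repeat library is built."
    )
    parser.add_argument("--genome", required=True, help="genome FASTA file")
    parser.add_argument(
        "--method",
        choices=METHODS,
        default="standard",
        help="solution 1 (standard), solution 2 (edta), or both in parallel",
    )
    # Only needed by solution 1: existing library (e.g. Dfam or RepBase)
    # and protein database (e.g. Swiss-Prot).
    parser.add_argument("--repeat-lib", help="existing repeat library FASTA")
    parser.add_argument("--protein-db", help="protein database FASTA")
    return parser


def select_solutions(args):
    """Return the list of solutions to run, checking solution 1 inputs."""
    if args.method == "both":
        selected = ["standard", "edta"]
    else:
        selected = [args.method]
    if "standard" in selected and not (args.repeat_lib and args.protein_db):
        raise ValueError(
            "the standard solution needs both --repeat-lib and --protein-db"
        )
    return selected
```

With this layout, `--method edta` needs only the genome, while `--method standard` or `--method both` also requires the existing library and the protein database, which mirrors the inputs listed for solution 1.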
See #17 for the general picture.