NBISweden / pipelines-nextflow

A set of workflows written in Nextflow for Genome Annotation.
GNU General Public License v3.0

[New pipeline] DeNovoRepeatLib #32

Open Juke34 opened 4 years ago

Juke34 commented 4 years ago

See #17 for the general picture.

Maybe this can be merged with the DeNovoRepeatLib pipeline (see #33).

The purpose of DeNovoRepeatLib is to build a de novo repeat library for a genome. There are two approaches. Should we use only the standard one, or run both solutions in parallel? We could provide an option to choose.

solution 1 (standard): Input: a genome FASTA file, an existing library (e.g. Dfam or RepBase) to classify the de novo repeats (assign family names), and a protein database (Swiss-Prot eukaryote/prokaryote) to remove potential proteins from the repeats. Output: a repeat library FASTA file.
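To make the protein-removal step of solution 1 concrete, here is a minimal Python sketch of the general idea: drop every entry of the de novo repeat library whose ID appears in a set of protein hits. It is only an illustration. The function names `parse_fasta` and `drop_protein_like`, the sequence IDs, and the assumption that the hits are already available as a set of IDs (for example, parsed from a BLAST search against the protein database) are all hypothetical, and this is not how ProtExcluder works internally; the real tool may trim matching regions rather than discard whole sequences.

```python
from typing import Iterable, Iterator, Set, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_protein_like(
    records: Iterable[Tuple[str, str]], protein_hit_ids: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Keep records whose ID (first word of the header) is not a protein hit."""
    for header, seq in records:
        parts = header.split()
        seq_id = parts[0] if parts else ""
        if seq_id not in protein_hit_ids:
            yield header, seq


# Hypothetical repeat library with two made-up entries.
example = """>repeat_1 candidate LTR
ACGTACGT
ACGT
>repeat_2 candidate unknown
TTTTGGGG
""".splitlines()

# Hypothetical set of IDs that matched the protein database.
hits = {"repeat_2"}

kept = list(drop_protein_like(parse_fasta(example), hits))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

In a real pipeline this filtering would stay delegated to the dedicated tool; the sketch only shows what "removing potential proteins from repeats" means at the level of the library file.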

For the detailed approach, see the wiki of the annotation cluster repo; a more condensed description is in this post on Biostars.

TransposonPSI is now in bioconda. ProtExcluder is available in the nanjiang conda channel; it should be moved into bioconda. Be careful with the BLAST version (ProtExcluder needs particular ones).

solution 2: Use EDTA, which is available in conda and consequently as a biocontainer.