VEuPathDB / repeat-masker-nextflow

nextflow workflow to modernize current repeatMasker workflow steps
Apache License 2.0

THIS REPO IS 🚧 UNDER CONSTRUCTION 🚧 and NOT Used in ANY production CODE

Nextflow Conversion of repeatMaskerTask.pm

RepeatMasker

flowchart TD
    p0((Channel.fromPath))
    p1([splitFasta])
    p2[repeatMasker:runRepeatMasker]
    p3[repeatMasker:cleanSequences]
    p4([collectFile])
    p5(( ))
    p6([collectFile])
    p7(( ))
    p0 --> p1
    p1 -->|seqs| p2
    p2 --> p3
    p3 --> p4
    p3 --> p6
    p4 --> p5
    p6 --> p7

Get Started

Description of nextflow configuration parameters:

| param | value type | description |
| --- | --- | --- |
| inputFilePath | string | Path to the input FASTA file. |
| trimDangling | boolean | Whether to remove sections of masked repeats. |
| dangleMax | integer | Number of nucleotides required between sections of repeats to stop the removal process. |
| outputFileName | string | Name of the output file. |
| outputDir | string | Directory where the output file will be stored. |
| rmParams | string | Additional arguments to be passed to RepeatMasker. |
| errorFileName | string | Name of the error file. |
| libraryPath | string | Path to the RepeatMasker libraries. Passing "-species" through rmParams will not work, because the Dfam files in the container do not contain the full library; these would be too large to include. Instead, libraryPath is used to set the environment variable LIBDIR, which RepeatMasker will use. If you are running this locally, or only want to use the curated versions of the databases, leave it as /opt/RepeatMasker/Libraries. If you want to use data that does not come standard, supply the path to your RepeatMasker libraries. |
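
As a rough illustration, the parameters above could be collected in a `nextflow.config` along these lines. This is only a sketch: every value shown is a hypothetical placeholder rather than a default shipped with this repo, except `/opt/RepeatMasker/Libraries`, which is the library path mentioned in the description of `libraryPath`. The `-nolow` flag is just an example of an extra RepeatMasker argument.

```groovy
// nextflow.config (sketch; all values are placeholders, adjust for your data)
params {
    // Path to the input FASTA file (hypothetical location)
    inputFilePath  = "data/input.fasta"

    // Whether to remove sections of masked repeats
    trimDangling   = true

    // Nucleotides required between sections of repeats to stop the removal
    // process (30 is an arbitrary example value)
    dangleMax      = 30

    // Output and error file names, and where to store the results (hypothetical)
    outputFileName = "masked.fasta"
    errorFileName  = "repeatMasker.err"
    outputDir      = "results"

    // Extra arguments passed through to RepeatMasker (illustrative only);
    // "-species" is not expected to work here, see libraryPath
    rmParams       = "-nolow"

    // Used to set LIBDIR for RepeatMasker; keep this value to use the
    // curated libraries bundled in the container
    libraryPath    = "/opt/RepeatMasker/Libraries"
}
```

Any of these can also be overridden on the command line with Nextflow's double-dash syntax, for example `--dangleMax 50`.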