VEuPathDB / repeat-masker-nextflow

Nextflow workflow to modernize the current RepeatMasker workflow steps
Apache License 2.0

Issues arising from testing on "top level genomic" sequences #7

Closed bobular closed 2 years ago

bobular commented 2 years ago

I used this file as input: yew:/eupath/data/EuPathDB/workflows/PiroplasmaDB/49/data/cfelWinnie/makeAndMaskTopLevelGenome/topLevelGenomicSeqs.fasta

The old workflow params (from input/task.prop) were:

trimDangling=n
dangleMax=0
rmParamsFile=rmParams

And the contents of rmParams are:

-xsmall -species 'Cytauxzoon felis' -dir .

I used the following nextflow.config:

params {
  inputFilePath = "$baseDir/test-data/topLevelGenomicSeqs.fasta"
  fastaSubsetSize = 5
  trimDangling = false
  dangleMax = 0
  outputFileName = "topLevelGenomicSeqs.masked.fa"
  outputDir = "$baseDir/test-output"
}
process {
  container = 'veupathdb/repeatmasker:latest'
}
docker {
  enabled = true
}

I believe the desired output should be yew:/eupath/data/EuPathDB/workflows/PiroplasmaDB/49/data/cfelWinnie/makeAndMaskTopLevelGenome/master/mainresult/blocked.seq

The main differences are:

CF_contig00210[TOO SHORT: length(TAATTAGCCTTG)]
CF_contig00281[TOO SHORT: length(AAAGACC)]
CF_contig00305[TOO SHORT: length()]
CF_contig00267[TOO SHORT: length(TTTTCC)]
CF_contig00331[TOO SHORT: length(TCTCTCCCAAA)]
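One way to reproduce a comparison like this is to diff the new workflow's output against the old result record by record. The following is only a minimal sketch, not the method used to produce the list above: the helper names `read_fasta` and `diff_fasta` are hypothetical, and it assumes both files are plain FASTA whose first header token is the sequence ID. The comparison is case-sensitive, so soft-masked (lowercase) bases from `-xsmall` count as differences.

```python
import sys
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}, joining wrapped lines."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first whitespace-delimited header token is the ID.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def diff_fasta(
    expected: Dict[str, str], actual: Dict[str, str]
) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """List (id, expected_length, actual_length) for every record that differs.

    A length of None means the record is missing from that file.
    """
    diffs = []
    for seq_id in sorted(set(expected) | set(actual)):
        exp = expected.get(seq_id)
        act = actual.get(seq_id)
        if exp != act:  # case-sensitive, so masking differences are reported
            diffs.append(
                (
                    seq_id,
                    None if exp is None else len(exp),
                    None if act is None else len(act),
                )
            )
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python diff_fasta.py old_result.fasta new_result.fasta
    with open(sys.argv[1]) as old, open(sys.argv[2]) as new:
        for seq_id, exp_len, act_len in diff_fasta(read_fasta(old), read_fasta(new)):
            print(f"{seq_id}\texpected={exp_len}\tactual={act_len}")
```

Reporting lengths rather than full sequences keeps the output readable for large contigs while still showing records that are missing or very short on one side.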
rdemko2332 commented 2 years ago

Resolved with new commits.