BigelowLab / dadautils

Provides utility functions for working with the [dada2](https://benjjneb.github.io/dada2/index.html) package and the eDNA workflow.
MIT License

Multithread strangeness #14

Open robinsleith opened 3 years ago

robinsleith commented 3 years ago

I was checking the verbose fix to make sure everything was working, and I noticed large time discrepancies between jobs. I am submitting these as batch jobs. I'm not sure whether other processes on Charlie could affect these times, but it seems strange to see such big differences when only the truncLen and verbose settings change. These are the times reported in the log between "filter and trim of input files" and "learn errors":

- truncLen=manual; verbose=yes: 7 minutes
- truncLen=manual; verbose=no: 21 minutes
- truncLen="auto"; verbose=yes: 18 minutes
- truncLen="auto"; verbose=no: 47 seconds
- repeat of truncLen="auto"; verbose=no: 21 minutes

The fact that repeating the same setting took either 47 seconds or 21 minutes is very strange, but it at least suggests this is a Charlie problem rather than a dadautils problem...

I tried decreasing multithread to 1, and this seems to have solved the issue: all settings take ~40 seconds with multithread=1. Just posting this now as a potential issue...

btupper commented 3 years ago

I hope that verbose is a red herring.

I hate timings on Charlie when PBS is managing the environment. Sigh.

robinsleith commented 3 years ago

I think the multithread problem is isolated to filter_and_trim. I tried to hardcode the filter_and_trim function to use only one thread so that the rest of the pipeline (particularly taxonomy assignment) can still use multiple threads. I changed this line to multithread=1, but that didn't solve the issue.