Open gwct opened 3 years ago
I am having the same issue with a Phyllotis data set. Now that it is on the larger scaffolds, it is processing them sequentially.
Thanks for pointing this out. It's odd how this only seems to affect some datasets. I think I'm familiar with the cluster you're running on and I can see that you're on compute-0-24. Have you tried a node with more memory?
However, as per #6, this will likely not be fixed in this version of pseudo-it since I'm aiming to re-implement the whole pipeline with snakemake. Hopefully that will be mostly done (or at least in a usable form) by the end of August. Happy to chat elsewhere about this too since you likely don't want to wait that long.
On some datasets (Peromyscus), processes that are spawned to call variants with GATK sometimes get locked or do not spawn a new GATK process, resulting in only 1 process being used to call variants. This is obviously a major problem since one of the big advantages of pseudo-it is that it allows users to call variants in parallel and drastically speed up run times.
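For anyone who wants to check this outside the pipeline, here is a minimal sketch of one way to count how many worker processes actually pick up tasks, assuming the work is dispatched through Python's `multiprocessing.Pool`. The names `call_variants` and `run_parallel` and the scaffold labels are hypothetical stand-ins, not pseudo-it code, and the `sleep` only stands in for a GATK call.

```python
import multiprocessing as mp
import os
import time


def call_variants(scaffold):
    # Hypothetical stand-in for one per-scaffold GATK call (not pseudo-it code).
    time.sleep(0.05)
    # Report which worker process handled this scaffold.
    return scaffold, os.getpid()


def run_parallel(scaffolds, procs):
    # Map each scaffold name to the PID of the worker that processed it.
    handled_by = {}
    with mp.Pool(processes=procs) as pool:
        # imap_unordered yields each result as soon as its task finishes;
        # chunksize=1 hands out one scaffold at a time.
        for scaffold, pid in pool.imap_unordered(call_variants, scaffolds, chunksize=1):
            handled_by[scaffold] = pid
    return handled_by


if __name__ == "__main__":
    scaffolds = [f"scaffold_{i}" for i in range(8)]
    handled_by = run_parallel(scaffolds, procs=4)
    workers = set(handled_by.values())
    print(f"{len(handled_by)} scaffolds handled by {len(workers)} worker process(es)")
```

If the reported worker count stays at 1 even with several scaffolds queued, that would point at the dispatch step rather than at GATK itself, though this toy version does not reproduce the memory behavior of a real run.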
Some notes about this issue:
Things to try:
[...] starmap, but I've had memory issues with this function since it stores results until all processes are complete. There really isn't anything stored for each process here though. Should try imap_unordered anyways.
[...] map and imap is 1, but again starmap might be different.
Screenshot of htop on the Peromyscus data: