brentp / peddy

genotype :: ped correspondence check, ancestry check, sex check. directly, quickly on VCF
MIT License

Parallel processing blocked #86

Open MattWellie opened 2 years ago

MattWellie commented 2 years ago

I've run into an issue trying to parallelize code using Peddy. I'm parsing a PED file and VCF, then splitting all variants into groups to process. This splitting makes it a strong candidate for parallelization, but I can't pickle the Ped() object, so multiprocessing is blocked.

_pickle.PicklingError: Can't pickle <class 'peddy.peddy.UNKNOWN'>: it's not the same object as peddy.peddy.UNKNOWN

This is probably related to how unknown members are handled in the pedigree: https://github.com/brentp/peddy/blob/master/peddy/peddy.py#L102-L104
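For context, this class of error can be reproduced without peddy. The sketch below is not peddy's code; the two `UNKNOWN` classes and the `try_pickle` helper are hypothetical stand-ins. It relies on the fact that pickle stores a class by its module and name, then checks that the name still resolves to the same object. If the name has been rebound to a different class, pickling the original one fails, typically with the "not the same object" wording seen in the traceback above.

```python
import pickle


class UNKNOWN:
    """Stand-in sentinel class (hypothetical, not peddy's actual code)."""


original = UNKNOWN  # keep a reference to the first class object


class UNKNOWN:  # rebinding the name creates a second, distinct class object
    """Second class that now owns the module-level name."""


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the PicklingError raised."""
    try:
        pickle.dumps(obj)
    except pickle.PicklingError as exc:
        return exc
    return None


# The first class can no longer be found under its own name, so this fails.
error = try_pickle(original)
print(type(error).__name__, error)
```

Whether peddy hits exactly this path is an assumption; the point is only that any object whose class is no longer reachable under its recorded name cannot cross a process boundary via pickle.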

This is completely non-urgent, and I'll see if I can work out a fix that can be ported upstream.

brentp commented 2 years ago

It might be because of this: https://github.com/brentp/peddy/blob/master/peddy/peddy.py#L133

Instead of working on peddy, you might try somalier; it's faster and scales to more samples.