diffpy / diffpy.srfit

framework for complex modeling and atomic structure optimization

FitRecipe returns different results when it is running in parallel #56

Closed chiahaoliu closed 5 years ago

chiahaoliu commented 5 years ago

@pavoljuhas I found that FitRecipe returns different refinement results when it runs in parallel with multiprocessing. Running a series of refinements sequentially in a for loop gives different results than running the same series in parallel, and the results from the sequential run seem more correct. Interestingly, I didn't have this problem when using the diffpy stack pinned to py3.6.

I've created a minimal example, based on the standard Ni refinement example in the docs, that reproduces this behavior in my fork: https://github.com/chiahaoliu/diffpy.srfit/tree/test_parallel

My diffpy packages are installed via conda; the versions are listed below (scipy is included because we use its optimizer):

# Name                    Version                   Build  Channel
diffpy-cmi                3.0.0                    py37_0    diffpy
diffpy.srfit              3.0.0                    py37_0    diffpy
diffpy.srreal             1.3.0            py37hbf07610_0    diffpy
diffpy.structure          3.0.0                      py_0    diffpy
diffpy.utils              3.0.0                      py_0    diffpy
libdiffpy                 1.4.0                h19d8545_1    diffpy
scipy                     1.2.1            py37h7c811a0_0

Thanks!

dragonyanglong commented 5 years ago

Hi @pavoljuhas, I also have a program that uses multiprocessing to run diffpy-cmi FitRecipe in parallel, and I ran into the same problem that @chiahaoliu described.

I compared the results using diffpy-cmi 3.0a0 from the diffpy/channel/dev channel and diffpy-cmi 3.0.0 from the diffpy channel, and the fit results for the same structure and data are quite different.

Here are my diffpy packages (I use macOS):

# Name                    Version                   Build  Channel
diffpy-cmi                3.0.0                    py37_0    diffpy
diffpy-srfit              3.0.0                    pypi_0    pypi
diffpy-structure          3.0.0                    pypi_0    pypi
diffpy-utils              3.0.0                    pypi_0    pypi
diffpy.srfit              3.0.0                    py37_0    diffpy
diffpy.srreal             1.3.0            py37h4867ba1_0    diffpy
diffpy.structure          3.0.0                      py_0    diffpy
diffpy.utils              3.0.0                      py_0    diffpy
libdiffpy                 1.4.0                hab57f6b_1    diffpy

Thanks.

pavoljuhas commented 5 years ago

@chiahaoliu, @dragonyanglong - I can see the problem in Python 3.7, but things work as expected in 3.6. The issue is in the pickling/unpickling of the FitRecipe, which somehow breaks constraint relations.
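
A quick way to check for this kind of breakage is to compare a model's output before and after a pickle round trip. The sketch below is only an illustration: `roundtrip_changes_output` and the toy `LossyModel` class are hypothetical names, not part of diffpy, and the toy drops its "constraint" flag on purpose to mimic the symptom. It does not reproduce the actual bug.

```python
import pickle


def roundtrip_changes_output(obj, evaluate, tol=1e-12):
    """Return True if evaluate(obj) differs after a pickle round trip."""
    before = evaluate(obj)
    clone = pickle.loads(pickle.dumps(obj))
    after = evaluate(clone)
    return any(abs(b - a) > tol for b, a in zip(before, after))


class LossyModel:
    """Hypothetical toy: `scale` should follow `base` while `follow` is set."""

    def __init__(self, base):
        self.base = base
        self.follow = True  # stand-in for a constraint relation

    def __getstate__(self):
        # Deliberately lossy: the constraint flag is not saved.
        return {"base": self.base}

    def __setstate__(self, state):
        self.base = state["base"]
        self.follow = False  # the relation is lost on unpickling

    def values(self):
        scale = self.base if self.follow else 1.0
        return [self.base, scale]
```

Calling `roundtrip_changes_output(LossyModel(2.0), LossyModel.values)` from a script flags the toy model, while plain data such as a list of floats passes unchanged. For a real recipe, the `evaluate` argument would be whatever callable returns the residual vector.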

For now, the workaround is to refactor your code so that you don't copy the FitRecipe to the parallel job, for example:

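from scipy.optimize import leastsq
from diffpy.srfit.fitbase import FitResults

def fit_wrapper(ciffile, data, other_parameters):
    # Build the recipe inside the worker so it is never pickled.
    # makeRecipe is your own recipe-building function.
    recipe = makeRecipe(ciffile, data, other_parameters)
    leastsq(recipe.residual, recipe.getValues())
    # FitResults (not FitRecipe) collects the refined values.
    result = FitResults(recipe)
    return result
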
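
The same pattern can be sketched without diffpy to show how the pieces fit together with `multiprocessing.Pool`. Everything here is a stand-in: `ToyRecipe`, `fit_wrapper`, and `run_all` are hypothetical names, and the "fit" is a closed-form least-squares slope rather than a real refinement. The point is only that plain job tuples go into the workers and plain values come back, so the stateful object never crosses a process boundary.

```python
from multiprocessing import Pool


class ToyRecipe:
    """Hypothetical stand-in for a recipe: fits y = slope * x to data."""

    def __init__(self, xs, ys):
        self.xs = list(xs)
        self.ys = list(ys)
        self.slope = 0.0

    def refine(self):
        # Closed-form least squares for a line through the origin.
        sxy = sum(x * y for x, y in zip(self.xs, self.ys))
        sxx = sum(x * x for x in self.xs)
        self.slope = sxy / sxx
        return self.slope


def fit_wrapper(job):
    """Build the recipe inside the worker and return only plain values."""
    name, xs, ys = job
    recipe = ToyRecipe(xs, ys)    # constructed here, never pickled
    return name, recipe.refine()  # a (str, float) tuple pickles safely


def run_all(jobs, processes=2):
    """Run the fits in parallel; only the job tuples are sent to workers."""
    with Pool(processes) as pool:
        return dict(pool.map(fit_wrapper, jobs))
```

In a script, call `run_all(jobs)` under an `if __name__ == "__main__":` guard, since platforms that start workers with spawn re-import the main module. If the full results object is needed in the parent process, it would have to be pickled on the way back, so returning plain numbers or dictionaries is the more conservative choice.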
chiahaoliu commented 5 years ago

@pavoljuhas Thanks for the tip 👍

dragonyanglong commented 5 years ago

Thanks @pavoljuhas

pavoljuhas commented 5 years ago

@chiahaoliu - can you try again with https://github.com/diffpy/diffpy.structure/pull/26 ?

cd /tmp
git clone -b fix-atom-structure-pickle \
    https://github.com/pavoljuhas/diffpy.structure.git
export PYTHONPATH=$PWD/diffpy.structure/src
cd /to/srfit/examples
python srfit_parallel_test.py

chiahaoliu commented 5 years ago

@pavoljuhas Thanks! I just got time to test it now, and it fixes the problem.

Do you plan to make a new conda release with this patch anytime soon?

pavoljuhas commented 5 years ago

@chiahaoliu - yes, there is a new issue for that: https://github.com/diffpy/diffpy.structure/issues/27. Please ping me if it is not out within 14 days.