Insane memory consumption with single crystal powder model simulation

artsiommiksiuk commented 3 months ago

I'm running a lot of simulations and have serious problems with simulation time and memory. I can solve the time issue (by putting everything into joblib and running the simulations in parallel), but this library uses an insane amount of memory even when simulating a single crystal.

import numpy as np
import xrayutilities as xu
# Required to make the simulation faster (it stops overusing multiprocessing and waiting a long time for process joins)
xu.config.NTHREADS = 1

# from https://www.crystallography.net/cod/1000021.html
cryst = xu.materials.Crystal.fromCIF("1000021.cif")

powder = xu.simpack.Powder(cryst, 1, crystallite_size_gauss=1e-7)
pm = xu.simpack.PowderModel(
    powder, 
    wl=0.1647,
    tt_cutoff=13.5,
    fpsettings={ 
        "axial": { 
            "AxDiv": None,
            "slit_length_source": 0.000001, 
            "slit_length_target": 0.0000001, 
            "length_sample": 0.0000001,
            "angD_deg": 0.000001
        }, 
        "global": {
            "equatorial_divergence_deg": 0,
            "diffractometer_radius": 1500
        },
        "emission": {
            "emiss_gauss_widths": (1.06886840e+00*3e-14),
            "emiss_lor_widths": (7.13941570e-01*0.5e-14)
        },
    }
)

try:
    x = np.linspace(0, 13.5, 10000)
    y = pm.simulate(x, mode="local")

    print(x, y)
finally:
    pm.close()
    pm = None
    powder = None
    cryst = None

Generally, memory use correlates with how large the CIF file is (but not always) and with the lattice size. I didn't find a direct correlation, though: the memory issues can show up for both bigger and smaller crystals.

So my main problem is the unpredictable amount of resources per crystal simulation: some use only a few hundred megabytes, while others use 32 GB of RAM + 32 GB of swap, which is quite insane at this point.

I had a chance to try a rather exotic solution to check whether a very big swap would solve it: I attached a 1 TB SSD and assigned it as swap. In the end, some of the simulations were using 80+ GB of swap + 32 GB of RAM, but they still couldn't complete within the 10-minute task limit.

I have a feeling that some configurations produce an infinite, or at least very large, number of solutions which all get convolved, but I'm not sure.

Another, side issue is that running this single simulation in a Jupyter notebook (in VS Code) leaks memory heavily: it leaves ~28 GB of unreleased memory. I suspected that the notebook itself might hold some references, so I tried setting all the variables connected to xu to None, but it didn't help. Only restarting the kernel frees the memory.
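A possible workaround for the unreleased memory, sketched under the assumption that the memory is only returned to the OS when the Python process ends: run each simulation in a short-lived child interpreter instead of inside the notebook kernel. `run_isolated` and the JSON hand-off are hypothetical helpers, not part of xrayutilities, and the `demo` snippet is only a stand-in for a real simulation script.

```python
import json
import subprocess
import sys
from typing import Optional


def run_isolated(snippet: str, timeout: Optional[float] = None):
    """Run `snippet` in a fresh Python interpreter and return its JSON result.

    The snippet must print exactly one JSON document to stdout.
    Whatever memory the child used is released when the child exits.
    """
    completed = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,       # raise CalledProcessError if the child fails
        timeout=timeout,  # raise TimeoutExpired for runaway simulations
    )
    return json.loads(completed.stdout)


# Stand-in for a real simulation script: build a large list, report a summary.
demo = "import json; data = list(range(1_000_000)); print(json.dumps(sum(data)))"
result = run_isolated(demo, timeout=60)
print(result)
```

In a real setup the snippet (or a script file) would build the PowderModel, call `simulate`, and print or save the result; the parent process never holds the large buffers.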

Tested on:

OS: Mac OS 14.5 - M3, Ubuntu 24.04 - i9 13900K
python: 3.12.2
xrayutilities: 1.7.7 - 1.7.8

dkriegner commented 3 months ago

The problem is that you are not telling the PowderModel initially that you need your simulation only up to 13.5 degrees. By default it prepares for calculations up to 180 deg, and together with your large unit cell this causes a huge number of peaks. I assume that this is the origin of the problem you are facing.

Can you try to add a reasonable value (maybe a bit bigger than your 13.5) for tt_cutoff (an optional argument to PowderModel)? Since you are the second person having trouble with this in a short time, I think I need to document this better.
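To give a feeling for why the angular cutoff matters so much, here is a rough back-of-the-envelope sketch. It counts reciprocal lattice points inside the sphere of radius 1/d_min, with d_min = wavelength / (2 sin(theta_max)), and ignores symmetry-equivalent reflections and systematic absences, so it is not the number of powder lines xrayutilities actually generates. The function name and the cell volume are hypothetical; the wavelength is taken from the snippet above and assumed to be in angstrom.

```python
import math


def estimate_lattice_points(cell_volume: float, wavelength: float, tt_max_deg: float) -> float:
    """Approximate number of reciprocal lattice points reachable up to 2theta = tt_max_deg."""
    theta_max = math.radians(tt_max_deg / 2.0)
    inv_d_min = 2.0 * math.sin(theta_max) / wavelength  # radius of the limiting sphere, 1/d_min
    # Sphere volume divided by the reciprocal cell volume (1 / cell_volume).
    return 4.0 / 3.0 * math.pi * cell_volume * inv_d_min ** 3


WAVELENGTH = 0.1647   # from the snippet above (assumed to be in angstrom)
CELL_VOLUME = 1000.0  # hypothetical unit cell volume in angstrom^3

n_cutoff = estimate_lattice_points(CELL_VOLUME, WAVELENGTH, 13.5)
n_full = estimate_lattice_points(CELL_VOLUME, WAVELENGTH, 180.0)
ratio = n_full / n_cutoff

print(f"up to 13.5 deg: ~{n_cutoff:.0f} lattice points")
print(f"up to 180 deg:  ~{n_full:.0f} lattice points ({ratio:.0f}x more)")
```

The ratio between the two ranges depends only on (sin(90 deg) / sin(6.75 deg))^3, not on the cell volume, which only sets the absolute scale.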

artsiommiksiuk commented 3 months ago

@dkriegner, but it is set in the code in PowderModel, and the x values range from 0 to tt_cutoff.

Or should it be provided somewhere else?

pm = xu.simpack.PowderModel(
    ...
    wl=0.1647,
    tt_cutoff=13.5,
    fpsettings={ 
    ...

artsiommiksiuk commented 3 months ago

And I verified that tt_cutoff is set in an underlying PowderDiffraction as well to the correct value.

dkriegner commented 3 months ago

Sorry, I missed that.

Can you try to reduce the number of custom settings for the fundamental parameter model (of course keep your wavelength) and check if this has any impact? Did you check if giving zero divergence is handled well by the underlying convolvers?

artsiommiksiuk commented 3 months ago

With the whole fpsettings section removed, nothing changed (maybe 1-2 GB less used): still 32 + about 15 GB used.

artsiommiksiuk commented 3 months ago

Zero axial divergence also doesn't change anything.

dkriegner commented 3 months ago

OK, I'll look into it. The PowderDiffraction code allocates a lot of buffers and does a lot of caching, which in your case seems to be too much. The buffering is implemented to speed up subsequent calculations (e.g. during a fitting procedure), but in a scenario where different structures which have nothing in common are calculated this is likely not helpful. I believe it is particularly extreme here due to the large number of peaks and the high point density.

artsiommiksiuk commented 3 months ago

Is there any workaround I can use? Would a fix or a config option for this be hard to add?

dkriegner commented 3 months ago

If the buffer generation is indeed the problem, it's rooted very deep in the code. The original author of this code section should likely be consulted. It certainly all scales with the number of points and the angular range you request in the output. Internally, the calculation uses an even higher point density. So one question is of course whether you really need the 10000 points in the output. Other than reducing these values, I am not sure if there is an easy fix.

artsiommiksiuk commented 3 months ago

Well, 10000 is required because of the very sharp peaks we are getting. Having fewer would just lose all the peak shape information, which is not a lot anyway even with 10000 points (only 100 at best fall into a single peak).
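For reference, the arithmetic behind these numbers can be sketched as follows. It assumes the evenly spaced grid from the original snippet (0 to 13.5 deg, 10000 points) and takes the "100 points at best" figure at face value; the variable names are only illustrative.

```python
# Evenly spaced grid as produced by np.linspace(0, 13.5, 10000):
TT_MIN, TT_MAX, N_POINTS = 0.0, 13.5, 10000

# linspace includes both end points, so there are N_POINTS - 1 intervals.
step = (TT_MAX - TT_MIN) / (N_POINTS - 1)

points_per_peak = 100               # "at best" figure quoted above
peak_span = points_per_peak * step  # angular range covered by those points

print(f"step between points: {step:.6f} deg")
print(f"{points_per_peak} points cover about {peak_span:.3f} deg")

# Halving the number of output points doubles the step,
# leaving roughly half as many points on the same peak.
coarse_step = (TT_MAX - TT_MIN) / (N_POINTS // 2 - 1)
coarse_points_per_peak = peak_span / coarse_step
print(f"with {N_POINTS // 2} points: about {coarse_points_per_peak:.0f} points per peak")
```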

Okay, good to know at least that there isn't anything I'm missing. It would still be very nice to have this resolved somehow or to have a workaround; tell me if something comes to mind.

dkriegner commented 3 months ago

I was thinking a bit about this problem and I acknowledge that the way I have included the FP_profile class by @mendenmh is oriented towards performance for many recalculations of the powder pattern, without any thought given to memory use. For materials with a smaller unit cell and the commonly used CuKalpha wavelength this is not an issue at all. If, however, one has a much shorter wavelength and a larger unit cell, one runs into the problems you are observing.

I can imagine that one should provide a "low memory" variant of PowderDiffraction, which would not initialize an FP_profile class for each powder line already during initialization but generate them dynamically during the calculation (one for each available thread). This would mean a somewhat slower calculation but should bring down the memory use dramatically.
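The trade-off described here can be illustrated with a toy sketch. It is not xrayutilities code: `ToyProfile` is a hypothetical stand-in for a per-line profile object holding a large buffer, and the example only contrasts creating all objects up front with creating them one at a time. The "one per thread" part is left out.

```python
import tracemalloc


class ToyProfile:
    """Hypothetical stand-in for a per-line profile object with a large buffer."""

    def __init__(self, line_index: int, buffer_size: int = 100_000):
        self.line_index = line_index
        self.buffer = bytearray(buffer_size)  # placeholder for convolver caches

    def intensity(self) -> float:
        return float(self.line_index)  # dummy value instead of a real profile


def simulate_eager(n_lines: int) -> float:
    # One profile object per powder line is created up front and kept alive.
    profiles = [ToyProfile(i) for i in range(n_lines)]
    return sum(p.intensity() for p in profiles)


def simulate_lazy(n_lines: int) -> float:
    # Profiles are created one at a time and discarded right after use.
    total = 0.0
    for i in range(n_lines):
        total += ToyProfile(i).intensity()
    return total


def peak_bytes(func, *args):
    """Return (result, peak traced allocation in bytes) for one call."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


eager_result, eager_peak = peak_bytes(simulate_eager, 200)
lazy_result, lazy_peak = peak_bytes(simulate_lazy, 200)
print(f"eager peak: {eager_peak / 1e6:.1f} MB, lazy peak: {lazy_peak / 1e6:.1f} MB")
```

Both variants return the same result; the lazy one pays the construction cost on every call, which is the slowdown mentioned above, but its peak memory no longer grows with the number of lines.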

I am currently not able to look into this for time reasons. It is likely not difficult but one needs to make sure to keep all the logic of finding the right parameters for all parts to work together. Are you willing and able to work on the code changes for this? I can provide some guidance on what to look at.

mendenmh commented 3 months ago

My FP_profile class code is designed as a reference implementation, and for formal correctness. It has zero optimization for memory usage! The cases I have used it for involved a small number of reflections. Assume, in advance, that it is a terrible memory hog. :-)

Sorry.

Marcus Mendenhall

Materials Measurement Science Division
National Institute of Standards and Technology
100 Bureau Dr. stop 8370 (217/B115)
Gaithersburg, MD 20899 USA
Phone: +1-301-975-8631

artsiommiksiuk commented 3 months ago

@dkriegner no promises for now. I understand most of the code and I'd be able to do that, but I don't know if this will get priority in our project. I think this will become clearer over the next 2 months.

As a workaround for my issue, I handpicked samples that require very little CPU time, and I was able to get output for them.

Thanks for the hints and feedback! I'll get in touch if I get around to solving this.

mendenmh commented 3 months ago

Incidentally, if someone wants to work on a more production-oriented version of my class, I would be glad to help out on the side.

Marcus Mendenhall

dkriegner commented 3 months ago

Thanks to both of you. I will definitely keep this issue open. If I get to make changes in this part of the code (unlikely at the moment), I will keep it in mind.

dkriegner / xrayutilities

Insane memory consumption with single crystal powder model simulation #193