Additional ways of incorporating polydispersity into structure factors

gnsmith commented 1 year ago

Feature request It would be valuable to have additional additional ways of dealing with polydispersity in the calculation of structure factors. I work on colloidal systems, often with size size distributions (sometimes broad), and the options to deal with polydispersity and structure factors are currently limited in SasView. This has improved recently; the incorporation of the decoupling approximation is a big help. This is a definite improvement over having only the monodisperse approximation available.

There are other ways of achieving this, which provide alternatives when the size distribution gets beyond the applicability of the decoupling approximation. The two that I regularly use are the local monodisperse approximation (LMA) and Robertus's numerical solution of the P-Y approximation (https://doi.org/10.1063/1.456635). There are also solutions using partial structure factors, but I personally have not used them in my data analysis.

Desired solution Just as the decoupling approximation is now an option for calculating S(Q), I would like the LMA offered as another option. This would be a great solution.

The Robertus solution is restricted to spheres, as I understand it, and so its implementation would likely need to be as a standalone model. This would be desirable, but it is a less important solution.

Alternatives When fitting scattering data with size distributions and a structure factor, I tend to use SASfit. This must be a known alternative to the developers, as the SasView documentation links to the SASfit publication for more information about the models. It would be good to have a solution implemented within SasView, as my current alternative requires me to use a different software programme. This means that I cannot make use of other features of SasView when fitting data with polydisperse structure factors.

Corrections to documentation When preparing this feature request, I read the SasView documentation about "Fitting Models with Structure Factors" (https://www.sasview.org/docs/user/qtgui/Perspectives/Fitting/fitting_sq.html). At several points, the structure_factor_mode = 0 is described as the "local monodisperse approximation". As I understand it (and maybe I have misunderstood it!), this is not correct. This mode is the "monodisperse" approximation, where the object is assumed to only have a single radius. The local monodisperse approximation divides the size distribution into slices, uses a monodisperse radius for that slice, and then weights the summed scattering curves.

lucas-wilkins commented 1 year ago

My initial thoughts on this are:

1) It would be very simple to implement a model that deals with polydispersity in a different way, but it wouldn't use the polydispersity system.

2) It might be worth rethinking how polydispersity is handled, and have specific polydispersity models. There are already significant complications from the different ways that orientation dispersity is handled, along with the not very general system that uses the beta approximation. Roughly speaking, this would involve a general form of polydisperse parameters, with correlations between them, and across form and structure factors.

3) I think the docs are right. I would consider the local monodisperse approximation is what you say it is, but that's exactly what representing I(q) as the product of a structure factor and independent form factor does.

gnsmith commented 1 year ago

Thanks for the initial comments @lucas-wilkins. Here is some extra information.

At least in the case of the LMA, I would apply it the same as the decoupling (beta) approximation. I will defer to you as to how polydispersity gets applied, but of course, I am happy to validate things when/if implemented. In the case of the Robertus model, I think that would need to be a separate model.

I disagree about the docs. What is described in the docs is the fully "monodisperse" approximation. Whatever form factor you use is multiplied by a structure factor with a single ("monodisperse") radius. A polydisperse form factor is multiplied by structure factors with lots of different polydisperse radii. The local monodisperse approximation has a specific and technical meaning, which is not the case in the docs. As is, I think it's misleading. Can we get someone else involved to discuss this?

Documents

Here are some links and a document to help you. I'm afraid I only have access to one PDF at the moment.

Decoupling approximation: Analysis of small angle neutron scattering spectra from polydisperse interacting colloids (https://doi.org/10.1063/1.446055)

Robertus model: Sol gl0327.pdf ution of the Percus–Yevick approximation of the multicomponent adhesive spheres system applied to the small angle x‐ray scattering from microemulsions (https://doi.org/10.1063/1.456635)

Local monodisperse approximation: Determination of size distribution from small-angle scattering data for systems with effective hard-sphere interactions (https://doi.org/10.1107/S0021889893013810) (Also attached)

gl0327.pdf

lucas-wilkins commented 1 year ago

I think @pkienzle wrote the docs. I do see that in that paper the term "local monodisperse approximation" has a specific meaning. According to the documentation for SasView, the structure factor is "modulated" by an effective radius taken from the form factor, which suggests to me that for each form factor sampled ($P(q; r)$), there is a different, corresponding, structure factor ($S(q; r)$). These would then be multiplied together and summed with a weight that depends on the radius.

Something like:

$$I(q) = \int S(q; r) P(q; r) dp(r)$$

Thus, there are pairs of form and structure factors that are multiplied together, so each sample of $r$ sees an environmental structure with the same radius. $P$ would be $\Phi^2$ in the paper you linked.

This is how I would interpret the documentation anyway. It might not be what SasView is actually doing, but that's a different matter.

lucas-wilkins commented 1 year ago

Further thoughts on the docs for this... the docs also appear to be self-contradictory. I think the bits that say that there is no support for LMA are older, and written before the implementation of the beta approximation. There's definitely something wrong here.

gnsmith commented 1 year ago

Although it might be possible to work out what is being calculated from code, I decided to benchmark SasView fits against model data simulated from SASfit. From these fits, I think that the monodisperse method (structure_factor_mode=0) is not sampling a different structure factor S(Q,r) for each P(Q,r). In the context, effective radius means that you can put in any radius to best match your data. It does not have to be a "real" radius, rather it could be an effective one. It is correct in saying that $I(Q) \propto F^2(Q)S(Q) \equiv P(Q) S(Q)$.

Here are my fits to several simulations, which convince me that SasView is correctly calculating what it says it is. (That's good!) These are 100 Å spheres, either monodisperse or with a $\pm 40$ Å Gaussian size distribution, at a volume concentration of 20%.

Fully monodisperse S(Q) and P(Q) fullmonodisperse Good fit.

Monodisperse S(Q) with polydisperse P(Q) (structure_factor_mode=0 in SasView) monodispersesq Good fit.

Decoupling/beta S(Q) with polydisperse P(Q) (structure_factor_mode=1 in SasView) Good fit.

Local monodisperse approximation S(Q) with polydisperse P(Q) (not in SasView) lma Bad fit. Here I have used structure_factor_mode=0 for the SasView fit, but it does not match the calculation.

Like I said at the top, I haven't looked at the code, but I am pretty confident about the fits.

lucas-wilkins commented 1 year ago

In the context, effective radius means that you can put in any radius to best match your data. It does not have to be a "real" radius, rather it could be an effective one.

Yeah, that's roughly what I thought. Though can't you constrain the effective radius to a function of the real radius (with dispersity) and get the local monodisperse approximation... maybe you can't.

gnsmith commented 1 year ago

I don't think you can add a sufficiently complex effective radius in the GUI. I think it's just a single value. Maybe you can in scripting, I'm not sure.

You could, I suppose, achieve the local monodisperse approximation through summing models, by dividing a size distribution into a sum of discrete slices each with its own equal radius (equal for P(Q) and S(Q)) with proper weighting to account for the size distribution. I haven't tried to see if this is possible. However, it would be an onerous set of calculations to do in practice.

lucas-wilkins commented 1 year ago

Yeah, its simple to do in a way, but quite difficult to provide a way that integrates well into SasView in general.

lucas-wilkins commented 1 year ago

I have a plan though.

gnsmith commented 1 year ago

I have a plan though.

Better than the alternative! Let me know if you need any more input or if you need some testing.

yunliu01 commented 1 year ago

I did not know the details of the SasView code. So the following thoughts are just from a user point of view after using the SasView for some years.

Based on my experience, the current version of the SasView only supports two types of fitting using the structure factor: 1) everything is monodispersed (Fully monodisperse S(Q) and P(Q) as mentioned by @gnsmith ), 2) Decoupling approximation (I sometimes called it the static decoupling approximation for SANS/SAXS) originally proposed by Chen at al. Therefore, for the document, I also agree with @gnsmith that it is better to remove the word, local, from "If structure_factor_mode = 0 then the local monodisperse approximation will be used,", and "The β(Q) decoupling approximation has the effect of damping the oscillations in the normal (local monodisperse) S(Q). When β(Q)=1 the local monodisperse approximation is recovered. This mode is only available in SasView 4.3 and later."

Indeed, implementing the local monodisperse approximation (LMA) will be useful. Theoretically speaking, SasView has all the components needed for the LMA. However, implementing it may take time based on my past experience with the decoupling approximation method.

Just a note that in the original paper by Jan Skov Pederson on the LMA, it only demonstrated that the LMA works well for the particles interacting with a hard sphere. I haven't spent time to search literature. So it is not clear to me if the LMA will work equally well for other types of interactions, including the sticky hard sphere interaction. (I could have missed some literatures on this. ) Even though implementing the LMA for other type of interactions is theoretically straight forward, checking the accuracy of these models may need some extensive research work.

As for the method by Robertus's numerical solution, I quickly read through it today. (Hence, I may not fully understand this paper.) My impression is that they still use the analytical method originally proposed by Baxter for the sticky hard sphere in 1960s. In order to solve the OZ equation, Baxter's method requires to solve a set of non-linear equations (coupled quadratic equations), whose complexity depends on the number of groups of particle sizes. This particular paper by Robertus tried to address this by numerically solving the equations as there is no analytical solution. Therefore, it is likely that there is a need of a human input to decide if a solution is reasonable or not in the end. (This numerical solution part is vague in the paper. So it is part of my guess.) It may not be an easy job for a computer to decide if a solution is correct one or not. Note that this is a common problem when solving the OZ equation. In fact, this was also an issue for Yukawa potential, such Hayter-Penfold method or two Yukawa method. They used some tricks to check the numerical results of the pair-distance function if one could find all solutions. This approach may not be easily extended to other cases.

butlerpd commented 1 year ago

Regarding options for effective radius, there can be as many options with as much math complexity as you want, but that has to be done in the model code itself. All Eff Radii options in the code are then exposed to the GUI. I don' think however that the polydispersity parameters can be used because sasmodels allows arbitrary distributions which get wrapped around the model and the model can only know about itself not who is using it?

Adding other structure factors and other ways of integrating with polydispersity as have been implemented in SASfit have been discussed over years and in particular the locally monodisperse approximation. These would involve providing new methods and handling within sasmodels. There is some more discussion of this in issue #1703.

If there are resources available to implement some of these it would be a good time to restart some design discussions I would think.

lucas-wilkins commented 1 year ago

Right, I'll implement a LMA model for the time being, as the kind of changes needed to the workflow to make it more general are pretty substantial. I've been researching these changes, but it looks like a big, but worthwhile job.

gnsmith commented 1 year ago

Thanks for the discussion earlier this morning @lucas-wilkins. Just to put in writing that I am happy with this as a solution: a model of LMA onion shell spheres, as a motif to creating other sphere-like structures if desired (core shell spheres or vesicles). This could also be an archetype for introducing other ways of dealing with polydispersities in structure factor calculations, such as the Robertus paper I mentioned at the start.

One thing that I didn't mention earlier, but it would be good to include is the option to couple this with other structure factors. This issue with polydispersities will not only be an issue for hard spheres; it will also be an issue for any structure factor. This may require testing and validation by the user, as @yunliu01 says, but there should be the ability to use other structure factors to enable generality. I'm not sure if this could be chosen by the user in the GUI at the point of use, or if it would require separate models for each structure factor.

pkienzle commented 1 year ago

When we developed the SasView β approximation we used SasFit to check our calculations, so this is not an independent check on the validity of the result.

Regarding LMA, §4.1.3 of the SasFit manual gives $I(q) =∫N(x) F²(q, x) S(q, R(x)) dx$ for polydisperse parameter $x$ distributed as $N(x)$. The current calculation kernel only operates on a single model, returning $∫N(x) F(q, x) dx$ and $∫N(x) F²(q, x) dx$. The $S(q)$ is computed in a call to a separate kernel. These results can be used for the monodisperse approximation (§4.1.1) and the decoupling approximation (§4.1.2) but not the LMA (§4.1.3).

We could implement LMA by providing both $F$ and $S$ to the core kernel evaluation code and have it generate the appropriate equations for the integral on the fly. This would make an already complicated kernel generation code even messier. The additional approximations (§4.1.4–6) require pairs of parameter sets inside a double integral, which means a doubly nested polydispersity loop. This would require much more extensive surgery in the kernel wrapper, with an impossibly slow calculation if there are multiple polydisperse parameters.

An alternative approach is to perform the polydispersity loop outside the kernel wrapper, leaving the wrapper to compute only the orientation averages $⟨F⟩$, $⟨F²⟩$ and the structure factor $S$. The various coupling approximations can then be implemented as numerical integrals in python. This also gives us more flexibility when calculating integrals, for example, switching to Monte Carlo integration when there are many polydisperse parameters in the model or when we are using a pair-wise coupling approximation.

In addition to simplifying the kernel wrapper code, separating the kernel from the polydispersity will allow us to take better advantage of modern GPU hardware. The current code is limited to using one thread per q value, looping over the polydispersity integral for that q. This was fine for integrated GPUs in 2010 with only a dozen cores, but Intel Gen12 processors have more cores than q values so many of them will remain idle. By removing polydispersity from the kernel we can batch calculate parameter sets in parallel on the GPU, synchronize, then accumulate them in the resulting integral. With dedicated graphics cards the batch sizes can be larger for even bigger improvements.

We can explore the hybrid calculation within the existing sasmodels code base, invoking the kernel with no polydisperse parameters and calculating the integrals on the result. The code in kernelpy._loops already implements a polydispersity loop. It just need to be tweaked so that it accumulates structure(R)*form() instead of form(), with R = form_radius(). This won't have the performance improvements from parallel evaluation of parameter sets but it should get you a slow LMA without too much effort.

Pair-wise coupling will require completely new code. From each polydispersity weight vectors, do a weighted selection of $n$ parameters. You can then pick $n$ pairs at random (with replacement) and calculate $⟨F⟩$ for each and $S$ for joint effective radius. Accumulate $⟨F⟩⟨F'⟩S$. The joint weighting $N$ is already incorporated in the weighted selection. You can trade memory for time by precomputing $⟨F⟩$ for each parameter set. Parallel kernel evaluation will be pretty natural with this new code. Similarly for monodisperse and decoupling approximations, but with only a single $⟨F⟩$. The existing gridded calculation fits in the same framework using grid points for each batch, but including the weight product in the accumulator.

The MC pairwise coupling approximations should extend naturally to mixture models, first selecting the models according to model weights, then selecting parameters for each model according to model polydispersity. This will not work with the existing implementation of mixture models, which returns a single $F²$ for the mixture rather than return individual $F$ for the components.

Note that model reparameterization can implemented directly with the alternative approach. For example, if we want to represent a fixed volume ellipsoid using a polydisperse ratio of major-minor axes we can generate a parameter set (V, ratio) and transform it to (R major, R minor) with a simple python expression and use the transformed parameters to compute the integral. The current method requires a new kernel model for each reparameterization, with the transformation code injected in the heart of the polydispersity loop prior to calling the kernel.

Also note that this does not address performance of the orientation average in monodisperse models (paracrystals, pringle, …). That will require unrolling of the loops within the individual Iq() models.

SasView / sasview

Additional ways of incorporating polydispersity into structure factors #2423