SasView / sasview

Code for the SasView application.
BSD 3-Clause "New" or "Revised" License
51 stars 41 forks source link

Polydispersity plot label is inconsistent with what is actually plotted. #2472

Open butlerpd opened 1 year ago

butlerpd commented 1 year ago

Describe the bug As discovered by user Grünewald, the polydispersity plots provided by the SasView GUI, while labeled as "normalized probability," actually plot the normalized weights at each interval. This was verified by @pkienzle and @smk78 as well.

This is related to but not the same as the similar issue in SasView/sasmodels#556. In that case the plot should most likely remain in weights rather than probability. sasview generates its own plot from the same data.

To Reproduce simply fit any data set with polydispersity turned on. This should pop up a polydispersity plot (if not there is another bug). The area under the curve will not be 1 as expected. However, Grünewald found that, at least in his case (sesans data fit to a sphere of radius 39,993 and a log-normal polydispersity; PD=0.139) the sum of the y array is equal to 1.

Expected behavior At the least the axis label, documentation and the data should all be consistent. For the purposes of this plot, the right answer is probably to keep the axis label and make sure the plot actually plots the probability instead of the weights. The documentation should be updated as well to make this more clear.

SasView version (please complete the following information):

Operating system (please complete the following information):

Additional context Discussing this with @pkienzle, he provided the following suggestion for achieving the result suggested above:

   # This works for length 2+ in numpy even when [2:] and [:-2] are beyond the ends of the array.
   # Length 1 is not plotted so the code doesn't get here.
   dx = np.hstack((xarr[1]-xarr[0], xarr[2:] - xarr[:-2], xarr[-1] - xarr[-2]))
    y = yarr/dx

The file of interest would be: https://github.com/SasView/sasview/blob/e93db928e885149d8f7b03648a254d9116a7790f/src/sas/qtgui/Perspectives/Fitting/FittingUtilities.py#L625

lucas-wilkins commented 1 year ago

Expected behavior At the least the axis label, documentation and the data should all be consistent. For the purposes of this plot, the right answer is probably to keep the axis label and make sure the plot actually plots the probability instead of the weights. The documentation should be updated as well to make this more clear.

In my opinion, the answer would be to change the line plot, to a ball and stick plot. Then you don't have to reweight them by dx. The polydispersity values as plotted should be as close to the weights used in the calculations, as the only reason you want to see them is to visually check.

It's probably wrong to phrase the difference as one showing the probability and the other showing the weights. More it's the probability of a discrete distribution that coincides with the probability density function of a continuous distribution.

Discussing this with @pkienzle, he provided the following suggestion for achieving the result suggested above...

This is the calculation you'd need to do to weight a probability density, but it won't fix the issues with the plot. The plot can have different x axes (linear, log, etc) independently of the x data so the y axes will need to scale in line with the choice of x axis type.

If you treat it as a discrete probability distribution, and use the more appropriate ball and stick plot, then the scaling of the y axes doesn't matter.

pkienzle commented 1 year ago

The underlying distributions are continuous. The discrete distribution is simply a numerical approximation. Changing the number of points in the approximation does not change the probability of observing a point of radius $R$ in the sample, which is why we should be showing probability rather than weight.

The probability density is determined by how the weights are used in the polydispersity calculation. The transformation from weight to probability needs to apply before any transformation of the x-axis for viewing.

lucas-wilkins commented 1 year ago

Just to clarify what I was saying, which is two things really: 1) Plotting probability densities isn't just a matter of plotting some variable against another, the y variable needs to be transformed if the x variable changes. As users have the option of selecting different x variables, we would also need to transform the y variable accordingly. To my knowledge, this is not currently supported. 2) The actual numerical approximation being used is what a user wants to know about, if they want to see what a general log-normal distribution looks like they can look at the Wikipedia page.

It would be better to remove the expectation that they should represent probabilities than to bodge in a misleading representation.

On Fri, Mar 24, 2023 at 10:53 PM Paul Kienzle @.***> wrote:

The underlying distributions are continuous. The discrete distribution is simply a numerical approximation. Changing the number of points in the approximation does not change the probability of observing a point of radius $R in the sample, which is why we should be showing probability rather than weight.

The probability density is determined by how the weights are used in the polydispersity calculation. The transformation from weight to probability needs to apply before any transformation of the x-axis for viewing.

— Reply to this email directly, view it on GitHub https://github.com/SasView/sasview/issues/2472#issuecomment-1483542899, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACKU4SQOEQOJH4KP352XBTLW5YQW3ANCNFSM6AAAAAAWFMBQGM . You are receiving this because you commented.Message ID: @.***>

--

Dr Lucas Wilkins +44 (0) 7505 915 726

Personal Website: http://www.lucaswilkins.com/ Alternate e-mail: @.***

pkienzle commented 1 year ago

Agreed that plotting P(x) values against x² will not show a density with respect to x². Similarly plotting I(q) values against q² will not show I(q²). Setting the y label to probability of radius and intensity at Q respectively makes the graphs "correct".

The log10 scaling changes the x tick locations for the individual values so the q value shown on the x-axis matches the q value calculated for the intensity. You may be able to do something similar for x² and x⁴ scaling.

lucas-wilkins commented 1 year ago

"correct"

I agree whole heartedly with the use of scare quotes here.

The log10 scaling changes the x tick locations for the individual values

Yeah, I was thinking about this, it leads to even more ambiguity about what is really "correct". When probabilities are finite / have discrete support, there's no ambiguity.