SasView / sasview

Code for the SasView application.
BSD 3-Clause "New" or "Revised" License
48 stars 41 forks source link

Enable batch processing for P(r) #1448

Open wpotrzebowski opened 4 years ago

wpotrzebowski commented 4 years ago

A user from Lund University has a case where processing P(r) in the batch mode would be particularly useful (time-resolved data). We can currently process only a single file for P(r) analysis however it would be useful to enable batch mode for it too.

smk78 commented 4 years ago

I would agree this would be a useful capability. But implementing it would need some careful thought. It might be straightforward if the user is wanting to process subtle variations of the same system where things like Dmax and the number of terms aren't changing. But what if they want to process X different proteins?

butlerpd commented 4 years ago

I will say that the original "spec" yeeeaaars ago was that everything I can do once I should be able to do in batch ... including Pr... but, as you point out, it is a bit difficult. Even for model fitting. I would agree that the immediate way forward would be to organize it for the most obvious use case if somebody wanted to tackle this.

However I have been thinking that there should be some more automation possible. For example if we fit the low q to an Rg we should be able to estimate Dmax. Also we currently have the exploration tool that will show a plot of Xi^2 (or other things) vs Dmax which we could try to automate maybe with a summer student? trying to figure out what algorithm works? (I won't use the word AI :-) but I would say that is a much larger scope with a longer "time horizon?"

butlerpd commented 3 years ago

Interestingly I note that the P(r) documentation specifically says that you CAN do batch fitting .. Ooops 🤦 We might want to change that ... until (if and when) it is in fact possible?

smk78 commented 3 years ago

In which case we might want to address the behaviour of the 'Batch mode' checkbox too. Because you can 'enable' batch mode and then select Inversion!

You can do the same with Invariant and Corfunc too.

smk78 commented 3 years ago

@krzywon : whilst checking the above I've noticed (yet?) another issue with P(r). I don't know if it's already on your radar? But the data file dropdown in the P(r) perspective will happily accept as many datasets as you want to send it. If I select the first one and click calculate I get what I'll call 'normal behaviour'. If I then select a different dataset from the dropdown, some output values flicker but the dropdown then resets to the first dataset. So you cannot actually Calculate P(r) for anything but the first dataset.

The Log Explorer reports: 21:35:24 - INFO: Invertor.estimate_alpha: Invertor: could not invert I(Q) Pinvertor.get_matrix: Some I(Q) points have no error.

And this was running a local build of main.

butlerpd commented 3 years ago

That is very interesting. I had not noticed the dropdown before (probably because I know there should not be one so never bothered looking :-) However, using my local build which is a very recent branch from main, I see seem to have some different behavior from what is reported by @smk78.

That said, I confirm that sending several data sets in a row to the Inversion (or other single data set perspectives) looks like it does a "swap" but actually is populating a drop down. However it seems clear the actual data is no longer available to the perspective.

Finally the Log Error reported is actually unrelated. I am guessing @smk78 was using the emulsion data (core contrast, shell contrast ect). I was frustrated by this myself and was going to report it as a bug but am still thinking about how SasView should handle this. The problem is the data set itself: the very last data point has an uncertainty of 0 (so a divide by zero somewhere?) I was originally thinking that P(r) doesn't deal with 2 column data but that is not the case. It is fine with 2 column or 3 column ... just not 3 column which then has one (or more) line with a zero for the uncertainty. One can always try writing massive amounts of code to fix every possible stupidity but then it risks eventually making the code unwieldy and unsuportable. That said, it happens enough that it will cause confusion and pain to users. Maybe some data integrity checks in the reader on loading data would be appropriate? But that path too is fraught with pitfalls....

smk78 commented 3 years ago

Doing a little digging I find that the paragraph on batch inversion was added to the docs at version 5.0.0 (but the person responsible did not admit it!). There is also no mention of batch enabling P(r) in any of the release notes.

If I was to modify pr_help to:

Would there be any objections?

krzywon commented 3 years ago

@smk78 - I spent most of the 2017 code camp in Copenhagen writing the P(r) perspective for v5.0 and likely wrote that paragraph. My original intent was to have batch capabilities in the perspective for release v5.0, and much of the infrastructure is built in to allow it, but wasn't stable enough for release so the batch capabilities were turned off.

That said, I have no objection to changing the documentation.

smk78 commented 3 years ago

@butlerpd commented:

Finally the Log Error reported is actually unrelated. I am guessing @smk78 was using the emulsion data (core contrast, shell contrast ect). I was frustrated by this myself and was going to report it as a bug but am still thinking about how SasView should handle this. The problem is the data set itself: the very last data point has an uncertainty of 0 (so a divide by zero somewhere?) I was originally thinking that P(r) doesn't deal with 2 column data but that is not the case. It is fine with 2 column or 3 column ... just not 3 column which then has one (or more) line with a zero for the uncertainty. One can always try writing massive amounts of code to fix every possible stupidity but then it risks eventually making the code unwieldy and unsuportable. That said, it happens enough that it will cause confusion and pain to users. Maybe some data integrity checks in the reader on loading data would be appropriate? But that path too is fraught with pitfalls....

Nice catch, @butlerpd . Indeed, it is the drop contrast dataset. In it I_dev = 0 because in fact I = 0.

smk78 commented 3 years ago

@smk78 - I spent most of the 2017 code camp in Copenhagen writing the P(r) perspective for v5.0 and likely wrote that paragraph. My original intent was to have batch capabilities in the perspective for release v5.0, and much of the infrastructure is built in to allow it, but wasn't stable enough for release so the batch capabilities were turned off.

That said, I have no objection to changing the documentation.

Change made!

butlerpd commented 2 years ago

So @krzywon , was the intention that the user set a fixed Dmax for everything in the batch mode and that the batch mode pull the suggested values for each of the Reg Constant and Number of terms? If so, how do you ensure that the behind the scenes computations are done before running the next calculation? These do take non zero time (though usually are quick). Also you need, as I recall, to iterate till the suggestion no longer changes.

I don't think it is helpful to have a batch P(R) that assumes all terms are the same as they seem to be quite sensitive to changes in the curves?

@ru4en have you found where the "mostly done" infrastructure is hidden?

ru4en commented 2 years ago

@butlerpd I am currently working on the code that loads data onto the GUI for Batch mode. I have also added Tab functionality for the Pr interface, I might have used the existing functions in inversionPerspective.py to iteratively load and calculate the data for each of the tabs (Not sure if this is the mostly done infrastructure). At the moment all loaded files are calculated all at once which causes the program to sometimes stop responding (Will be working on a fix for this). Tabs for Pr show the calculations on individual Tabs (similar to Fitting). Although I’m yet to use that code to have a functional batch mode (Currently multiple files can be imported and calculated but only displays one calculation at a time. My current plan for Pr Batch mode is: 1) not have everything calculated straight away. But let the user set parrimeters first then let the data be processed. 2) have a calculate button to calculate all files. 3) create a new pop-up table interface similar to the Batch Fitting Results used in Fitting. 4) when the calculate button is clicked all the outputs are sent to table (in step3) 5) [ Waiting for #2065 to get merged to main ] adding export and report capabilities for the Batch data

If you'd want to see what I've done so far, please let me know and I'll add it to the branch I'm working on. origin/1448-Enable-batch-processing-for-Pr