QPlane data output as DataFrame

abehersan commented 1 year ago

As of MJOLNIR-V-1.2.4 the data output of plotQPlane are lists of numpy arrays. For consistency with MJOLNIR's other plotXXX methods, it would be nice to output data of the QPlane as a Pandas DataFrame.

The plotQPlane method outputs a 3-tuple with the data and the MPL axis, see https://github.com/MJOLNIRPackage/MJOLNIR/blob/5e0c9c7f31abcbe4f94a65868bac0aa4ac44dff6/MJOLNIR/Data/DataSet.py#L1666

In principle all data from the figure is in the outputted ax variable via ax.Int, ax.Qx, ax.Qy, ax.EMin and ax.EMax.

Since I'm not 100% sure how the data is structured initially in the various attributes of the ax object, nor does it seem that the intensity, qx and qy arrays have the same length, it's unclear to me how to file a PR to add this functionality.

Jakob-Lass commented 1 year ago

Yes, this is a left-over from the older data transfer method of MJOLNIR version 1.1. Depending on the parameters, i.e. the binning style, of the call the data returned have different orders. For both the standard QxQy binning and the radial binning the positions of the bins signify the corners of the grid making their size 1 larger than the intensities.

The ax object has a method 'to_csv' that implements a way to convert to a pandas dataframe. I am, however, not sure this is exactly the format most optimal for the method. What is you view on this?

abehersan commented 1 year ago

I examined the output of ax.to_csv and it's exactly what is needed. A very simple fix would be to add a static method to the plotQPlane function (ax.to_pandas for example) and simply extend it in ax.to_csv if a csv output is required. See for example the following snippet which could be anywhere in the plotQPlane function definition after the ax object is defined

@static_method
def to_pandas(ax):
    Qx,Qy = [x[0] for x in ax.bins]
    QxCenter = 0.25*(Qx[:-1,:-1]+Qx[:-1,1:]+Qx[1:,1:]+Qx[1:,:-1])
    QyCenter = 0.25*(Qy[:-1,:-1]+Qy[:-1,1:]+Qy[1:,1:]+Qy[1:,:-1])
    H,K,L = ax.sample.calculateQxQyToHKL(QxCenter,QyCenter)
    E = np.full(H.shape,np.mean([ax.EMin,ax.EMax]))
    intensity,monitorCount,Normalization,NormCount = [x[0] for x in ax.data]
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        Int = np.divide(intensity*NormCount,Normalization*monitorCount)
        Int[np.isnan(Int)] = -1
        Int_err = np.divide(np.sqrt(intensity)*NormCount,Normalization*monitorCount)
        Int_err[np.isnan(Int_err)] = -1
    dataToPandas = {'Qx':QxCenter.flatten(),'Qy':QyCenter.flatten(),'H':H.flatten(),'K':K.flatten(),'L':L.flatten(),'Energy':E.flatten(), 'Intensity':intensity.flatten(), 'Monitor':monitorCount.flatten(),
                    'Normalization':Normalization.flatten(),'BinCounts':NormCount.flatten(),'Int':Int.flatten(),'Int_err':Int_err.flatten()}
    ax.d = pd.DataFrame(dataToPandas)
    return ax

... and from line 1641 onwards

if not _3D:
    def to_csv(fileName,ax):
        data_dframe = to_pandas(ax)

        with open(fileName,'w') as f:
            f.write("# CSV generated from MJOLNIR {}. Shape of data is {}\n".format(MJOLNIR.__version__,Int.shape))

        data_dframe.to_csv(fileName,mode='a')
    ax.to_csv = lambda fileName: to_csv(fileName,ax)
    ax.bins = [ax.Qx,ax.Qy]
    ax.data = [ax.intensity,ax.monitorCount,ax.Normalization,ax.NormCount]
return ax.d, ax

In the end, users get the same data that was plotted with MJOLNIR's built-in functions in an easy-to-handle way. I will test this out and make a PR- thanks for the pointer to the to_csv method! I just completely overlooked it...

I think it is optimal since essentially every row of the dataframe is a single S(Q, E) point with the small added redundancy of all the points having the same energy, which is anyways always good to have IMO as a sanity check.

abehersan commented 1 year ago

See PR #78 for a possible fix.

I tested it on my data and the output is as expected - will need to update tests on the plotQPlane method before pushing to main tho

xfelw commented 1 year ago

Thanks a lot for implementing that feature. This was precisely what I was looking for recently.

I updated MJOLNIR to 1.2.6 and tried it on my data. Unfortunately, all the given HKLs in the DataFrame were wrong.

I traced the problem to the Sample.py. Since my data was collected on a hexagonal sample, it might be a projection problem. Changing lines 282-283 solved my problem.

        self.orientationMatrixINV = np.linalg.inv(np.dot(self.RotMat3D,UB))
        #self.orientationMatrixINV = np.linalg.inv(self.UB)

I'm not far enough into the subject to say if this change will cause further problems. But maybe someone else knows what exactly caused my mentioned problem.

abehersan commented 1 year ago

@xfelw interesting... the example data I tested this fix on is also hexagonal. The RLU indexing was correct in my case without the need to do the fix you suggested.

I reopened the issue since if the data exported by the fix is incorrectly indexed it is not useful. I.e. if it is not as shown in the plot of the QPlane for example.

Jakob-Lass commented 1 year ago

@xfelw I have been looking through the places where self.orientationMatrixINV is utilized and it should be possible to change it, but I am not quite sure why this happens. Do you have a code snippet of the issue? And which data files are you using?

Jakob-Lass commented 1 year ago

I believe I found the mistake in the code and it is indeed related to the RotMat3D. When a QPlane is created in MJOLNIR, the QxQy points are rotated such that the main axis lay along the x-axis. While this is true for the plotted data, the data behind the scene is still rotated to align Qx with the outgoing/direct beam, so the rotation has to be applied when calculating HKL for the data frame.

I have implemented this in 03f0692e143379c7f634ddb634f29e850655d113. I will be making a larger update within the coming weeks for other methods I have implemented so let me know if you need a quick new version to verify this solution

xfelw commented 1 year ago

@Jakob-Lass I tried the fix manually, but it didn't work so far. The data I use, we collected together on the Cs3Fe2Br9 last December, here I especially used the files: camea2022n003602-3607,3609-3638.hdf

Here is the code, that produced the problem.

# Import IPython and packages
try:
    import IPython
    shell = IPython.get_ipython()
    shell.enable_matplotlib(gui='qt')
except ImportError:
    pass

import numpy as np
from MJOLNIR._tools import fileListGenerator
from MJOLNIR.Data import DataSet
import matplotlib.pyplot as plt

dataFolder = r'path of the data folder'
lowTDataFiles = fileListGenerator('3602-3607,3609-3638',folder=dataFolder,year=2022)

ds = DataSet.DataSet(lowTDataFiles)
ds.convertDataFile(binning=8,deleteOnConvert=True)

EMin = -0.15
EMax = 0.15

energy_transfere = round(EMax-(EMax-EMin)/2, 3)

Data ,ax = ds.plotQPlane(EMin=EMin, EMax=EMax,xBinTolerance=0.01,
                             yBinTolerance=0.01,binning='xy',vmin=0.0,
                             vmax=5, zorder=1, cmap='viridis')

Data.to_csv(r'path to save the DataFrame')

fig = ax.get_figure() # Extract figure from returned axis
fig.colorbar(ax.pmeshs[0], anchor = (0.25,0.5))
fig.set_size_inches(8,8)

fig.tight_layout()

Maybe you can reproduce the problem.

Jakob-Lass commented 1 year ago

Thanks for the update!

The data returned in the Pandas DataFrame is indeed wrong. I have now corrected it (a0b94551467febbf57ed8402f0e1e32fb314e790) and made a test version. If I could get you to test it out by updating MJOLNIR using the following command

pip install -i https://test.pypi.org/simple/ MJOLNIR

and report back if it solved the issue, I would be very happy. The new version is 1.2.6post1.

xfelw commented 1 year ago

I installed the test version after removing the 1.2.6, and indeed it worked without any further manipulation of the installed package.

Thanks a lot!

MJOLNIRPackage / MJOLNIR

QPlane data output as DataFrame #77