Adding classes for analysis

JosePizarro3 commented 4 months ago

I want to add some basic classes inheriting from Analysis. These will store references to the actual data on which the analysis is performed, plus quantities related to the analysis itself (e.g., the options of the find_peaks function in scipy.signal for finding peaks).
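To make the "quantities related to the analysis itself" concrete, here is a minimal sketch of an options container, assuming it mirrors a subset of the keyword arguments of scipy.signal.find_peaks (height, threshold, distance, prominence, width). The class name PeakFindingOptions and its to_kwargs helper are hypothetical, and a plain dataclass stands in for NOMAD Quantity definitions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PeakFindingOptions:
    """Hypothetical container mirroring a subset of scipy.signal.find_peaks kwargs."""

    height: Optional[float] = None
    threshold: Optional[float] = None
    distance: Optional[float] = None
    prominence: Optional[float] = None
    width: Optional[float] = None

    def to_kwargs(self) -> dict:
        # Keep only the options that were actually set, so that scipy's own
        # defaults apply to everything else.
        return {key: value for key, value in asdict(self).items() if value is not None}


options = PeakFindingOptions(height=10.0, distance=5)
print(options.to_kwargs())  # {'height': 10.0, 'distance': 5}
```

With scipy installed, the stored options could then be passed on as find_peaks(intensities, **options.to_kwargs()), so the same object both documents and reproduces the analysis settings.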

My initial idea (still to be discussed) would be to have something like:

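```python
import numpy as np
from nomad.datamodel.data import EntryData
from nomad.datamodel.metainfo.basesections import Analysis
from nomad.metainfo import Quantity


class SpectralProfileAnalysis(Analysis, EntryData):
    # Quantity types and shapes are placeholders, still to be decided.
    x_axis = Quantity(type=np.float64, shape=['*'])  # needed at this level of abstraction?

    # required
    intensities = Quantity(type=np.float64, shape=['*'])

    parameter1 = Quantity(type=np.float64)

    def find_peak(self):
        ...


class XRDAnalysis(SpectralProfileAnalysis):
    two_theta = Quantity(type=np.float64, shape=['*'])

    # more functions specific for XRD


class SpectroscopicAnalysis(SpectralProfileAnalysis):
    energies = Quantity(type=np.float64, shape=['*'])

    # functions common to XPS, XAS, PES...


class XASAnalysis(SpectroscopicAnalysis):
    ...


class XPSAnalysis(SpectroscopicAnalysis):
    def assign_element(self):
        ...


class ELNJupyterNotebook:  # base classes still open (...)
    ...


class ELNXRDAnalysis(ELNJupyterNotebook, XRDAnalysis):
    ...


# and so on...
```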

This is just an initial idea. I will test and see how it compares with the category and the current implementation. From there, I can start working on other analysis functions.

ka-sarthak commented 4 months ago

Thanks for opening this as an issue. I really like the idea. I see some of it is based on one of our meetings early this month.

It will be much cleaner to separate the quantities containing the raw data from the parameters for the analysis. For instance, let's consider these classes for the XRD case:

XRDResults: contains quantities like two_theta and intensity.
XRDAnalysis: extends the SpectralAnalysis class, which itself extends the Analysis basesection.
XRDAnalysisResult: follows the same inheritance pattern as XRDAnalysis.

The class flow will be something like:
Analysis > SpectralAnalysis > XRDAnalysis
AnalysisResult > SpectralAnalysisResult > XRDAnalysisResult
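The two parallel hierarchies can be sketched with plain Python classes. These are hypothetical stand-ins for the NOMAD basesections (the real Analysis and AnalysisResult carry many more quantities); the sketch only shows how each XRD class picks up behaviour from its spectral parent, while the raw data container stays outside both chains.

```python
class Analysis:
    """Stand-in for the Analysis basesection."""


class AnalysisResult:
    """Stand-in for the AnalysisResult basesection."""


class SpectralAnalysis(Analysis):
    """Would hold generic spectral parameters, e.g. for peak finding."""


class SpectralAnalysisResult(AnalysisResult):
    """Would hold generic spectral results, e.g. peak positions."""


class XRDAnalysis(SpectralAnalysis):
    """Adds XRD-specific analysis on top of SpectralAnalysis."""


class XRDAnalysisResult(SpectralAnalysisResult):
    """Adds XRD-specific results on top of SpectralAnalysisResult."""


class XRDResults:
    """Raw measured data (two_theta, intensity); deliberately not an Analysis."""


def chain(cls) -> str:
    # Walk the method resolution order, skipping the implicit `object` base.
    return " > ".join(c.__name__ for c in reversed(cls.__mro__) if c is not object)


print(chain(XRDAnalysis))        # Analysis > SpectralAnalysis > XRDAnalysis
print(chain(XRDAnalysisResult))  # AnalysisResult > SpectralAnalysisResult > XRDAnalysisResult
```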

Through SpectralAnalysis, we will have certain analyses (like peak finding) run as part of the normalize method. The parameters for these analyses will be defined as quantities inside SpectralAnalysis. Being a child, XRDAnalysis will inherit all of this and ideally also support additional functionality specific to XRD.
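A minimal sketch of peak finding living in SpectralAnalysis.normalize and being inherited by XRDAnalysis. It assumes plain Python classes in place of NOMAD sections, a hypothetical min_height parameter, and a simple strict local-maximum search as a stand-in for scipy.signal.find_peaks. The normalize(archive, logger) signature mirrors NOMAD's convention, but nothing here is called by a real archive.

```python
from typing import List, Optional


class SpectralAnalysis:
    """Stand-in for SpectralAnalysis: parameters live next to the analysis."""

    def __init__(self, x: List[float], y: List[float], min_height: float = 0.0):
        self.x = x
        self.y = y
        self.min_height = min_height  # hypothetical peak-finding parameter
        self.peak_positions: Optional[List[float]] = None

    def find_peaks(self) -> List[int]:
        # Strict local maxima at or above min_height; a simple stand-in for
        # scipy.signal.find_peaks, which supports many more criteria.
        return [
            i
            for i in range(1, len(self.y) - 1)
            if self.y[i - 1] < self.y[i] > self.y[i + 1] and self.y[i] >= self.min_height
        ]

    def normalize(self, archive=None, logger=None):
        # Run the generic spectral analysis as part of normalization.
        self.peak_positions = [self.x[i] for i in self.find_peaks()]


class XRDAnalysis(SpectralAnalysis):
    """Inherits peak finding; XRD-specific steps would be added here."""

    def normalize(self, archive=None, logger=None):
        super().normalize(archive, logger)
        # XRD-specific post-processing would follow here.


xrd = XRDAnalysis(
    x=[20.0, 21.0, 22.0, 23.0, 24.0, 25.0],  # two_theta in degrees
    y=[1.0, 5.0, 2.0, 8.0, 3.0, 1.0],
    min_height=4.0,
)
xrd.normalize()
print(xrd.peak_positions)  # [21.0, 23.0]
```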

XRDAnalysis will have a quantity inputs (coming from the Analysis basesection), which can be used to attach a SectionReference to XRDResults. This way we connect the raw data to the Analysis rather than composing it inside.
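The difference between referencing and composing can be illustrated with plain dataclasses. This is a hypothetical sketch: in NOMAD the link would be a SectionReference inside the repeatable inputs sub-section, whereas here an ordinary Python object reference plays that role. The point is that the analysis points to the raw data instead of holding its own copy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class XRDResults:
    two_theta: List[float]
    intensity: List[float]


@dataclass
class SectionReference:
    # Hypothetical stand-in: holds a reference to a section, not a copy of it.
    reference: XRDResults


@dataclass
class XRDAnalysis:
    inputs: List[SectionReference] = field(default_factory=list)


raw = XRDResults(two_theta=[20.0, 21.0], intensity=[1.0, 5.0])
analysis = XRDAnalysis(inputs=[SectionReference(reference=raw)])

# The analysis sees the very same object, so the raw data lives in one place only.
print(analysis.inputs[0].reference is raw)  # True
```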

After the normalize method of XRDAnalysis conducts the analysis, it will populate the XRDAnalysisResult section and attach it as a SectionReference to the outputs quantity of XRDAnalysis (inherited from the Analysis basesection).

ka-sarthak commented 4 months ago

Also, for now I wouldn't worry too much about the ELN classes or JupyterNotebook classes. These are the ones that will eventually build on analysis basesections like the ones described above and provide an alternative way of conducting analysis using Jupyter notebooks. I have some ideas on this front that can be developed in parallel.

analysis_source.py can serve as common ground for defining analysis functions that can be used by both the analysis basesections and the Jupyter notebooks.
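The shared-module idea can be sketched as a plain function with no dependency on the schema, called once from a section's normalize and once directly, as a notebook cell would. The function name normalize_intensities and its contents are hypothetical; everything sits in one file so that it runs standalone, whereas in practice the function would live in analysis_source.py and be imported from there.

```python
from typing import List


# --- would live in analysis_source.py (hypothetical content) ---
def normalize_intensities(intensities: List[float]) -> List[float]:
    """Scale intensities so that the maximum becomes 1.0."""
    peak = max(intensities)
    if peak == 0:
        return list(intensities)  # avoid dividing by zero for an empty signal
    return [value / peak for value in intensities]


# --- used by a basesection during normalization ---
class SpectralAnalysis:
    def __init__(self, intensities: List[float]):
        self.intensities = intensities
        self.normalized = None

    def normalize(self, archive=None, logger=None):
        self.normalized = normalize_intensities(self.intensities)


# --- used directly in a Jupyter notebook cell ---
section = SpectralAnalysis([2.0, 4.0, 1.0])
section.normalize()
notebook_result = normalize_intensities([2.0, 4.0, 1.0])
print(section.normalized == notebook_result)  # True
```

Keeping the function free of schema imports is what lets both consumers share it without the notebook needing a full section object.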

JosePizarro3 commented 4 months ago

Indeed, thanks for pointing all of this out 😄

I didn't work out the details (partly because the baseclasses.py definitions are still rather empty, just the skeleton, which is nice), but yes, any Analysis should contain what you explained:

The inputs, as references to the EntryArchive being analyzed. These should be resolved during normalization: users should have the chance to edit them, but if they don't, they should simply be resolved automatically.
A method section containing the parameters used during the analysis, which we still need to add. It should collect all parameters automatically when certain functions are called (e.g., find_peak(param1, param2) should populate method(param1=param1, param2=param2)).
For the outputs (AnalysisResults), I am not sure I follow your idea of referencing. These should be stored directly in the Analysis archive itself.
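Populating the method section automatically from a function call can be sketched with a decorator built on inspect.signature. The decorator name record_parameters, the plain dict standing in for the method section, and the find_peak parameters are all hypothetical; the sketch only shows that every argument, including defaults, can be captured without the caller doing anything extra.

```python
import functools
import inspect


def record_parameters(func):
    """Copy the call's arguments into self.method before running func."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        bound = signature.bind(self, *args, **kwargs)
        bound.apply_defaults()  # also record parameters left at their defaults
        params = {name: value for name, value in bound.arguments.items() if name != "self"}
        self.method.update(params)
        return func(self, *args, **kwargs)

    return wrapper


class Analysis:
    def __init__(self):
        self.method = {}  # stand-in for a NOMAD method/parameters sub-section

    @record_parameters
    def find_peak(self, param1, param2=0.5):
        return param1 * param2  # placeholder for the actual analysis


analysis = Analysis()
analysis.find_peak(10)
print(analysis.method)  # {'param1': 10, 'param2': 0.5}
```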

So in short, an Analysis entry will contain a reference to the analyzed archive, the methodological parameters relevant for the analysis, and the results stored directly in the entry's AnalysisResults section.

Some questions and extra points from my side:

Can we please use consistent naming? That is, Results and outputs should be merged into one. Since I work on the Simulation schema, I would suggest outputs and AnalysisOutputs. I agree these are synonyms, but I would prefer to keep everything consistent.
I am experimenting with functionality to upload the generated analysis on the fly. The implementation is similar in spirit to ArchiveQuery, i.e., a one-liner method of a class named UploadData.
I am also working on generating the corresponding archive.json and archive.yaml files, following the documentation written by @aalbino2 a while ago (https://nomad-lab.eu/prod/v1/staging/docs/howto/programmatic/publish_python.html).
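Generating such a file can be as simple as serialising a dictionary. This is a hedged sketch: it assumes the layout described in the linked documentation, with the entry content under a top-level data key whose m_def names the schema section, and the section path, quantity names, and values below are placeholders. Writing archive.yaml would work the same way with a YAML library such as PyYAML, which is left out here to keep the example dependency-free.

```python
import json
import tempfile
from pathlib import Path

# Placeholder content: the m_def path and quantity names are hypothetical.
archive = {
    "data": {
        "m_def": "my_plugin.schema.XRDAnalysis",
        "name": "XRD peak analysis",
        "method": {"param1": 10, "param2": 0.5},
    }
}


def write_archive_json(content: dict, directory: Path, stem: str) -> Path:
    """Write `content` to <stem>.archive.json inside `directory`."""
    path = directory / f"{stem}.archive.json"
    path.write_text(json.dumps(content, indent=2))
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_archive_json(archive, Path(tmp), "xrd_analysis")
    print(path.name)  # xrd_analysis.archive.json
    print(json.loads(path.read_text()) == archive)  # True
```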

We can talk more after the DPG. Right now I am quite busy, but my plan is to keep working on this afterwards (starting the week of 25.03).

ka-sarthak commented 3 months ago

Can you explain in more detail how the input references can be resolved automatically, without the user? Analysis.inputs contains a repeatable sub-section of references, and these references are set via ReferenceEditQuantity. One way of automating this is to reference all the entries belonging to a certain class (for example, in ELNXRayDiffractionAnalysis, entries belonging to the XRayDiffraction measurement class are added as inputs). This does not require any input from the user. Is that what you meant as well?
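The automation described here can be sketched as a filter over candidate entries, combined with the earlier point that a user's manual choice should win. The Entry tuple, the class names as strings, and the candidate list are hypothetical stand-ins; in NOMAD the candidates would presumably come from a query (e.g. within the same upload), and the resulting references would be written into Analysis.inputs during normalization.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    entry_id: str
    section_class: str  # name of the schema class of the entry's data section


def resolve_inputs(
    entries: List[Entry], target_class: str, user_inputs: Optional[List[str]] = None
) -> List[str]:
    """Return ids of entries to reference as inputs."""
    # Respect references the user has already set by hand.
    if user_inputs:
        return list(user_inputs)
    # Otherwise fall back to every entry of the target class.
    return [e.entry_id for e in entries if e.section_class == target_class]


entries = [
    Entry("a1", "XRayDiffraction"),
    Entry("b2", "ELNSample"),
    Entry("c3", "XRayDiffraction"),
]
print(resolve_inputs(entries, "XRayDiffraction"))  # ['a1', 'c3']
```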

I agree that we need a method section, or perhaps a sub-section called parameters. I will add this as a separate issue.

You are right, outputs will not be a reference. It will be an AnalysisResults sub-section. I misread the code.

By the way, as you may have already noticed, I started a discussion to collect all the ideas in one place. This is to avoid discussing multiple things in one issue ;)

FAIRmat-NFDI / nomad-analysis

Adding classes for analysis #11