Open JosePizarro3 opened 4 months ago
Thanks for opening this as an issue. I really like the idea. I see some of it is based on one of our meetings earlier this month.
It will be much cleaner to separate the quantities containing the raw data and the parameters for analysis. For instance, let's consider these classes for the XRD case:
- `XRDResults`: contains quantities like `two_theta` and `intensity`
- `XRDAnalysis`: extends the `SpectralAnalysis` class, which itself extends the `Analysis` basesection
- `XRDAnalysisResult`: similar inheritance to `XRDAnalysis`
The class flow will be something like:
- `Analysis` > `SpectralAnalysis` > `XRDAnalysis`
- `AnalysisResult` > `SpectralAnalysisResult` > `XRDAnalysisResult`
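To make the two chains concrete, here is a minimal sketch using plain Python dataclasses as stand-ins for the basesections. It does not use the actual schema machinery, and the field names `peak_threshold` and `peak_positions` are hypothetical placeholders, not quantities from the real definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnalysisResult:
    """Stand-in for the AnalysisResult basesection."""


@dataclass
class SpectralAnalysisResult(AnalysisResult):
    # Hypothetical quantity shared by all spectral results.
    peak_positions: List[float] = field(default_factory=list)


@dataclass
class XRDAnalysisResult(SpectralAnalysisResult):
    """XRD-specific results would be added here."""


@dataclass
class Analysis:
    """Stand-in for the Analysis basesection."""

    inputs: List[object] = field(default_factory=list)
    outputs: List[AnalysisResult] = field(default_factory=list)

    def normalize(self) -> None:
        """Base hook; subclasses run their analysis here."""


@dataclass
class SpectralAnalysis(Analysis):
    # Hypothetical analysis parameter defined once for all spectral methods.
    peak_threshold: float = 0.0


@dataclass
class XRDAnalysis(SpectralAnalysis):
    """Inherits inputs, outputs and peak_threshold; XRD extras go here."""
```

With this layout, anything defined on `SpectralAnalysis` is available on `XRDAnalysis` without repeating it.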
Through `SpectralAnalysis`, we will have certain analyses (like peak finding) as a part of the normalize method. The parameters for these analyses will be defined as quantities inside `SpectralAnalysis`. Being a child, `XRDAnalysis` will inherit all this and ideally also support additional functionality specific to XRD.
`XRDAnalysis` will have a quantity `inputs` (coming from the `Analysis` basesection), which can be used to attach a SectionReference to `XRDResults`. This way we connect the raw data to `Analysis` rather than composing it inside.
After the normalize method of `XRDAnalysis` conducts the analysis, it will populate the `XRDAnalysisResult` section and attach it as a SectionReference to the `outputs` quantity of the `XRDAnalysis` (inherited from the `Analysis` basesection).
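A rough sketch of that flow, again with plain dataclasses instead of real sections: the raw data lives in its own object, the analysis only holds a link to it, and `normalize` fills `outputs`. The "analysis" here is a toy (position of the maximum intensity), and `peak_two_theta` is a hypothetical field name. Whether the result ends up attached as a reference or as a sub-section is left open; a plain list stands in for either.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class XRDResults:
    """Raw data only; no analysis parameters live here."""

    two_theta: List[float]
    intensity: List[float]


@dataclass
class XRDAnalysisResult:
    # Hypothetical output quantity.
    peak_two_theta: List[float] = field(default_factory=list)


@dataclass
class XRDAnalysis:
    # Links to raw data held elsewhere (stand-in for a section reference).
    inputs: List[XRDResults] = field(default_factory=list)
    outputs: List[XRDAnalysisResult] = field(default_factory=list)

    def normalize(self) -> None:
        """Run the toy analysis on every linked input and store the results."""
        self.outputs = []
        for data in self.inputs:
            if not data.intensity:
                continue
            # Index of the strongest reflection.
            i_max = max(range(len(data.intensity)), key=data.intensity.__getitem__)
            self.outputs.append(XRDAnalysisResult([data.two_theta[i_max]]))
```

Usage would look like `analysis = XRDAnalysis(inputs=[raw]); analysis.normalize()`, after which `analysis.outputs` holds one result per input while `raw` itself is untouched.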
Also, for now I wouldn't worry much about the ELN classes or JupyterNotebook classes. These are the ones that will eventually exploit the analysis basesections, like the ones described above, and provide an alternative way of conducting analysis using Jupyter notebooks. I have some ideas on this which can be developed in parallel.
`analysis_source.py` can be a common ground to define analysis functions that can be used by both the analysis basesections and the Jupyter notebooks.
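As an illustration of what such a shared function could look like, here is a small pure-Python peak finder. The name `find_peak_indices` and its `min_height` parameter are made up for this sketch; the point is only that the function takes plain sequences and returns plain values, so a normalizer and a notebook cell can both call it.

```python
from typing import List, Sequence


def find_peak_indices(intensity: Sequence[float], min_height: float = 0.0) -> List[int]:
    """Return indices of strict local maxima that reach at least min_height.

    End points are never reported, since they have only one neighbour.
    """
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_local_max = intensity[i - 1] < intensity[i] > intensity[i + 1]
        if is_local_max and intensity[i] >= min_height:
            peaks.append(i)
    return peaks
```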
Indeed, thanks for pointing everything out 😄
I didn't work out the details (also because the `baseclasses.py` definitions are a bit empty, only with the skeleton, which is nice), but yes, any `Analysis` should contain what you explained:
- `inputs` as a ref to the `EntryArchive` being used. This is to be resolved by normalization, and users should have the chance to edit it; if they don't, it should simply be resolved automatically.
- `method` section containing the parameters used during our analysis. This should be able to get all the params automatically when some functions are being called (e.g., `find_peak(param1, param2)` should populate `method(param1=param1, param2=param2)`).
- `outputs` (`AnalysisResults`): I am not sure if I follow your idea of referencing. This should be stored directly in the archive `Analysis` itself.
So, in short, an `Analysis` entry will contain a ref to the archive analyzed, the methodological parameters relevant for the analysis, and the results stored directly in the entry's `AnalysisResults` section.
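One possible way to get the `method` parameters filled in automatically is a decorator that records the arguments of each analysis call. This is only a sketch under assumptions: `record_parameters` is a hypothetical helper, `method` is a plain dict rather than a real section, and the body of `find_peak` is a toy that treats `param1` as a minimum height and `param2` as a window half-width.

```python
import functools
import inspect
from typing import Any, Callable, Dict, List, Sequence


def record_parameters(func: Callable) -> Callable:
    """Store the call's arguments (defaults included) in self.method."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        bound = inspect.signature(func).bind(self, *args, **kwargs)
        bound.apply_defaults()
        params = dict(bound.arguments)
        params.pop("self", None)  # the instance itself is not a parameter
        self.method[func.__name__] = params
        return func(self, *args, **kwargs)

    return wrapper


class Analysis:
    """Stand-in for an Analysis section; not the real basesection."""

    def __init__(self, intensity: Sequence[float]):
        self.intensity = list(intensity)
        self.method: Dict[str, Dict[str, Any]] = {}

    @record_parameters
    def find_peak(self, param1: float, param2: int = 1) -> List[int]:
        """Toy peak finder: points >= param1 that are the maximum of their window."""
        peaks = []
        for i, y in enumerate(self.intensity):
            window = self.intensity[max(0, i - param2): i + param2 + 1]
            if y >= param1 and y == max(window):
                peaks.append(i)
        return peaks
```

Calling `analysis.find_peak(1.0)` then leaves `{"find_peak": {"param1": 1.0, "param2": 1}}` in `analysis.method`, so the parameters never have to be typed in twice.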
Some questions and extra points from my side:
- `Results` and `outputs` should be merged into one. Since I am working on the `Simulation` schema, I would suggest `outputs` and `AnalysisOutputs`. I agree these are synonyms, but I would prefer to keep everything consistent.
- `ArchiveQuery`, i.e., a one-liner method of a class named `UploadData`.
We can talk more after the DPG; right now I am not very free, but my idea is to keep working after it (starting the week of 25.03).
Can you explain more about how the input reference can be resolved automatically without the user? `Analysis.inputs` contains a repeatable sub-section of references, and these references are to be indicated via ReferenceEditQuantity. One way of automating this is to reference all the entries belonging to a certain class (for example, in `ELNXRayDiffractionAnalysis`, entries belonging to the `XRayDiffraction` measurement class are added as input). This will not require the user's input. Is that what you also meant?
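The class-based option could be sketched like this. The classes are empty stand-ins, `Raman` is a made-up second measurement type, and a plain list replaces whatever query would actually collect the candidate entries.

```python
from typing import Iterable, List, Type


class Measurement:
    """Stand-in for a generic measurement entry."""


class XRayDiffraction(Measurement):
    """Stand-in for the XRD measurement class."""


class Raman(Measurement):
    """Hypothetical second measurement type, only here for contrast."""


def resolve_inputs(entries: Iterable[object], measurement_class: Type) -> List[object]:
    """Pick every entry that is an instance of the requested measurement class."""
    return [entry for entry in entries if isinstance(entry, measurement_class)]
```

For an XRD analysis, `resolve_inputs(entries, XRayDiffraction)` would return only the XRD measurements, in their original order, without the user selecting anything.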
I agree that we need a `method` section or perhaps a sub-section called `parameters`. Will add this as a separate issue.
You are right, outputs will not be a reference. It will be an `AnalysisResults` sub-section. I misread the code.
By the way, as you may have already noticed, I started a discussion to collect all the ideas in one place. This is to avoid discussing multiple things in one issue ;)
I want to add some basic classes inheriting from `Analysis`. These will play the role of storing refs to the actual data on which the analysis is performed, plus quantities related to the analysis itself (e.g., the `options` in the `find_peaks` function of `scipy.signal` for finding peaks).
My initial idea (still to be discussed) would be to have something like:
This is just an initial idea. I will test and see how it compares with the `category` and the current implementation. From there, I can start working on other analysis functions.