Open tmusho opened 3 months ago
Sure, I will check 🙂 I am just leaving a comment for you, let me know if something is not clear.
At first sight, I would break each column into its own Quantity. In the end, Quantities are just the fields you would have in a JSON file (or the datasets, if you used HDF5 instead). Furthermore, I imagine you can group all the column data by defining a SubSection whose Quantities are the columns; something like:
import numpy as np
from nomad.datamodel.data import ArchiveSection
from nomad.datamodel.metainfo.basesections import Entity
from nomad.metainfo import Quantity, SubSection


class UVVisData(ArchiveSection):
    """
    A base section used to define the UV-Vis data quantities.
    """

    excited_state_number = Quantity(
        type=int,
        description="""
        The excited state number as an integer...
        """,
    )

    frequency = Quantity(
        type=np.float64,
        unit='THz',  # check whether these units are correct :-)
        description="""
        The frequency of the excited state...
        """,
    )

    # Other columns as quantities defined here.


class ModelData(Entity):
    """
    A base section used to specify the system solver information used for simulations.
    """

    ...
    # A single UVVisData section; see the note on `repeats` below.
    uv_vis_data = SubSection(sub_section=UVVisData.m_def, repeats=False)
Note the repeats argument of the SubSection: in case this is a list of UVVisData, repeats should be changed to True.
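To make the two layouts concrete, here is a minimal plain-Python sketch. It uses dataclasses as stand-ins for the sections, not the NOMAD metainfo API. UVVisRow corresponds to one section per table row (the repeats=True case), and UVVisColumns corresponds to a single section holding whole columns (the repeats=False case). The class names, the helpers parse_rows and to_columns, the two-column layout, and the sample values are all assumptions made up for illustration; the real file has more columns.

```python
from dataclasses import dataclass, field


@dataclass
class UVVisRow:
    """One table row; stand-in for a section used with repeats=True."""
    excited_state_number: int
    frequency: float


@dataclass
class UVVisColumns:
    """Whole table in one section (repeats=False); each column is a list."""
    excited_state_number: list[int] = field(default_factory=list)
    frequency: list[float] = field(default_factory=list)


def parse_rows(text: str) -> list[UVVisRow]:
    """Parse whitespace-separated lines of '<int> <float>' into rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        rows.append(UVVisRow(int(parts[0]), float(parts[1])))
    return rows


def to_columns(rows: list[UVVisRow]) -> UVVisColumns:
    """Transpose a list of rows into per-column lists."""
    return UVVisColumns(
        excited_state_number=[r.excited_state_number for r in rows],
        frequency=[r.frequency for r in rows],
    )


# Made-up sample values, for illustration only.
sample = "1 742.1\n2 803.5\n"
rows = parse_rows(sample)      # row-wise layout (repeats=True analogue)
columns = to_columns(rows)     # column-wise layout (repeats=False analogue)
```

Which layout fits better probably depends on how the data is queried: the row-wise form keeps each excited state together, while the column-wise form maps more directly onto array-valued Quantities.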
@JosePizarro3 can you help me figure out how to define a custom type in the schema? The data looks like this: 0 0.000000 0.000000 0.000000 1 2 3.069030 403.971311 0.000000 3
Here is my attempt. I reverted to just reading a string in. https://github.com/FAIRmat-NFDI/nomad-ML-generate-smiles-reader/blob/ad6205a127a918a44983a40feea85706e24635a5/src/nomad_ML_smiles_reader/schema.py#L224