FAIRmat-NFDI / nomad-material-processing

A NOMAD plugin containing base sections for material processing.
https://fairmat-nfdi.github.io/nomad-material-processing/
Apache License 2.0

create archive function #34

Open aalbino2 opened 7 months ago

aalbino2 commented 7 months ago

I prepared the create_archive function. Let's discuss this after the IKZ workshop @hampusnasstrom:

def get_reference(upload_id, entry_id):
    # Reference to the data section of another entry in the same installation
    return f'../uploads/{upload_id}/archive/{entry_id}#data'

def get_entry_id_from_file_name(filename, upload_id):
    # NOMAD derives the entry id from the upload id and the raw file name
    from nomad.utils import hash
    return hash(upload_id, filename)

def create_archive(
    entry_dict, context, filename, file_type, logger, *, bypass_check: bool = False
):
    import json
    import yaml
    from nomad.datamodel.context import ClientContext

    # Skip archive creation when running with a client context (e.g. local testing)
    if isinstance(context, ClientContext):
        return None
    # If the raw file already exists, load it so it can be compared to the new content
    if context.raw_path_exists(filename):
        with context.raw_file(filename, "r") as file:
            existing_dict = yaml.safe_load(file)
    if context.raw_path_exists(filename) and existing_dict != entry_dict:
        logger.error(
            f"{filename} archive file already exists. "
            f"You are trying to overwrite it with different content. "
            f"To do so, remove the existing archive and click reprocess again."
        )
    # Write the archive if the file does not exist, is unchanged, or the check is bypassed
    if (
        not context.raw_path_exists(filename)
        or existing_dict == entry_dict
        or bypass_check
    ):
        with context.raw_file(filename, "w") as newfile:
            if file_type == "json":
                json.dump(entry_dict, newfile)
            elif file_type == "yaml":
                yaml.dump(entry_dict, newfile)
        context.upload.process_updated_raw_file(filename, allow_modify=True)

    return get_reference(
        context.upload_id,
        get_entry_id_from_file_name(filename, context.upload_id)
    )
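A minimal usage sketch of how create_archive could be called from a normalize method; the MyProcess section, the CompositeSystem child entry, and the file name are illustrative and not part of the plugin:

from nomad.datamodel import EntryArchive, EntryMetadata
from nomad.datamodel.data import EntryData
from nomad.datamodel.metainfo.basesections import CompositeSystem

class MyProcess(EntryData):
    def normalize(self, archive, logger):
        super().normalize(archive, logger)
        # Build a child entry and write it as its own archive file in the upload
        sample = CompositeSystem(name='my sample')
        sample_archive = EntryArchive(
            data=sample,
            metadata=EntryMetadata(upload_id=archive.m_context.upload_id),
        )
        sample_ref = create_archive(
            sample_archive.m_to_dict(),
            archive.m_context,
            'my_sample.archive.json',
            'json',
            logger,
        )
        # sample_ref is a '../uploads/...#data' string referencing the new entry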
aalbino2 commented 7 months ago

I think Micha could also take this. Note that, more generally, entry_dict is an EntryArchive such as the following:

entry_dict = EntryArchive(
                data=experiment_data,
                # m_context=archive.m_context,
                metadata=EntryMetadata(upload_id=archive.m_context.upload_id),
            )
hampusnasstrom commented 7 months ago

Why do you pass a dict and not an EntryData section to the function? Also what is the * argument for?

aalbino2 commented 7 months ago

entry_dict is actually a misleading variable name because, as you can see above, it is of type EntryArchive! The EntryData section is nested inside it; in the present example it is the experiment_data variable.

The * means that bypass_check must be specified as a keyword argument when calling the function. For example, this would be a valid call:

        create_archive(
            experiment_archive.m_to_dict(),
            archive.m_context,
            experiment_filename,
            filetype,
            logger,
            bypass_check=True,
        )
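For reference, a minimal standalone sketch of how the bare * makes later parameters keyword-only (demo is just an illustrative function):

def demo(required, *, flag: bool = False):
    # Parameters after the bare * can only be passed by keyword
    return flag

demo(1, flag=True)  # OK
demo(1, True)       # TypeError: demo() takes 1 positional argument but 2 were given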
aalbino2 commented 7 months ago

bypass_check was just an idea from @theodore to patch something a few weeks ago. We can avoid putting it in our official plugin.

aalbino2 commented 7 months ago

Ask about the isinstance(context, ClientContext) check failing in local tests @hampusnasstrom

aalbino2 commented 3 months ago

Most recent version (from imem-nomad-plugin.utils):

import json

import yaml


def create_archive(
    entry_dict, context, filename, file_type, logger, *, overwrite: bool = False
):
    from nomad.datamodel.context import ClientContext

    # Skip archive creation when running with a client context (e.g. local testing)
    if isinstance(context, ClientContext):
        return None
    file_exists = context.raw_path_exists(filename)
    dicts_are_equal = None
    if file_exists:
        with context.raw_file(filename, "r") as file:
            existing_dict = yaml.safe_load(file)
            # dict_nan_equal (defined in the same utils module) compares the dicts
            # while treating NaN values as equal
            dicts_are_equal = dict_nan_equal(existing_dict, entry_dict)
    if not file_exists or overwrite or dicts_are_equal:
        with context.raw_file(filename, "w") as newfile:
            if file_type == "json":
                json.dump(entry_dict, newfile)
            elif file_type == "yaml":
                yaml.dump(entry_dict, newfile)
        context.upload.process_updated_raw_file(filename, allow_modify=True)
    elif file_exists and not overwrite and not dicts_are_equal:
        logger.error(
            f"{filename} archive file already exists. "
            f"You are trying to overwrite it with different content. "
            f"To do so, remove the existing archive and click reprocess again."
        )
    # get_hash_ref (also from the same utils module) builds the archive reference
    # from the upload id and file name
    return get_hash_ref(context.upload_id, filename)
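dict_nan_equal and get_hash_ref are helpers defined elsewhere in imem-nomad-plugin.utils and are not shown here. A possible sketch, assuming dict_nan_equal is a NaN-tolerant deep comparison and get_hash_ref combines get_reference and get_entry_id_from_file_name from the first snippet:

import math

from nomad.utils import hash


def dict_nan_equal(dict_a, dict_b):
    # Recursively compare two nested structures, treating NaN as equal to NaN
    if isinstance(dict_a, dict) and isinstance(dict_b, dict):
        return dict_a.keys() == dict_b.keys() and all(
            dict_nan_equal(dict_a[key], dict_b[key]) for key in dict_a
        )
    if isinstance(dict_a, list) and isinstance(dict_b, list):
        return len(dict_a) == len(dict_b) and all(
            dict_nan_equal(a, b) for a, b in zip(dict_a, dict_b)
        )
    if isinstance(dict_a, float) and isinstance(dict_b, float):
        return dict_a == dict_b or (math.isnan(dict_a) and math.isnan(dict_b))
    return dict_a == dict_b


def get_hash_ref(upload_id, filename):
    # Same reference format as get_reference above, with the entry id
    # derived from the file name via the NOMAD hash
    return f'../uploads/{upload_id}/archive/{hash(upload_id, filename)}#data'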