Extend compare_metadata

WenzDaniel commented 11 months ago

We should add a fall back method to https://github.com/AxFoundation/strax/blob/5f9051bcb4fb4a8d46e5fd3da21e522959735352/strax/context.py#L1682 which only compares the lineages in case the current data-type is not stored. Currently, we are loading the metadata via:

# new metadata for the given runid + target; fetch from context
 new_metadata = self.get_metadata(run_id, target)

We should as fall back in case data is not stored:

new_lineage = st.key_for(run_id, data-type).linegae

One needs to only cast all tuples into lists or vise-versa or deactivate the type checking as otherwise one cannot compare to metadata files loaded from disk.__

KaraMelih commented 11 months ago

I started looking into this. It looks straightforward, and I think I just now understood what you meant by typecasting. Just to confirm;

a = st.get_metadata(run_id, target)['lineage']
b = st.key_for(run_id, target).lineage
strax.utils.compare_dict(a, b)

returns the same information but one of them returns them in a list while the other returns them in a tuple.

Then I need to check how deep the lists and tuples are nested to convert them.

KaraMelih commented 11 months ago

Should I make a PR branching from this commit 5f9051b ? The version on the master is different

KaraMelih commented 11 months ago

I have refactored the code to check lineages in case the data is not stored. I also wrote the following utility function for converting tuples at varying-depth nested objects.

import copy
def convert_tuple_to_list(init_func_input):
    func_input = copy.deepcopy(init_func_input)
    # if it is a tuple convert it and reiterate
    if isinstance(func_input, tuple):
        _func_input = list(func_input)
        return convert_tuple_to_list(_func_input)

    # if it is a list, go over all the elements until all tuples are lists
    elif isinstance(func_input, list):
        new_func_inp = []
        # check each element
        for i in func_input:
            new_func_inp.append(convert_tuple_to_list(i))
            # iterates until everything is all depths are exhausted
        return new_func_inp

    # if it is a dict iterate
    elif isinstance(func_input, dict):
        for k,v in func_input.items():
            func_input[k] = convert_tuple_to_list(v)
        return func_input
    else:
        # if not a container, return. i.e. int, float, bytes, str etc.
        return func_input

I tested it and seems to be working as intended.

However, I think the lineage comparison might be a little misleading as the lineages of different runs do not differ if they are from the same context, any runid as long as the data type is the same will result in the same lineage e.g.;

st.key_for("000000", "event_basics").lineage == st.key_for("053675", "event_basics").lineage

is always True. So by comparing the lineages (after straightening out the tuples and lists, if either of the compared parts is different) we can only say that they are created with the same context and they are looking at the same datatype. Is this the intended behavior?

WenzDaniel commented 11 months ago

So by comparing the lineages (after straightening out the tuples and lists, if either of the compared parts is different) we can only say that they are created with the same context and they are looking at the same datatype. Is this the intended behavior?

No then you did not understand my use case. In the current implementation we can only compare the lineage of the current context with the metadata of some stored data, if the metadata of the current context is also already stored somewhere. Because you load the metadata of the context via self.get_meta which only works if the data is stored somewhere. However, I think the nominal use case is rather: "I cannot load any data with my current context, why does my lineage differ with this piece of data stored in my directory."

You can just branch of the master for your changes.

AxFoundation / strax

Extend compare_metadata #770