Discussion on data structure

marcocaggioni commented 2 years ago

Opening this issue to discuss data structure:

Scope: proposing a standard way to store and retrieve data

Proposing a dictionary of pandas dataframes as a good way to handle data from our primary sources:

Rheology measurements - sequence of measurements steps
Published data digitized from figures - collection of datasets identified by a key in the legend

When the order of the dictionary entries is meaningful, as in the rheology measurement case where the order of the steps matters, I propose prepending an index to each step name:

data_dict.keys()

dict_keys(['1_Flow_curve_down', '2_Strain_sweep_down', '3_Strain_sweep_up', '4_Freq_sweep', '5_Flow_curve_up'])

type(data_dict['1_Flow_curve_down'])

pandas.core.frame.DataFrame
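One caveat with index prefixes is that plain string sorting places '10_...' before '2_...' once a measurement has ten or more steps. Below is a minimal sketch of building such a dictionary and recovering the step order numerically. The helper names `prefix_step_index` and `step_number`, the column names, and the values are all hypothetical and only serve as illustration:

```python
import pandas as pd

def prefix_step_index(steps):
    """Build a dict keyed '<n>_<name>' from an ordered list of (name, DataFrame) pairs."""
    return {f"{i}_{name}": df for i, (name, df) in enumerate(steps, start=1)}

def step_number(key):
    """Return the integer prefix of a key such as '10_Freq_sweep'."""
    return int(key.split("_", 1)[0])

# Illustrative data only; real steps would come from the instrument export.
steps = [
    ("Flow_curve_down", pd.DataFrame({"shear_rate": [100.0, 10.0, 1.0],
                                      "stress": [50.0, 12.0, 4.0]})),
    ("Strain_sweep_down", pd.DataFrame({"strain": [10.0, 1.0],
                                        "G_prime": [80.0, 120.0]})),
]

data_dict = prefix_step_index(steps)

# Sort by the numeric prefix rather than alphabetically.
ordered_keys = sorted(data_dict, key=step_number)
```

Sorting with `key=step_number` keeps the step order correct however many steps there are, whereas a plain `sorted(data_dict)` only works for up to nine steps.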

This structure is very generic and can easily be stored as a SQLite file:

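import sqlite3

def data_dict_to_sqllite(data_dict, sqllite_file, add_step_index=False):
    # Write each dataframe to its own table, named after its dictionary key
    cnx = sqlite3.connect(sqllite_file)
    try:
        for index, (key, dataframe) in enumerate(data_dict.items()):
            if add_step_index:
                # Prepend the step position so the table name records the order
                key = str(index) + '_' + key

            dataframe.to_sql(name=key, con=cnx)
    finally:
        # Close the connection even if one of the writes fails
        cnx.close()

data_dict_to_sqllite(data_dict, 'data_dict.db', add_step_index=True)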

Once you have the SQLite file, you can explore it with, for example:

https://sqlitebrowser.org/ a desktop application for Windows and Mac for opening and exploring the SQLite file
https://github.com/pbugnion/jupyterlab-sql directly in JupyterLab
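The file can also be inspected from Python with only the standard library. Below is a minimal sketch, assuming a hypothetical helper name `list_tables` and a throwaway database created just for the demonstration. It lists the table names stored in a SQLite file:

```python
import os
import sqlite3
import tempfile

def list_tables(sqlite_file):
    """Return the names of all tables in a SQLite file, sorted alphabetically."""
    cnx = sqlite3.connect(sqlite_file)
    try:
        rows = cnx.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        cnx.close()

# Demonstration on a temporary file with two empty tables.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "demo.db")
    cnx = sqlite3.connect(demo_file)
    # Table names that start with a digit must be double-quoted in SQL.
    cnx.execute('CREATE TABLE "1_Flow_curve_down" (shear_rate REAL, stress REAL)')
    cnx.execute('CREATE TABLE "2_Strain_sweep_down" (strain REAL, G_prime REAL)')
    cnx.commit()
    cnx.close()

    tables = list_tables(demo_file)
```

The double quotes matter for the index-prefixed names proposed here, because an unquoted identifier beginning with a digit is not valid SQL.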

The data can be read back into a dictionary of tables:

import pandas as pd

def read_sqllite_to_data_dict(sqllitefile):
    cnx = None  # defined up front so the finally block is safe if connect fails
    try:
        cnx = sqlite3.connect(sqllitefile)

        data_dict = {}
        # sqlite_master holds one row per table stored in the file
        for item in pd.read_sql("select * from sqlite_master WHERE type='table'", cnx).iterrows():
            table_name = item[1]['name']
            # Quote the name: table names that start with a digit need it
            data_dict[table_name] = pd.read_sql_query(f'SELECT * FROM "{table_name}";', cnx)

        return data_dict

    except sqlite3.Error as error:
        print("Failed to execute the above query", error)

    finally:

        if cnx is not None:
            cnx.close()
            print("the sqlite connection is closed")

data_dict = read_sqllite_to_data_dict('data_dict.db')
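One detail to watch on the round trip: by default `DataFrame.to_sql` also writes the dataframe index as a column (named `index` when the index has no name), so a table read back has one more column than the original. Below is a minimal sketch of a round trip that avoids this by passing `index=False`. The table name, column names, and values are illustrative assumptions:

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Illustrative data only.
original = pd.DataFrame({"shear_rate": [100.0, 10.0, 1.0],
                         "stress": [50.0, 12.0, 4.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.db")
    cnx = sqlite3.connect(path)
    try:
        # index=False keeps the default RangeIndex out of the stored table
        original.to_sql(name="1_Flow_curve_down", con=cnx, index=False)
        restored = pd.read_sql_query('SELECT * FROM "1_Flow_curve_down";', cnx)
    finally:
        cnx.close()

columns_match = list(restored.columns) == list(original.columns)
```

If the index carries information, such as a time stamp, an alternative is to keep the default and pass `index_col` to `pd.read_sql_query` when reading back.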