ManonMarchand / rheodata

Repository to gather published rheological data

Discussion on data structure #1

Open marcocaggioni opened 2 years ago

marcocaggioni commented 2 years ago

Opening this issue to discuss data structure:

Scope: proposing a standard way to store and retrieve data

I propose a dictionary of pandas DataFrames as a good way to handle data from our primary sources.

When the order of the entries is meaningful, as in a rheology measurement where the sequence of steps matters, I propose prepending an index to each step name:

data_dict.keys()

dict_keys(['1_Flow_curve_down', '2_Strain_sweep_down', '3_Strain_sweep_up', '4_Freq_sweep', '5_Flow_curve_up'])

type(data_dict['1_Flow_curve_down'])

pandas.core.frame.DataFrame
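
For illustration, here is a minimal sketch of how such a dictionary could be built, assuming each step is already available as a DataFrame. The column names and values are placeholders, and `steps` and `example_dict` are hypothetical names; only the key naming convention comes from the proposal above.

```python
import pandas as pd

# Hypothetical step data: column names and values are placeholders.
steps = [
    ("Flow_curve_down", pd.DataFrame({"shear_rate": [100.0, 10.0, 1.0],
                                      "stress": [50.0, 12.0, 5.0]})),
    ("Strain_sweep_down", pd.DataFrame({"strain": [1.0, 0.1],
                                        "G_prime": [80.0, 120.0]})),
]

# Prepend a 1-based index so the measurement order is visible in the keys.
example_dict = {f"{i}_{name}": df for i, (name, df) in enumerate(steps, start=1)}

print(list(example_dict.keys()))
```

With ten or more steps, a zero-padded prefix (for example `f"{i:02d}_{name}"`) would keep the keys in the right order when sorted as strings.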

This structure is very generic and can easily be stored as an SQLite file:

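import sqlite3

def data_dict_to_sqllite(data_dict, sqllite_file, add_step_index=False):
    """Write each DataFrame in data_dict to its own table in an SQLite file."""
    cnx = sqlite3.connect(sqllite_file)
    try:
        for index, (key, dataframe) in enumerate(data_dict.items()):
            if add_step_index:
                # Prefix the table name with the step position to preserve order
                key = str(index) + '_' + key

            dataframe.to_sql(name=key, con=cnx)
    finally:
        # Always release the file handle, even if a write fails
        cnx.close()

data_dict_to_sqllite(data_dict, 'data_dict.db', add_step_index=True)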

Once you have the SQLite file, you can explore it with any tool that reads SQLite files.
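
One option that needs no extra tooling is Python's standard `sqlite3` module. The sketch below builds a small throwaway database so that it runs on its own; the table name, columns and values are placeholders, and `summary` is a hypothetical helper variable.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Build a small throwaway database so the example is self-contained.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")
cnx = sqlite3.connect(db_path)
pd.DataFrame({"shear_rate": [1.0, 10.0], "stress": [5.0, 12.0]}).to_sql(
    name="1_Flow_curve_down", con=cnx, index=False
)

# List the tables, then the columns and row count of each one.
tables = [row[0] for row in cnx.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
)]
summary = {}
for table in tables:
    # PRAGMA table_info returns one row per column; field 1 is the column name.
    columns = [col[1] for col in cnx.execute(f'PRAGMA table_info("{table}")')]
    n_rows = cnx.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    summary[table] = (columns, n_rows)
    print(table, columns, n_rows)

cnx.close()
```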

The data can be read back into a dictionary of tables:

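import sqlite3
import pandas as pd

def read_sqllite_to_data_dict(sqllitefile):
    """Read every table of an SQLite file into a dictionary of DataFrames."""
    cnx = None  # defined up front so the finally block is safe if connect() fails
    try:
        cnx = sqlite3.connect(sqllitefile)

        data_dict = {}
        # sqlite_master lists every table stored in the file
        tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", cnx)
        for table_name in tables['name']:
            data_dict[table_name] = pd.read_sql_query(f'SELECT * FROM "{table_name}";', cnx)

        return data_dict

    except sqlite3.Error as error:
        print("Failed to execute the above query", error)

    finally:
        if cnx:
            cnx.close()
            print("the sqlite connection is closed")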

data_dict = read_sqllite_to_data_dict('data_dict.db')
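
One detail to be aware of in this round trip: `DataFrame.to_sql` writes the DataFrame index as an extra column by default (named `index` when the index has no name), so tables read back with `pd.read_sql_query` carry that column. Below is a minimal sketch of one way to avoid it, using an in-memory database and placeholder data; the table names are made up for the example.

```python
import sqlite3

import pandas as pd

original = pd.DataFrame({"frequency": [0.1, 1.0, 10.0],
                         "G_prime": [10.0, 20.0, 40.0]})

cnx = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk

# Default behaviour: the index is stored as an extra column.
original.to_sql(name="4_Freq_sweep", con=cnx)
# With index=False only the data columns are stored.
original.to_sql(name="4_Freq_sweep_noindex", con=cnx, index=False)

with_index = pd.read_sql_query('SELECT * FROM "4_Freq_sweep"', cnx)
without_index = pd.read_sql_query('SELECT * FROM "4_Freq_sweep_noindex"', cnx)
cnx.close()

print(list(with_index.columns))
print(list(without_index.columns))
```

Whether to keep the index depends on whether it carries information (for example a time stamp); for a plain row counter, dropping it keeps the stored tables identical to the originals.
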
ManonMarchand commented 2 years ago

We could also have a look at this repo https://github.com/JuliaRheology/RHEOS.jl

It'd be nice to make the flow from our database to their fitting library easy.

ManonMarchand commented 2 years ago

Other cool software to check: https://reptate.readthedocs.io/

ManonMarchand commented 2 years ago

Example of a data repository shared publicly on GitHub: https://github.com/fivethirtyeight/data (they stick to the CSV side of the force).
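
For comparison, a CSV layout for the same dictionary of DataFrames could look like the sketch below: one file per step, named after the dictionary key. The helper names `data_dict_to_csv_dir` and `read_csv_dir_to_data_dict` are hypothetical, and the data is a placeholder.

```python
import tempfile
from pathlib import Path

import pandas as pd

def data_dict_to_csv_dir(data_dict, directory):
    """Write each DataFrame to <directory>/<key>.csv (hypothetical helper)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for key, dataframe in data_dict.items():
        dataframe.to_csv(directory / f"{key}.csv", index=False)

def read_csv_dir_to_data_dict(directory):
    """Read every CSV in the directory back into a dict keyed by file stem."""
    return {path.stem: pd.read_csv(path)
            for path in sorted(Path(directory).glob("*.csv"))}

# Placeholder data for one measurement step.
example = {"1_Flow_curve_down": pd.DataFrame({"shear_rate": [10.0, 1.0],
                                              "stress": [12.0, 5.0]})}
out_dir = tempfile.mkdtemp()
data_dict_to_csv_dir(example, out_dir)
restored = read_csv_dir_to_data_dict(out_dir)
print(list(restored.keys()))
```

The trade-off is one file per table instead of a single database file, in exchange for data that diffs cleanly in Git and opens in any spreadsheet.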

ManonMarchand commented 2 years ago

Nomenclature of the Society of Rheology

https://sor.scitation.org/doi/pdf/10.1122/1.4811184

ManonMarchand commented 2 years ago

Unique identifiers for samples, worth investigating: https://github.com/IGSN

ManonMarchand commented 1 year ago

Mixing SQL and JSON files: https://gitlab.obspm.fr/exoplanet/py-linq-sql/-/tree/main

--> SQLite has a JSON extension
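
A minimal sketch of what that could look like for per-step metadata, assuming the SQLite build bundled with Python has the JSON functions enabled (they are built in by default in recent SQLite releases). The `metadata` table, its columns and the stored values are all made up for the example.

```python
import json
import sqlite3

cnx = sqlite3.connect(":memory:")  # in-memory database for the example

# Hypothetical metadata table: one JSON document per measurement step.
cnx.execute("CREATE TABLE metadata (step TEXT PRIMARY KEY, info TEXT)")
cnx.execute(
    "INSERT INTO metadata VALUES (?, ?)",
    ("1_Flow_curve_down",
     json.dumps({"geometry": "cone-plate", "temperature_C": 25})),
)

# json_extract pulls a single field out of the stored JSON text.
row = cnx.execute(
    "SELECT step, json_extract(info, '$.temperature_C') FROM metadata"
).fetchone()
cnx.close()

print(row)
```

This would let the numeric tables stay as plain SQL tables while free-form metadata lives next to them in the same file.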