MideTechnology / endaq-python

A comprehensive, user-centric Python API for working with enDAQ data and devices
MIT License

Summarizing a Batch Object Output #66

Open S-Hanly opened 2 years ago

S-Hanly commented 2 years ago

I just processed 5000+ files and generated dataframes of the metrics and the PSD. Saved as CSV, they come to 31 MB and 605 MB respectively.

Plotting all of that at once would be silly and would not work. On the PSD and/or PVSS side it would be good to compute some quantiles per frequency bin (this requires some rounding of the bins). On the metrics side, we could maybe borrow some of the work done in endaq.plot to plot the max/min of the metrics over time (I may get away with using the existing functions).
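For the metrics side, a rough sketch of what "max/min over time" could look like with plain pandas. The column names `start time` and `rms acceleration` are hypothetical stand-ins, as is the synthetic data; the real batch output may be laid out differently.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the batch metrics table (one row per recording).
# Column names here are assumptions, not the actual batch output schema.
rng = np.random.default_rng(0)
metrics = pd.DataFrame({
    "start time": pd.date_range("2021-11-01", periods=96, freq=pd.Timedelta(hours=1)),
    "rms acceleration": rng.uniform(0.1, 2.0, size=96),
})

# Collapse many recordings into one min/median/max row per day,
# which is small enough to plot directly.
daily = (
    metrics.set_index("start time")["rms acceleration"]
    .resample("D")
    .agg(["min", "median", "max"])
)
print(daily)
```

The resampling period ("D" here) is the knob to turn: a coarser period gives fewer points to plot at the cost of hiding short-lived spikes inside the min/max envelope.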

@SamRagusa @CrepeGoat

S-Hanly commented 2 years ago

This wasn't a monumental effort, but it would still be nice to wrap it into something in batch and/or plot.

  1. First I rounded the frequency bins
  2. Then I did a pivot focusing on one axis
  3. Created an animation of the PSD, keeping only every 10th recording because plotly struggled with too many frames
  4. Then calculated the max, min, median, and mean
  5. Added the above to the animation plot

The plotting isn't necessary. To do this summary, all you really have to do is round the frequency bins, call df.pivot, and then use basic pandas operations (df.quantile() or df.mean()), so this may be best accomplished with examples?

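import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# psd is the long-format PSD dataframe produced by the batch run
# Round the frequency bins to the nearest 1 Hz so they line up across recordings
psd['frequency (Hz)'] = np.round(psd['frequency (Hz)'], 0)
# One row per frequency bin, one column per recording, for a single axis
psd_z = psd[psd['axis'] == 'Z (40g)'].pivot(index='frequency (Hz)', columns='start time', values='value')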

fig = px.line(
    psd_z[psd_z.columns[::10]].reset_index().melt(id_vars='frequency (Hz)'),
    x='frequency (Hz)',
    y='value',
    animation_frame='start time'
)

def add_line(df_stat, name, dash, color):
    fig.add_trace(go.Scatter(
        x=df_stat.index,
        y=df_stat.values,
        name=name,
        line_width=3,
        line_dash=dash,
        line_color=color
    ))

# Add max, min, median (quantiles taken across recordings for each frequency bin)
for stat, dash, quant in zip(['Max', 'Min', 'Median'],
                             ['dash', 'dash', 'dot'],
                             [1.0, 0.0, 0.5]):
    df_stat = psd_z.quantile(quant, axis=1)
    add_line(df_stat, stat, dash, '#6914F0')

# Add in mean
df_stat = psd_z.mean(axis=1)
add_line(df_stat, 'Mean', 'dot', '#2DB473')

fig.show()

[Image: newplot, the animated PSD plot with the max, min, median, and mean lines overlaid]
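Without the plotting, the summary itself could be wrapped up roughly as below. This is only a sketch: `summarize_psd` is a hypothetical helper, not something that exists in endaq, and the data is synthetic. It assumes the same column names as the snippet above (`axis`, `start time`, `frequency (Hz)`, `value`). One deliberate difference is that it uses `pivot_table` with a mean instead of `pivot`, since `pivot` raises when two original bins round to the same frequency.

```python
import pandas as pd


def summarize_psd(psd, axis_name, quantiles=(0.0, 0.5, 1.0)):
    """Per-frequency-bin quantiles and mean across all recordings for one axis.

    Hypothetical helper; column names follow the snippet in this issue.
    """
    one_axis = psd[psd["axis"] == axis_name].copy()
    one_axis["frequency (Hz)"] = one_axis["frequency (Hz)"].round(0)
    # pivot_table averages values that land in the same rounded bin,
    # whereas a plain pivot would raise on duplicate index/column pairs.
    wide = one_axis.pivot_table(
        index="frequency (Hz)", columns="start time", values="value", aggfunc="mean"
    )
    # Quantiles across recordings (columns), one row per frequency bin.
    summary = wide.quantile(list(quantiles), axis=1).T
    summary.columns = [f"q{q:g}" for q in quantiles]
    summary["mean"] = wide.mean(axis=1)
    return summary


# Synthetic long-format PSD: three recordings, four bins (0.6 and 1.4 both round to 1 Hz).
records = []
for i, start in enumerate(pd.to_datetime(["2021-11-01", "2021-11-02", "2021-11-03"])):
    for f in (0.6, 1.4, 2.0, 3.0):
        records.append({"axis": "Z (40g)", "start time": start,
                        "frequency (Hz)": f, "value": float(i + 1)})
psd = pd.DataFrame(records)

summary = summarize_psd(psd, "Z (40g)")
print(summary)
```

The result has one row per rounded frequency bin, so it stays small no matter how many files went into the batch, and it can be saved or plotted directly.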