RTXI / rtxi

Tutorials, FAQs, and more at http://rtxi.org/docs
GNU General Public License v3.0

Hdf5 to python #108

Closed rythorpe closed 8 years ago

rythorpe commented 8 years ago

We'd like to use python for our data analysis, but can't read the HDF5 version supported by RTXI via pytables and pandas. Could RTXI be updated to output the most recent version of HDF5?

Here is our error report from python:

ValueError: PyTables [3.2.3.1] no longer supports opening multiple files even in read-only mode on this HDF5 version [1.8.4]. You can accept this and not open the same file multiple times at once, upgrade the HDF5 version, or downgrade to PyTables 3.0.0 which allows files to be opened multiple times at once

sudorook commented 8 years ago

Try the h5py library. (sudo apt-get install python-h5py)
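If you haven't used h5py before, a small sketch like the one below can confirm that the file opens and show how it is laid out. The helper names `walk` and `print_layout` and the file name are placeholders, not part of RTXI or h5py. The sketch relies only on h5py files and groups behaving like read-only mappings of member names to members.

```python
from collections.abc import Mapping


def walk(node, prefix=""):
    """Return the paths of all members below a mapping-like node.

    h5py File and Group objects behave like mappings of name -> member,
    so the same function also works on plain nested dicts.
    """
    paths = []
    for name, child in node.items():
        path = prefix + "/" + name
        paths.append(path)
        if isinstance(child, Mapping):  # a group: descend into it
            paths.extend(walk(child, path))
    return paths


def print_layout(filename):
    # h5py is imported here so that walk() stays usable without it.
    import h5py

    with h5py.File(filename, "r") as f:
        for path in walk(f):
            print(path)
```

Calling `print_layout("my_hdf_file.h5")` should list one path per group or dataset, which makes it easier to see which names a given RTXI version writes.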

sudorook commented 8 years ago

Here's a quick snippet that reads a trial from a recorded HDF file into a pandas DataFrame and plots the time series for each recorded channel (only tested with RTXI 2.1, so you may need to tweak it for 2.0):

Edit: I fixed a couple of bugs I just noticed. This is a dev script I use for my own purposes; I posted it here as a reference to help you get started. There could be more bugs I haven't found yet.

#! /usr/bin/env python

import h5py as h5
import numpy as np
import pandas as pd
import seaborn as sb
from matplotlib.backends.backend_pdf import PdfPages

filename = "my_hdf_file.h5"
plotname = "my_plot_file.pdf"
trialnum = 1

def getNumTrials(f):
    ntrials = len(f["/"])
    print("# of Trials:\t", ntrials)
    return ntrials

def getTrial(f, n):
    trialname = "/Trial" + str(n)
    return f[trialname]

def printChannelNames(f, n):
    # Every entry except the last is a channel label of the form "<index> : <name>".
    meta = [ item for item in f["/Trial" + str(n) + "/Synchronous Data"] ]
    headers = [ item.split(" : ")[1] for item in meta[:-1] ]
    for idx, header in enumerate(headers):
        print(str(idx) + ". " + str(header))

def getPeriod(f, n):
    # [()] reads a scalar dataset; the older .value attribute was removed in h5py 3.0
    return f["/Trial" + str(n) + "/Period (ns)"][()]

def getDownsamplingRate(f, n):
    return f["/Trial" + str(n) + "/Downsampling Rate"][()]

def getTrialLength(f, n):
    return f["/Trial" + str(n) + "/Trial Length (ns)"][()]

def getChannelFrame(f, n):
    # The last member of "Synchronous Data" holds the samples; the rest are channel labels.
    meta = [ item for item in f["/Trial" + str(n) + "/Synchronous Data"] ]
    headers = [ item.split(" : ")[1] for item in meta[:-1] ]
    data = f["/Trial" + str(n) + "/Synchronous Data/" + str(meta[-1])]
    frame = pd.DataFrame(data=np.vstack(data), columns=headers)
    return frame

hdf = h5.File(filename, 'r')

getNumTrials(hdf)

period = getPeriod(hdf, trialnum) # in ns
downsampling = getDownsamplingRate(hdf, trialnum)
length = getTrialLength(hdf, trialnum) # in ns

frame = getChannelFrame(hdf, trialnum)
frame["Time (ms)"] = np.arange(0, len(frame)) * float(period) / 1e9 * 1e3 # to ms
meltyframe = pd.melt(frame, "Time (ms)")

hdf.close()

# generate plots
with PdfPages(plotname) as pdf:

    for field in meltyframe['variable'].unique():
        # `height` replaces the `size` argument used by seaborn releases before 0.9
        p = sb.lmplot(x="Time (ms)", y=field, aspect=2, height=5, data=frame, fit_reg=False)
        pdf.savefig(p.fig)
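
The script reads the downsampling rate but does not use it when building the time axis. If a recording was downsampled, the spacing between saved samples is presumably the period multiplied by that rate. That is an assumption about how the recorder stores data, so it is worth checking the result against the trial length. A minimal sketch, with `time_axis_ms` as a hypothetical helper name:

```python
def time_axis_ms(n_samples, period_ns, downsampling=1):
    """Time stamps in ms for n_samples rows.

    Assumes (not confirmed in this thread) that only every
    `downsampling`-th sample is saved, so consecutive rows are
    period_ns * downsampling nanoseconds apart.
    """
    step_ms = period_ns * downsampling / 1e6  # ns -> ms
    return [i * step_ms for i in range(n_samples)]
```

In the script above this would be used as `frame["Time (ms)"] = time_axis_ms(len(frame), period, downsampling)`. With a downsampling rate of 1 it gives the same values as the original line.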