edmundhighcock / hdf5

A ruby wrapper for the HDF5 library
MIT License

Can I use this library to create a new HDF5 file? #2

Closed: retorquere closed this issue 2 years ago

edmundhighcock commented 8 years ago

Hi,

Unfortunately not; this is a work in progress, and I only need to read HDF5 files. However, if you look at the source code, you'll see it's very short and simple, and if you are familiar with HDF5 it should be a doddle to add what you need. Alternatively, if you can send me the sequence of HDF5 calls you would need, I could have a go at adding them.

retorquere commented 8 years ago

I honestly don't know the sequence yet -- I need to get a table of values into matlab, and I'm still trying to figure out what the best format for that is.

edmundhighcock commented 8 years ago

Well, if it is just a table of floating point numbers, it should be very easy to implement.

edmundhighcock commented 8 years ago

Can you give an example of what you are trying to write?

retorquere commented 8 years ago

data.txt

It's a CSV file but github doesn't want the .csv extension

retorquere commented 8 years ago

I could strip it down to floats, there's currently a few strings in there but those are not really important.

edmundhighcock commented 8 years ago

It would be pretty easy to do something simple, like have them all in separate named arrays: 'time', 'Inputs.brakePedal', etc. You could then read them into matlab and sort it out.
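As a rough illustration of the "separate named arrays" idea, the sketch below reads a CSV into one array per column using only Ruby's standard `csv` library. The sample rows and the `columns_from_csv` helper are hypothetical (the attached data.txt is not reproduced here), and it assumes every column has already been stripped down to floats.

```ruby
require "csv"

# Hypothetical sample standing in for the attached data.txt.
COLUMN_SAMPLE = <<~CSV
  time,Inputs.brakePedal,Weather.precipitation
  0.0,0.10,0.0
  0.5,0.35,0.2
  1.0,0.80,0.2
CSV

# Build one array of floats per column, keyed by the header name.
# Float() raises on non-numeric cells, so this assumes float-only data.
def columns_from_csv(text)
  table = CSV.parse(text, headers: true)
  table.headers.to_h do |name|
    [name, table[name].map { |value| Float(value) }]
  end
end

columns = columns_from_csv(COLUMN_SAMPLE)
columns.each { |name, values| puts "#{name}: #{values.inspect}" }
```

Each entry of `columns` could then be written out as its own named array and reassembled on the MATLAB side.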

edmundhighcock commented 8 years ago

I should also point out that exactly the funtionality you need is already in R:

https://www.getdatajoy.com/learn/Read_and_Write_HDF5_from_R

edmundhighcock commented 8 years ago

(What you have sent is basically a data frame, and R is designed to deal with exactly your kind of data).

edmundhighcock commented 8 years ago

However, I'm happy to have a go to get you a working example over the next couple of days.

retorquere commented 8 years ago

Doesn't HDF5 support mixed-mode matrices? The data as I have it doesn't fit into a data frame? I'd love to have a sample -- not really familiar with R.

edmundhighcock commented 8 years ago

Neither am I, I'm afraid: if you choose the R route you're on your own! We could create an object in Ruby, which has member objects like "Inputs", "Weather", each of which has members of actual data, and then create a similar object in the HDF5 file. That would be the most intuitive way of doing it, but would require slightly more coding than simple arrays.
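One possible way to get member objects like "Inputs" and "Weather" on the Ruby side is to split the dotted column names into a nested Hash. This is only a sketch: the `nest_columns` helper and the sample column names are made up for illustration, and nothing here touches HDF5 itself.

```ruby
# Split dotted column names such as "Inputs.brakePedal" into a nested Hash,
# so that each prefix ("Inputs", "Weather") becomes its own member object.
def nest_columns(flat)
  flat.each_with_object({}) do |(name, value), tree|
    *parents, leaf = name.split(".")
    # Walk (and create as needed) one Hash level per prefix.
    node = parents.inject(tree) { |current, key| current[key] ||= {} }
    node[leaf] = value
  end
end

# Hypothetical flat columns, keyed by CSV header.
flat = {
  "time" => [0.0, 0.5],
  "Inputs.brakePedal" => [0.10, 0.35],
  "Weather.precipitation" => [0.0, 0.2]
}

nested = nest_columns(flat)
puts nested.inspect
```

The nested shape maps naturally onto HDF5 groups containing datasets, which is why it takes a little more code than flat arrays.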

retorquere commented 8 years ago

I don't mind more coding on my end. But can HDF5 not store a matrix like in that CSV file directly? Only arrays?

edmundhighcock commented 8 years ago

HDF5 is incredibly flexible and can store pretty much anything (depending on how much work you want to do). However, storing your data as a matrix would be counter to the HDF5 philosophy, because it contains mixed data types. HDF5 (like Ruby, but unlike matlab) uses an object oriented approach. In your case, the object is a single record in your database (i.e. a single row in your matrix). You would want to define an object that matches your record, and then define an array of those objects. This would be the 'HDF5' way of doing things.
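To make the "one record type, then an array of records" idea concrete, here is a small Ruby-only sketch. The `Record` fields and the `RECORD_LAYOUT` string are assumptions chosen for illustration; the packing step only shows that a mixed-type record can have a fixed byte layout, and it is not HDF5's actual on-disk format.

```ruby
# One record per row: two floats and a short string (mixed types).
Record = Struct.new(:time, :brake_pedal, :shift_mode)

# Fixed layout: two little-endian doubles, then an 8-byte space-padded string.
RECORD_LAYOUT = "EEA8"

def encode(record)
  record.to_a.pack(RECORD_LAYOUT)
end

def decode(bytes)
  Record.new(*bytes.unpack(RECORD_LAYOUT))
end

# The dataset is simply an array of identically shaped records.
records = [
  Record.new(0.0, 0.10, "D"),
  Record.new(0.5, 0.35, "E")
]

encoded = records.map { |record| encode(record) }
puts encoded.map(&:bytesize).inspect
puts decode(encoded.first).inspect
```

Because every record encodes to the same number of bytes, an array of them can be stored and indexed uniformly even though the fields have different types. HDF5 calls this kind of record type a compound datatype.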

retorquere commented 8 years ago

That sounds perfect for my needs in fact. The object would basically look like this:

```ruby
class Weather
  attr_reader :precipitation, :windDirection #, ....
end
class Inputs
  attr_reader :shiftModeE, :shiftModeD #, ...
end
class Measurement
  def initialize
    @inputs = Inputs.new
    @weather = Weather.new
  end
  attr_reader :inputs, :weather, :time
end
```

edmundhighcock commented 8 years ago

Exactly. Why don't you write a Ruby package to read your datafile into an object, then I'll help with writing it to HDF5.

retorquere commented 8 years ago

I've put up my script at https://gist.github.com/1fed4d2a66f275aa596d5f6e0c5fd25b . The data structures are defined between lines 42 and 56, and the measurement is filled between 87 and 97. Each measurement is for a specific car (the "license" variable), but I'm hoping I won't have to load everything into a big hash before it can be saved to HDF5 -- the total structures can get quite big, so I'd prefer to just append data as I find it. Each car would have its measurements in its own dataframe if I get the parlance right.
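A minimal sketch of the append-as-you-go pattern, independent of the gist: rows are read one at a time and handed to a sink keyed by license, so nothing accumulates in memory. The `CountingSink` class, its `append` method, and the sample data are hypothetical stand-ins; since the wrapper discussed here cannot write HDF5 yet, the sink only counts rows where a real one would append to a per-car dataset.

```ruby
require "csv"
require "stringio"

# Hypothetical stand-in for a per-car writer: it only counts rows,
# where a real sink would append each row to that car's dataset.
class CountingSink
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def append(license, _fields)
    @counts[license] += 1
  end
end

# Read one row at a time and pass it on, without building a big Hash.
def stream_measurements(io, sink)
  CSV.new(io, headers: true, converters: :numeric).each do |row|
    fields = row.to_h
    license = fields.delete("license")
    sink.append(license, fields)
  end
  sink
end

# Hypothetical sample input; a File opened for reading works the same way.
stream_input = StringIO.new(<<~CSV)
  license,time,Inputs.brakePedal
  AB-123,0.0,0.10
  XY-789,0.0,0.00
  AB-123,0.5,0.35
CSV

sink = stream_measurements(stream_input, CountingSink.new)
puts sink.counts.inspect
```

Memory use stays flat because each row is dropped as soon as the sink has seen it; only the sink decides what is kept per car.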