Closed: Chayatan closed this issue 2 years ago
By adapting the simple example you get something like this for your use case:
use std::str::FromStr;

#[derive(Clone, Debug, hdf5::H5Type)]
#[repr(C)]
struct Composite {
    u64: u64,
    i64: i64,
    f64: f64,
    bool: bool,
    string: hdf5::types::VarLenUnicode,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = hdf5::File::create("output.hdf")?;
    let composite = file.new_dataset::<Composite>().create("composite", (3,))?;
    composite.write(&ndarray::arr1(&[
        Composite {
            u64: 0,
            i64: 1,
            f64: 2.5,
            bool: true,
            string: hdf5::types::VarLenUnicode::from_str("one")?,
        },
        Composite {
            u64: 1,
            i64: -5,
            f64: 6.0,
            bool: false,
            string: hdf5::types::VarLenUnicode::from_str("two")?,
        },
        Composite {
            u64: 2,
            i64: 9,
            f64: 10.2432,
            bool: false,
            string: hdf5::types::VarLenUnicode::from_str("three")?,
        },
    ]))?;
    Ok(())
}
Hey @mulimoen, thanks for sharing this piece of code, it is a good start for me. However, in this solution the struct is hard-coded, and I want the struct to be decided at runtime.
Let's say I let the user decide the column and type information at the beginning, and later the user keeps appending row data accordingly (the number of rows is also unknown at compile time).
Any ideas or comments on achieving this kind of scenario are appreciated.
You will need to implement H5Type for your compound type. You should have a look in hdf5-types/src/h5type.rs for how to create a CompoundType.
@mulimoen Technically, I guess we could add unsafe methods to allow the user to provide a TypeDescriptor and a slice of data of type T: Copy where mem::size_of::<T>() == type_descriptor.size(), or something like that, for reading and writing (given that Reader/Writer only use H5Type to extract the type descriptor, really). Or maybe even not provide a type descriptor at all, so only copyability and sizeof will be checked.
Then the question only remains how to create datasets with dynamic type descriptors - but IIRC I've already added that in feature/dcpl branch (and if not, we can add it).
@aldanor I guess this would be problematic with arrays/strings that can't be Copy? The example does have a string, which would make this problematic. I "solved" this problem in netcdf by requiring composite types to be read as a binary blob which the user must decode themselves, including freeing memory where applicable. This is not ideal, however.
@mulimoen I think this might be a good start even if it's only supported for copyable types, that would already cover many use cases for dynamic compound types. Writing non-copyable types is not a problem, obviously. For reading, I think if you wanted to make it nice, it's possible, there would have to be a wrapper around ndarray::Array of some sort which would know the in-memory type descriptor, it would implement manual Drop which would run over all entries on drop and destroy them, and it would provide a strided view of each field (by name), kind of like a very simplified pandas dataframe of sorts.
@Chayatan
this is a good start for me, however in this solution, the struct is hard coded and I want this struct to be decided at runtime.
Maybe if you could explain to us what exactly you mean by "struct to be decided at runtime" we can be of more help. Rust is not Python, so you can't "decide structs at runtime", your data has to be in some format already.
@aldanor @mulimoen For example, if I were to write an interactive user application where the user decides the columns and their types at runtime.
The problem with the given solution is that I don't have the flexibility to change the number of columns or their type information after the code is compiled.
struct Composite {
    u64: u64,
    i64: i64,
    f64: f64,
    bool: bool,
    string: hdf5::types::VarLenUnicode,
}
What if the user wanted to store a different set of data in the second run? Something like just a table of integers and strings:
struct Composite {
    u64: u64,
    string: hdf5::types::VarLenUnicode,
}
@Chayatan In this case, why would you use a struct? You would probably store your "struct" as an HDF5 group, with each column being a separate dataset.
(or use another storage solution like Arrow/Parquet for pure columnar access in a unified dataset if that's the goal)
Storing each column as a separate dataset in an HDF5 group sounds great. Could you please share an example of how I can create a group using this crate?
@Chayatan file.create_group(), group.new_dataset(). Basically, File is a Group, it's kind of the same thing; a group is like a subfolder.
Resolved (the OP was happy to store the variables/columns in separate datasets).
I am just starting to use the crate, and the sample example doesn't seem to cover all of its capabilities. Could anyone please guide me in creating an HDF5 file with this crate that holds the contents of a composite structure?
Thanks in advance!