Open sinhrks opened 7 years ago
I think that #126 and the proposal here are a good starting point - if we can't reach a consensus then the PR would make a strong contribution on it's own.
Now onto more direct discussion. One concern of this approach would be that this could lead to a very bloated API in the future if we want to support many data formats. Is there a way we can have a single read_from_file
function, which takes as argument some abstraction we give over many data readers?
For now I think we should put off this approach until we know a little more about what we're dealing with. My vote is to tackle an approach at least similar to the one given here.
I think the PR is a great start. I have some comments and concerns which I will post soon, most likely during next week. Currently rather busy!
Some concerns:
Matrix
API will be very bloated if we add IO-related methods directly to its API.csv
feature, and a hdf5
feature.Based on these concerns, I'd like to make the following suggestions:
Create an io
module within rulinalg
(without feature-gating), which contains the generic IO documentation, explaining the overall philosophy and how to enable various IO functionality. Enabling various IO features would then enable submodules within io
, i.e. rulinalg::io::csv
. Here you might have free functions such as csv::read_matrix(...)
or types such as csv::MatrixReader
, depending on whatever is appropriate. In any case I think they should be external to the Matrix
API.
I think it's hard to come up with generic traits for IO, because they might have very different requirements. For example, a CSV file is very simple and so doesn't need much configuration, whereas more complicated storage formats might require far more configuration options. Although it would be desirable to have a unified API for different formats, it might be more practical to simply let each storage format involve separately within the io
module for now, and then we can try to unify them later.
I'm coming back to look at some of these issues and PRs again now - hopefully I can start getting things resolved!
I think @Andlon makes a good point - we don't want our users to have to install a bunch of io-stuff to access small parts of that functionality (HDF5 vs csv as the given example). Though I'm not convinced that feature gating each component is the best approach, as this could potentially get messy. You also pointed out that generic IO traits aren't a silver bullet - even if it were doable (and/or pretty).
My vote for now would be to merge #126 as-is and see how this story evolves. This issue can stay open and be a reference point if/when this discussion receives more attention. Thoughs @Andlon @sinhrks ?
(Also thank you both for your patience)
Yeah, currently we can define io
to install all io related sub-features. We may split it when we add others in future, but io
should be still usable to install all of them. So no problem from user's point of view.
I'm OK with it either way! As you say, I think #126 is a good start for the IO story. The only thing I'm not a huge fan of is adding yet more methods to the Matrix
API, but for now this is bearable.
Currently we have no API for
Matrix
to be input / output to external sources. The issue intends to fix a standard IO API related to various sources.I've prepared a PR #126 for CSV IO for discussion. Current specs are:
io
to support CSV IO.Matrix
now hasread_csv
(static) andwrite_csv
to read / write CSVs.csv
crate, which is optional dependency whenio
feature is enabled.For other sources can be added with:
read_xxx
andwrite_xxx
toMatrix
appropriately.Would like to ask whether the above direction looks OK.