AtheMathmo / rulinalg

A linear algebra library written in Rust
https://crates.io/crates/rulinalg
MIT License
287 stars 58 forks source link

Decide File IO / Serialization API #135

Open sinhrks opened 7 years ago

sinhrks commented 7 years ago

Currently we have no API for Matrix to be input / output to external sources. The issue intends to fix a standard IO API related to various sources.

I've prepared a PR #126 for CSV IO for discussion. Current specs are:

For other sources can be added with:

Would like to ask whether the above direction looks OK.

AtheMathmo commented 7 years ago

I think that #126 and the proposal here are a good starting point - if we can't reach a consensus then the PR would make a strong contribution on it's own.

Now onto more direct discussion. One concern of this approach would be that this could lead to a very bloated API in the future if we want to support many data formats. Is there a way we can have a single read_from_file function, which takes as argument some abstraction we give over many data readers?

For now I think we should put off this approach until we know a little more about what we're dealing with. My vote is to tackle an approach at least similar to the one given here.

Andlon commented 7 years ago

I think the PR is a great start. I have some comments and concerns which I will post soon, most likely during next week. Currently rather busy!

Andlon commented 7 years ago

Some concerns:

Based on these concerns, I'd like to make the following suggestions:

Create an io module within rulinalg (without feature-gating), which contains the generic IO documentation, explaining the overall philosophy and how to enable various IO functionality. Enabling various IO features would then enable submodules within io, i.e. rulinalg::io::csv. Here you might have free functions such as csv::read_matrix(...) or types such as csv::MatrixReader, depending on whatever is appropriate. In any case I think they should be external to the Matrix API.

I think it's hard to come up with generic traits for IO, because they might have very different requirements. For example, a CSV file is very simple and so doesn't need much configuration, whereas more complicated storage formats might require far more configuration options. Although it would be desirable to have a unified API for different formats, it might be more practical to simply let each storage format involve separately within the io module for now, and then we can try to unify them later.

AtheMathmo commented 7 years ago

I'm coming back to look at some of these issues and PRs again now - hopefully I can start getting things resolved!

I think @Andlon makes a good point - we don't want our users to have to install a bunch of io-stuff to access small parts of that functionality (HDF5 vs csv as the given example). Though I'm not convinced that feature gating each component is the best approach, as this could potentially get messy. You also pointed out that generic IO traits aren't a silver bullet - even if it were doable (and/or pretty).

My vote for now would be to merge #126 as-is and see how this story evolves. This issue can stay open and be a reference point if/when this discussion receives more attention. Thoughs @Andlon @sinhrks ?

(Also thank you both for your patience)

sinhrks commented 7 years ago

Yeah, currently we can define io to install all io related sub-features. We may split it when we add others in future, but io should be still usable to install all of them. So no problem from user's point of view.

Andlon commented 7 years ago

I'm OK with it either way! As you say, I think #126 is a good start for the IO story. The only thing I'm not a huge fan of is adding yet more methods to the Matrix API, but for now this is bearable.