gonum / matrix

Matrix packages for the Go language [DEPRECATED]
446 stars 51 forks source link

Added MarshalText and UnmarshalText #341

Closed atedja closed 8 years ago

jonlawlor commented 8 years ago

A couple things:

First, I'd like there to be a test that the output of textmarshal is as expected. I'd also like the same data to be run through the unmarshaller to ensure the data is ok round trip.

Also, the tests need edge cases. What if you marshal a matrix with a different stride? What if you marshal a, NaN, or inf? What if there is a problem with the input csv, like having no data, or uneven rows, or NaN or non numeric data?

Also bigger picture I am not sure that we should just have a bare textmarshal here. We might want textmarshal to handle tab delimited, and html tables as well.

But also: welcome to gonum!

atedja commented 8 years ago

Thank you for the welcome.

encoding.TextMarshaller interface doesn't provide enough parameters to provide custom delimiters like tabs or spaces. If we want more marshalling options, we would need to move away from the golang's encoding interface and have our own MatrixMarshaller.

I'll add more tests and edge cases.

What do you want to do with this pull request? Should I close it? Keep it open?

jonlawlor commented 8 years ago

There are two ways to get different kinds of delimiters: you can have type aliases which wrap Dense, and which just implement csv or tab delimted. Alternatively, you can go the same route as encoding/csv, and have a struct which can hold the details of your specific needs.

Also now that I think about it a little bit more, I would suggest having a CSVReader and CSVWriter type very similar to the encoding/csv.Reader type, with one field a Matrix interface. Then I would have NewReader and NewWriter functions to create the associated reader/writers with sensible defaults, like using commas. Then I would use a type switch in the MarshalText and UnmarshalText methods to deal with specific types of matrices.

I would leave this open for now to have a discussion. What do you think @btracey ?

btracey commented 8 years ago

I have been thinking for a while we should have a "numcsv" package that is encoding/csv but explicitly for matrix data.

Previous Discussion: https://groups.google.com/forum/#!searchin/gonum-dev/numcsv/gonum-dev/gzR7UFXRrUI/FkP0pIRC6WEJ Implementation (but not fully correct): https://github.com/btracey/numcsv

jonlawlor commented 8 years ago

Do you think we only need to implement a reader and writer for Dense, or do we want readers and writers for the other matrix types? If the primary purpose is interop with MATLAB etc. then that would probably be fine.

Also I think I kind of derailed that discussion, sorry.

btracey commented 8 years ago

I should have been more clear. I don't think we should half-implement a CSV package here. I think we should instead have a dedicated CSV package that can do what this PR does (unmarshal and marshel from 'canonical' CSV), but that can also help with interop, i.e. read fortran outputs, and can allow for headings on columns.

Does that sound reasonable to you @jonlawlor @atedja ?

sbinet commented 8 years ago

+1 for a gonum oriented CSV toolbelt package. FYI, I have a few tools of my own:

jonlawlor commented 8 years ago

Yes I think that would be a good idea.

sbinet commented 8 years ago

(I have also been toying this evening with the numpy i/o file format: https://github.com/sbinet/npyio)

atedja commented 8 years ago

I will have to look deeper in the way how Matrix has been developed to make better suggestions since I am pretty new to this codebase. So apologize if this deviates from the direction already set.

@btracey @jonlawlor Would it make more sense to use Mutable instead? So it will only use At(), Set(), and Dims() as part of the marshalling process, thus possibly opening this to Vector and others not just Dense.

It seems to make sense to have a marshal layer if we want more interop supports for reading different csv delimiters coming from different sources. What do you think if we make this layer to be the layer that implements the golang encoding interfaces, so it's extensible to support other formats such as CSV, XML, JSON.

For example (just throwing out ideas here):

mm := &MatrixMarshaler{ Mutable: myMatrix }
mm.SetCSVDelimiter(",")
csv, err := mm.MarshalCSV()
json, err := mm.MarshalJSON()
binary, err := mm.MarshalBinary()

mm.Mutable = myMatrix
mm.UnmarshalJSON(jsonData)  // stores to myMatrix

edit: actually nvm, just found out Vector does not implement At(). Would be nice if it does though.