Open drwebb opened 9 years ago
I've added a demonstration of how to do what you want, if I understand correctly.
If you can figure out a way to package it to address what you think is the most common use case, we can add a helper to do just that. We can spitball ideas here, or you can open a PR if you think you've got a good handle on it yourself.
Very good! I've managed to absorb this into my code, which I'm using to help analyze the results of the commercial Haskell survey. Working with strongly typed columns makes the experience much better than it could be otherwise.
Is the `First` functor necessary here to `mappend` the columns together? I'm not sure if it was done for illustrative purposes, or if you could sidestep it. In my case I had to `rmap First` over the `Rec Maybe cs` to get it into the form that works with your code. Also, while looking up the docs I noticed that `First` isn't an instance of `Applicative` or `Functor` in GHC 7.8, which isn't so nice.
In terms of implementation, it probably makes sense to add another function like `readTableWithDefault`, which would do the transformation you put forth in your demonstration and have a type like `Producer Record IO ()`.
In my code I have a lot of `Default` instances for the column types, e.g.:
type Occupation = "Occupation" :-> Int -- This is done by the Template Haskell
instance Default Occupation where def = Col def -- I have to write this manually currently
It would be nice to also generate the `Default` instances for the column types that are created by the Template Haskell, which should be pretty straightforward if the underlying type has an instance. Frames' support for user-defined types makes me think it should all be optional.
I'd be happy to open a pull request, but would like your input on these implementation details.
I like the idea of having a `readTableWithDefault` function. If we provide this, then the `First` issues become a library concern, so however we do the monoidal combination, it's our business and won't hurt anyone else. I used `First` for clarity, but it wasn't a significant choice.
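For readers following along, here is a minimal sketch of the `First` idea on plain `Maybe` values rather than on a Vinyl record. The helper name `fillWith` is hypothetical and is not part of Frames; it only illustrates that `First` keeps the leftmost `Just`, so a parsed value wins over the fallback when it is present.

```haskell
import Data.Maybe (fromMaybe)
import Data.Monoid (First (..), (<>))

-- | Hypothetical helper (not a Frames function): combine a parsed value
-- with a fallback. 'First' keeps the leftmost 'Just', so the parsed
-- value is used whenever it exists.
fillWith :: a -> Maybe a -> a
fillWith fallback parsed =
  -- The combined value is always a 'Just'; 'fromMaybe' only keeps the
  -- function total.
  fromMaybe fallback (getFirst (First parsed <> First (Just fallback)))

main :: IO ()
main = do
  print (fillWith 0 (Just 42 :: Maybe Int)) -- prints 42
  print (fillWith 0 (Nothing :: Maybe Int)) -- prints 0
```

For a single value `fromMaybe` alone would do; the monoidal form is what lets the same combination be applied field by field across a whole record.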
I'm slightly conflicted about providing the `Default` instances, but I can't say my feelings on it are informed by any experience. When looking at this issue, it struck me that it might be rather useful to be able to provide different `Default` instances for different columns that may have the same actual data type. For instance, one column might default an `Int` to `0`, while another defaults an `Int` to `1`.
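A rough sketch of what per-column defaults could look like, assuming a local `Default` class and a stand-in for the named column type (both are defined here for illustration and are not the library's definitions; the column names "Count" and "Scale" are made up). Because the column name is part of the type, two `Int` columns can carry different defaults.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import GHC.TypeLits (Symbol)

-- Stand-in for a named column type; an assumption for this sketch only.
newtype (:->) (s :: Symbol) a = Col a
  deriving (Eq, Show)

-- Local, hypothetical class; the data-default package offers a similar one.
class Default a where
  def :: a

-- Two columns with the same underlying type but different defaults.
instance Default ("Count" :-> Int) where
  def = Col 0

instance Default ("Scale" :-> Int) where
  def = Col 1

main :: IO ()
main = do
  print (def :: "Count" :-> Int)
  print (def :: "Scale" :-> Int)
```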
On the other hand, some data sets produce a lot of column declarations, so automating things would be nice. Is there some way we could get the best of both worlds? We should control `Default` instance generation with an option, but it would be great if we could selectively avoid generation. One way of doing this would be to have an option that controls `Default` instance generation by taking a list of column names to not generate instances for.
How does that sound to you?
That sounds very reasonable to me. It would be nice to have the ability to override the default for a column in cases where it makes sense. In my case it's certainly helpful the way the file happens to be encoded.
I was talking with my colleague about the whole subject of strongly typed data exploration, like this library offers. Real-world data is going to be filled with lots of wildcard values which you want to take care of, and this example shows how the type system really forces you to pay attention through the use of `Maybe` values. I feel a good direction for this library is to make it easy to get lots of different types of data into a proper `Record` type, so you have your data in a highly composable form while keeping the strong guarantees of the type system.
Do you think you can take on the generation of those `Default` instances? It should slot into the `CSV` module with the other TH.
Btw, you might also want to take a look at #27 for a related issue and its resolution. It doesn't impact this issue directly, but it's another facet of dealing with missing data.
I'll take a stab, low priority at the moment but something I can get to in the next couple days.
Upping this to a higher priority, will plan to send you a PR in the next couple days.
So I tried last week to make the Template Haskell changes but failed valiantly at my attempt to learn Template Haskell in the process. This is going to require some greater studying on my part, but I was just looking to insert the `instance Default
Okay, I think it will slot into `mkColPDec`, but I don't know how to do it off the top of my head. I'll see if I can get to it at some point, but I'm not sure it will be this week.
What if users could only specify monoidal types for record elements? Then the default is `mempty`. The `Sum` and `Product` newtypes can be used as a simple way to specify defaults for `Int`s, for example.
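A small sketch of that suggestion, using only `base`. The helper name `orEmpty` is hypothetical; it replaces a missing value with the monoid's identity, which is 0 for `Sum` and 1 for `Product`.

```haskell
import Data.Maybe (fromMaybe)
import Data.Monoid (Product (..), Sum (..))

-- | Hypothetical helper: a missing value becomes the monoid's identity.
orEmpty :: Monoid a => Maybe a -> a
orEmpty = fromMaybe mempty

main :: IO ()
main = do
  print (getSum (orEmpty (Nothing :: Maybe (Sum Int))))         -- prints 0
  print (getProduct (orEmpty (Nothing :: Maybe (Product Int)))) -- prints 1
  print (getSum (orEmpty (Just (Sum 7) :: Maybe (Sum Int))))    -- prints 7
```

The trade-off is that the default is tied to the wrapper type, so a column that needs a default other than the identity would need its own newtype and `Monoid` instance.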
Great library, and serving as my introduction to Vinyl. I want to parse some CSV files with missing values, so I'm dealing with Maybe types which I want to convert into a fully populated type by using some default values kind of like so.
As I hope is clear, I want to set up a pipeline sort of like
Thank you for your work on this library, it's been very cool to experiment with. This does seem like a really common case though, and hopefully you can add some functionality to cover this.