acowley / Frames

Data frames for tabular data.

Add LocalTime to CommonColumns #56

Open codygman opened 8 years ago

codygman commented 8 years ago

I put significant thought into how to make this general enough to be useful for most people, but I can't post it in full right now. Have you thought through adding support for fuzzy parsing of date-like text? And things like time zone selection?

acowley commented 8 years ago

The type guessing system has a bit of fuzziness in it -- the Parsed type -- but what I find slightly awkward about it is the need for an ordering to ultimately resolve any ambiguity. I think you could slot time zone and date parsing into a column universe, and I'd be happy to include them in the default column universe.

For reference, we try to parse each entry at each column type. We intersect possible types over a sampling of rows, then for each column we pick the most specific type that can represent each row we looked at, where "most specific" is indicated by the order in which the column types are given (https://github.com/acowley/Frames/blob/master/src/Frames/ColumnUniverse.hs#L149).
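
As a rough illustration of the inference procedure described above, here is a minimal self-contained sketch. It does not use Frames' actual types: `ColType`, `candidates`, `parsesAs`, `viable`, and `inferColumn` are hypothetical names, and the four candidate types are assumed purely for the example.

```haskell
import Data.Maybe (fromMaybe, isJust, listToMaybe)
import Text.Read (readMaybe)

-- Hypothetical stand-in for a column universe; not Frames' real types.
data ColType = TBool | TInt | TDouble | TText
  deriving (Show, Eq)

-- Candidates, most specific first. The list order is the tie-breaker.
candidates :: [ColType]
candidates = [TBool, TInt, TDouble, TText]

-- Can a single cell be parsed at the given type?
parsesAs :: ColType -> String -> Bool
parsesAs TBool s   = s `elem` ["True", "False", "true", "false"]
parsesAs TInt s    = isJust (readMaybe s :: Maybe Int)
parsesAs TDouble s = isJust (readMaybe s :: Maybe Double)
parsesAs TText _   = True

-- Types that can represent every sampled cell (the intersection over rows),
-- kept in candidate order.
viable :: [String] -> [ColType]
viable cells = filter (\t -> all (parsesAs t) cells) candidates

-- The first viable candidate wins; fall back to TText if none survive.
inferColumn :: [String] -> ColType
inferColumn = fromMaybe TText . listToMaybe . viable
```

With this ordering, a column sampled as `["1", "2.5"]` can be represented as `TDouble` or `TText`, and `TDouble` is chosen because it comes first in `candidates`.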

I'd be open to adding various structured data people think is common to the default set. The tutorial shows how to add your own into the mix, so we're not setting anything in stone.

codygman commented 8 years ago

I noticed the tutorial uses a Readable instance and not a Parseable instance. Readable happens before Parseable because Parseable uses or.

That zip code example never uses Parseable, right?

acowley commented 8 years ago

Right. It doesn't use Parseable, but it manages to distinguish zip codes from columns with type Int (e.g., age). For dates, one does the same thing by prepending the tighter types to the list of column types. Maybe the Possibly constructor isn't super useful.
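
A small sketch of the "prepend the tighter type" idea, written without the Frames API. Everything here is hypothetical: `Candidate`, `isZip`, `defaults`, `withZip`, and `infer` are illustrative names, and the rule that a zip code is exactly five digits is an assumption made for the example.

```haskell
import Data.Char (isDigit)
import Data.Maybe (isJust)
import Text.Read (readMaybe)

-- A named recogniser (hypothetical; not part of Frames).
type Candidate = (String, String -> Bool)

isInt :: String -> Bool
isInt s = isJust (readMaybe s :: Maybe Int)

-- Assumed rule for illustration: a zip code is exactly five digits.
isZip :: String -> Bool
isZip s = length s == 5 && all isDigit s

-- A default ordering with Int ahead of the catch-all Text.
defaults :: [Candidate]
defaults = [("Int", isInt), ("Text", const True)]

-- Prepending puts the tighter type ahead of Int in the ordering.
withZip :: [Candidate]
withZip = ("Zip", isZip) : defaults

-- Pick the first candidate that accepts every sampled cell.
infer :: [Candidate] -> [String] -> String
infer cands cells =
  case [name | (name, ok) <- cands, all ok cells] of
    (name : _) -> name
    []         -> "Text"
```

Under these assumptions, a column of five-digit values resolves to `"Zip"` with `withZip`, while an age column such as `["39", "50"]` fails the tighter check and still resolves to `"Int"`. A date column type would be slotted in the same way.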

codygman commented 8 years ago

The thoughts I had:

Sample the column; if any value has time zone information, take that. If a time zone is specified explicitly (which would require updating tableTypes', I think), that time zone is used. If no time zone is found or specified, local time is used.
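
One possible reading of that fallback chain, as a minimal sketch. The names `ZoneChoice` and `resolveZone` are hypothetical, the zone type is left abstract, and the precedence (sampled data first, then an explicit setting, then local time) is an assumption taken from the order in which the cases are listed above.

```haskell
import Data.Maybe (catMaybes, listToMaybe)

-- Where the time zone for a column came from (hypothetical type).
data ZoneChoice z
  = FromData z   -- a sampled value carried zone information
  | Specified z  -- the caller supplied a zone explicitly
  | UseLocal     -- nothing found or specified: treat values as local time
  deriving (Show, Eq)

-- 'sampled' holds the zone found in each sampled cell, if any;
-- 'specified' is the optional user-supplied zone.
resolveZone :: [Maybe z] -> Maybe z -> ZoneChoice z
resolveZone sampled specified =
  case (listToMaybe (catMaybes sampled), specified) of
    (Just z, _)        -> FromData z
    (Nothing, Just z)  -> Specified z
    (Nothing, Nothing) -> UseLocal
```

If an explicit setting should instead override what is found in the data, swapping the first two case alternatives is enough.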

As for how sophisticated the fuzzy date functionality needs to be... just "%F %C" and a few variations thereof would fit my use case. I also had a silly thought about generating some number of combinations of a few of the more popular formats and then just trying all of them to see if any match.
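
The "try a handful of formats" idea could look roughly like the sketch below, using `parseTimeM` from the `time` package. The format list, its order, and the names `formats` and `parseLocalTime` are assumptions for illustration. The sketch uses date-plus-time formats such as "%F %T" rather than the "%F %C" mentioned above.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)
import Data.Time (LocalTime, defaultTimeLocale, parseTimeM)

-- Illustrative formats only; the list and its order are assumptions.
formats :: [String]
formats =
  [ "%F %T"  -- e.g. 2016-09-01 10:46:00
  , "%FT%T"  -- e.g. 2016-09-01T10:46:00
  , "%F %R"  -- e.g. 2016-09-01 10:46
  ]

-- Try each format in order and keep the first successful parse.
parseLocalTime :: String -> Maybe LocalTime
parseLocalTime s = listToMaybe (mapMaybe tryFormat formats)
  where
    -- True: accept leading and trailing whitespace around the value.
    tryFormat fmt = parseTimeM True defaultTimeLocale fmt s
```

A parser like this could then back a column type placed ahead of the plain text type. The order of `formats` only matters when more than one format accepts the same input.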

For the time being though, I'd be quite happy just being able to parse one time format ;)