dysonance / Temporal.jl

Time series implementation for the Julia language focused on efficiency and flexibility
Other
100 stars 26 forks source link

Why is the `namefix` needed when constructing a TS? #13

Closed dm3 closed 6 years ago

dm3 commented 6 years ago

First of all - thanks for the great package!

I'm just starting with Julia so might be missing some obvious things. One thing that caught me off guard with TS is that the column names are sanitized into alphanumeric-only names. Why is this necessary?

I'd like to have my names with underscores and I haven't seen problems with those in other parts of the Julia ecosystem - can underscores be whitelisted?

dysonance commented 6 years ago

@dm3 Thanks for the kind words and welcome to the world of Julia!

The reason I implemented things this way has to do with the fact that the element type of the column names vector is Symbol, which can be awkward to deal with when using non-alphanumeric characters.

What I did not realize until just now (and thank you for pointing this out) is that Symbols can handle underscores no problem. So this is actually a great feature request that I could probably get implemented pretty easily.

Thanks for the good suggestion, will let you know once I've merged in this feature.

dysonance commented 6 years ago

@dm3 Issue addressed in pull request here: https://github.com/dysonance/Temporal.jl/pull/16

dysonance commented 6 years ago

@dm3 This pull request has been merged into master and a new version is awaiting acceptance into Julia's METADATA repo (see https://github.com/JuliaLang/METADATA.jl/pull/12317).

Once accepted/merged in, this functionality should be available and running Pkg.update() should have you all set to start using underscores in column names, so I'm gonna go ahead and close this issue, just let me know if any issues and can reopen.

Thanks again for the good suggestion. Cheers.

dm3 commented 6 years ago

Great! Thanks for the quick turnaround!

Isn't there a function in the Julia base package already which could be used to validate whether a string is a valid symbol?

dysonance commented 6 years ago

I'm not familiar with one if it exists. The problem isn't so much with that it wouldn't be a valid symbol, more that it would be awkward to print and handle for indexing and so forth from a user perspective. For example, consider the following:

image

It's not so much that it can't handle the Symbol implementation, it's just that it prints awkwardly and that if you wanted to index that column you would have to do X[Symbol("hi.mom")] which is less user-friendly than X[:hi_mom]. So it's more of a design thing than an ability thing.

dm3 commented 6 years ago

Interesting. I'd lean towards preserving user input unless requested otherwise in the API as much as possible. The major assumption behind the API design being "user knows best". I'd make the Xts constructor pass the column names unmodified and provide the sanitize_names function to the user in case they want to use "nice" naming.

However, I'm just getting started with the data/scientific community and one of the things I've noticed is convenience trumps many other concerns :)

Thanks again for the quick update!

dysonance commented 6 years ago

@dm3 I would agree in most cases that it is better to allow the user more control to handle stuff like this on their own. The only reason I'm not quite convinced it's best to do it in this case is that for something like column/field names of a dataset, I think there's a case to be made for enforcing better practices programmatically (in this case meaning making it harder to have characters like . in field names for a data structure).

That being said, if the default column name sanitation isn't to your liking, you can indeed do something like the following:

x = TS(rand(10))
x.fields[1] = Symbol("hi.mom")
println(x)

And you should get what you're after. So, at the end of the day, non-sanitized names are indeed possible, just a little harder to get in there. I personally prefer it that way, but am totally open to discussion and hearing arguments for dropping the name fixing logic. One argument from a less subjective perspective I could see would be performance: if it isn't a much-desired feature, then there isn't much to be gained from doing that check at every TS object construction.

dm3 commented 6 years ago

The only reason I'm not quite convinced it's best to do it in this case is that for something like column/field names of a dataset, I think there's a case to be made for enforcing better practices programmatically (in this case meaning making it harder to have characters like . in field names for a data structure).

I think this is better handled by promoting and improving the official style guide of the Julia language itself, rather than forcing the user into a naming scheme. This comes back to "user knows best". I expect the library to process my input as supplied and fail if it's not possible.

Not sure what would be the right decision on evolving Temporal.jl, but that's my 2 cents :)

dysonance commented 6 years ago

@dm3 Gave your feedback some more thought, and decided you are probably right about not forcing a naming convention on the user. Removing the namefix logic for now, and resubmitting to make the change public. Maybe at some point can add some argument to cleanse the names but for now just going accept user-given fields without modification.

So now you should be able to do something like the following.

julia> TS(randn(100), today()-Day(99):Day(1):today(), Symbol("Hi Mom"))
100x1 Temporal.TS{Float64,Date}: 2017-09-04 to 2017-12-12
Index        Hi Mom
2017-09-04  -0.1706
2017-09-05  -0.5755
2017-09-06   0.2072
2017-09-07  -1.328
2017-09-08   0.6653
2017-09-09   0.1588
2017-09-10   2.5111
2017-09-11  -2.5226
2017-09-12  -0.4761
⋮
2017-12-03  -0.3071
2017-12-04   0.4927
2017-12-05   0.7182
2017-12-06   0.4669
2017-12-07   1.4851
2017-12-08   0.4627
2017-12-09   1.3796
2017-12-10   0.9952
2017-12-11  -1.9579
2017-12-12  -0.1144

Thanks again for the feedback. Cheers.

dm3 commented 6 years ago

Great! Thank you for the change!