nignatiadis closed this 10 years ago
What's the use case? Why not just add 3 new datasets?
It would simply feel more natural, since the 3 datasets capture exactly the same information (which is also why the authors combine them in a list). The same measurements were made on 3 animals of the same species using the same methods. Here is an example of such a list with 4 datasets:
> data(porpoise)
> sapply(porpoise, function(x) attr(x,"id"))
[1] "GUS" "David" "Mitchell" "Eric"
So it would feel more natural to append an :id column and vcat the data frames than to create 4 separate data frames (porpoise_GUS, porpoise_David,...). The use case is that you would almost always want to analyze these datasets together. Loading them together as a vector/list of DataFrames, as the R package does, is not possible with the current RDatasets API.
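For comparison, here is a minimal R sketch of the merge being proposed. It uses a hypothetical helper, make_track, and toy data frames instead of the real porpoise tracks (actual ltraj elements have many more columns). Each element's "id" attribute is copied into an id column, and the results are stacked with rbind, R's rough counterpart of vcat.

```r
# Hypothetical toy stand-ins for the per-animal data frames in a list
# such as porpoise; each one stores its animal's name in an "id" attribute.
make_track <- function(id, n) {
  df <- data.frame(x = seq_len(n), y = seq_len(n) * 2)
  attr(df, "id") <- id
  df
}
tracks <- list(make_track("GUS", 3), make_track("David", 2))

# Turn the "id" attribute into an ordinary column, then stack the rows.
with_id <- lapply(tracks, function(df) {
  df$id <- attr(df, "id")
  df
})
merged <- do.call(rbind, with_id)
print(merged)
```

The merged table has one row per observation, and the id column preserves the grouping that the list structure previously encoded.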
Then again, decreasing the consistency across the RDatasets package for such a limited use case would also be very bad, which is why I am asking.
Yeah, I'm not sure this is the way to go. What you're suggesting would make RDatasets type-unstable, which is problematic in many different ways. The only reasonable solution would be to always return a Dict, which would be overkill for every other dataset.
In general, it's really hard to look to R for inspiration for Julia, since R frequently uses functions that aren't type-stable. R functions like data() also violate scoping rules, which Julia will not allow.
Well, appending the :id column and merging everything into one data frame would not make it type-unstable, though it would certainly be less consistent.
And yeah, I guess this is one of the really great things about Julia. Thanks for the reply!
I'd be happy to share these datasets as a merged whole with an additional ID column if you think enough people will want that.
Well, I'm not really sure how many people would want that. But... there's at least 1 person :p who would. And the relevant publication (listed on CRAN) has apparently been cited 481 times.
I'll try to send the pull request sometime tonight.
Thanks again :).
Closed by #23
Hi! I'd like to ask whether there are general guidelines regarding which datasets can be added to this repository. In particular:
1) A package implements its own class, whose objects essentially consist of some metadata plus a data frame. For the example datasets it ships, I only want to add the data frames (not the metadata) to RDatasets.jl.
2) data("dataname") loads a list of 3 similar data frames. Instead of keeping the list, I would stack those 3 data frames vertically and add an extra column to distinguish them.
Would such datasets be welcome, or should I refrain from adding them in such a form? And if I add them, how should the "modifications" be annotated?
(The package in question is adehabitatLT.)