juliohm closed this issue 6 years ago
This is probably too complex for a tutorial and might change in near future.
If you want to have a peek, here is a current thread related to a similar issue, https://github.com/JuliaData/DataFrames.jl/issues/1335, and the example implementation referenced there of a new subtype of AbstractDataFrame, which is TypedDataFrame (https://github.com/JuliaData/DataFrames.jl/compare/nl/typed). It has almost 1000 lines of code.
I can keep this issue open, as maybe one day we will have this interface stabilized enough to specify it (but feel free to close it if you are OK with moving the discussion to the thread I mention here).
Sounds like something which should be documented in the DataFrames manual. But indeed, better stabilize it before working on the docs.
@bkamins you mean 1000 lines to define the interface? o.O
I agree that the DataFrames.jl docs are the more appropriate place for defining the interface, but since I couldn't find it there, I thought this repo would get it done more quickly. I have needed to define dataframe-like objects twice in my packages, but couldn't get it done.
Let me write down here a tentative API a subtype of AbstractDataFrame is expected to implement (as of now - this will for sure change):
- getindex: Bool, Colon, Bool, Colon
- index: returning type Index
- copy, similar
- nrow, ncol
- convert to Matrix
- hcat!
- _vcat
- join methods

And there are functions that are not part of AbstractDataFrame API, but are defined for DataFrame:
- push!
- append!
- categorical!
- allowmissing!
- deleterows!
- delete!
- merge!
- insert!
- empty!
- setindex!
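To make a few of the names listed above concrete, here is a minimal, self-contained sketch of what nrow, ncol, copy, and two getindex methods could look like for a column-oriented type. Everything here is an assumption for illustration: ColumnStore is a hypothetical type, it does not subtype AbstractDataFrame, DataFrames.jl is not loaded, and reading "Bool, Colon" as a Bool row mask combined with a colon is my interpretation of the list, not a confirmed part of the interface.

```julia
# Hypothetical column store. It does NOT subtype AbstractDataFrame and does not
# use DataFrames.jl; it only mirrors a few method names from the tentative list.
struct ColumnStore
    names::Vector{Symbol}
    columns::Vector{AbstractVector}
end

# Number of rows and columns (names borrowed from the list; defined locally here).
nrow(cs::ColumnStore) = isempty(cs.columns) ? 0 : length(first(cs.columns))
ncol(cs::ColumnStore) = length(cs.columns)

# Look up a whole column by name: cs[:, :a]
function Base.getindex(cs::ColumnStore, ::Colon, name::Symbol)
    i = findfirst(==(name), cs.names)
    i === nothing && throw(ArgumentError("no column named $name"))
    return cs.columns[i]
end

# Select rows with a Bool mask, keeping all columns: cs[mask, :]
function Base.getindex(cs::ColumnStore, mask::AbstractVector{Bool}, ::Colon)
    length(mask) == nrow(cs) || throw(DimensionMismatch("mask length must equal nrow"))
    return ColumnStore(copy(cs.names), AbstractVector[col[mask] for col in cs.columns])
end

# Copy the names and each column vector, so mutating the copy leaves the original intact.
Base.copy(cs::ColumnStore) =
    ColumnStore(copy(cs.names), AbstractVector[copy(col) for col in cs.columns])

# Usage
cs = ColumnStore([:a, :b], AbstractVector[[1, 2, 3], ["x", "y", "z"]])
sub = cs[[true, false, true], :]
```

A real subtype of AbstractDataFrame would need considerably more than this, as the TypedDataFrame branch mentioned earlier suggests.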
Is this API documented somewhere already? How does it relate to the Tables.jl API?
It has not been written down unfortunately.
Tables.jl is a more general and simpler API that is satisfied by DataFrames.jl in particular. You can check in the /other/tables.jl file what methods need to be defined (some methods are already there for AbstractDataFrame, some are specific to DataFrame and would have to be extended).
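For comparison, here is a rough sketch of what satisfying the Tables.jl column-access interface can look like for a custom type. It assumes Tables.jl 1.x is installed, which may differ from the version current when this thread was written, and SimpleTable is a hypothetical type introduced only for this example; it is not how DataFrames.jl itself implements the interface.

```julia
using Tables  # assumption: Tables.jl 1.x

# Hypothetical table type holding column names and column vectors.
struct SimpleTable
    names::Vector{Symbol}
    columns::Vector{AbstractVector}
end

# Declare that SimpleTable is a table and exposes its data by column.
Tables.istable(::Type{SimpleTable}) = true
Tables.columnaccess(::Type{SimpleTable}) = true

# The object serves as its own columns object, so it must support
# columnnames and getcolumn (by index and by name).
Tables.columns(t::SimpleTable) = t
Tables.columnnames(t::SimpleTable) = t.names
Tables.getcolumn(t::SimpleTable, i::Int) = t.columns[i]
function Tables.getcolumn(t::SimpleTable, nm::Symbol)
    i = findfirst(==(nm), t.names)
    i === nothing && throw(ArgumentError("no column named $nm"))
    return t.columns[i]
end

# Usage: any Tables.jl consumer can now read the data, e.g. as a NamedTuple of vectors.
t = SimpleTable([:a, :b], AbstractVector[[1, 2, 3], ["x", "y", "z"]])
ct = Tables.columntable(t)
```

The contrast with the list above is the point: a handful of methods is enough for Tables.jl consumers, whereas the AbstractDataFrame surface is much larger.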
It would be great to learn more about the minimum interface expected to be implemented by subtypes of AbstractDataFrame in one tutorial notebook. Do you think it makes sense to have it here?