bkamins / Julia-DataFrames-Tutorial

A tutorial on Julia DataFrames package
MIT License

Minimum interface to define a DataFrame-like type #3

Closed juliohm closed 6 years ago

juliohm commented 6 years ago

It would be great to learn more about the minimum interface expected to be implemented by subtypes of AbstractDataFrame in one tutorial notebook. Do you think it makes sense to have it here?

bkamins commented 6 years ago

This is probably too complex for a tutorial and might change in the near future. If you want a peek, here is a current thread on a similar issue, https://github.com/JuliaData/DataFrames.jl/issues/1335, and the example implementation referenced there of a new subtype of AbstractDataFrame, TypedDataFrame (https://github.com/JuliaData/DataFrames.jl/compare/nl/typed). It has almost 1000 lines of code.

I can keep this issue open, as maybe one day we will have this interface stabilized enough to specify it (but feel free to close it if you are OK with moving the discussion to the thread I mention here).

nalimilan commented 6 years ago

Sounds like something which should be documented in the DataFrames manual. But indeed, better stabilize it before working on the docs.

juliohm commented 6 years ago

@bkamins you mean 1000 lines to define the interface? o.O

I agree that the DataFrames.jl docs are a more appropriate place for defining the interface, but since I couldn't find it there, I thought this repo would get it done more quickly. I have needed to define dataframe-like objects twice in my packages, but couldn't get it done.

bkamins commented 6 years ago

Let me write down here a tentative API a subtype of AbstractDataFrame is expected to implement (as of now - this will for sure change):

And there are functions that are not part of AbstractDataFrame API, but are defined for DataFrame:

juliohm commented 5 years ago

Is this API documented somewhere already? How does it relate to the Tables.jl API?

bkamins commented 5 years ago

Unfortunately, it has not been written down. Tables.jl is a more general and simpler API that DataFrames.jl, in particular, satisfies. You can check in the /other/tables.jl file what methods need to be defined (some methods are already there for AbstractDataFrame; some are specific to DataFrame and would have to be extended).
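
To make the Tables.jl side of this concrete, here is a minimal sketch of a column-oriented type that opts into the Tables.jl interface. It assumes the Tables.jl package is installed and uses the interface functions from its current documentation (`Tables.istable`, `Tables.columnaccess`, `Tables.columns`, `Tables.columnnames`, `Tables.getcolumn`), which may differ from what existed when this thread was written. `MyTable` and its fields are hypothetical names used only for illustration. This is not the AbstractDataFrame API discussed above.

```julia
using Tables

# Hypothetical table type: column names plus one vector per column.
struct MyTable
    names::Vector{Symbol}
    columns::Vector{Vector{Float64}}
end

# Declare that MyTable is a table and that it exposes its data by column.
Tables.istable(::Type{MyTable}) = true
Tables.columnaccess(::Type{MyTable}) = true

# The object itself serves as the column container.
Tables.columns(t::MyTable) = t

# Column lookup by name list, by position, and by name.
Tables.columnnames(t::MyTable) = t.names
Tables.getcolumn(t::MyTable, i::Int) = t.columns[i]
Tables.getcolumn(t::MyTable, nm::Symbol) =
    t.columns[findfirst(==(nm), t.names)]

# Usage: build a two-column table and read a column back by name.
t = MyTable([:a, :b], [[1.0, 2.0], [3.0, 4.0]])
col_b = Tables.getcolumn(t, :b)
```

With these methods in place, packages that consume Tables.jl sources should be able to read `MyTable` column by column. A full AbstractDataFrame subtype would need considerably more than this.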