AlexMili / torch-dataframe

Utility class to manipulate dataset from CSV file
MIT License

Adding metatable functions #18

Closed gforge closed 8 years ago

gforge commented 8 years ago

I think it would be nice to add some torch class metatable functionality in addition to the __tostring__. The available methods according to the docs are:

> for k, v in pairs(torch.getmetatable('torch.CharStorage')) do print(k, v) end

__index__       function: 0x1a4ba80
__typename      torch.CharStorage
write           function: 0x1a49cc0
__tostring__    function: 0x1a586e0
__newindex__    function: 0x1a4ba40
string          function: 0x1a4d860
__version       1
read            function: 0x1a4d840
copy            function: 0x1a49c80
__len__         function: 0x1a37440
fill            function: 0x1a375c0
resize          function: 0x1a37580
__index         table: 0x1a4a080
size            function: 0x1a4ba20

__index__

Single number

Calling __index__ with a number should naturally return a single row. The returned object should be a dataframe: either we change the current get_row to use _create_subset, or we call _create_subset directly.

A table or tensor with integers

This should probably work just like _create_subset.

A string value

I'm not sure if it is a good idea, but a string value could map to get_column, since column names are always strings.
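To make the three cases concrete, here is a minimal sketch in plain Lua, without Torch. Frame, Frame.new and create_subset are hypothetical stand-ins for the Dataframe class and _create_subset, not the library's actual code, and the real torch class mechanism routes __index__ differently from a raw metatable.

```lua
-- Hypothetical stand-in for Dataframe; not the torch-dataframe implementation.
local Frame = {}

-- Build a frame from a table of columns: { name = {v1, v2, ...}, ... }
function Frame.new(columns)
  local n_rows = 0
  for _, values in pairs(columns) do
    n_rows = #values
    break
  end
  return setmetatable({ dataset = columns, n_rows = n_rows }, Frame)
end

-- Stand-in for _create_subset: returns a new frame holding the requested rows.
function Frame.create_subset(self, indexes)
  local subset = {}
  for name, values in pairs(self.dataset) do
    subset[name] = {}
    for i, idx in ipairs(indexes) do
      assert(idx >= 1 and idx <= self.n_rows, "row index out of range")
      subset[name][i] = values[idx]
    end
  end
  return Frame.new(subset)
end

-- Stand-in for get_column: returns the raw column values.
function Frame.get_column(self, name)
  return self.dataset[name]
end

-- Dispatch on the type of the key, as proposed above.
Frame.__index = function(self, key)
  if type(key) == "number" then
    return Frame.create_subset(self, { key })   -- single row, still a frame
  elseif type(key) == "table" then
    return Frame.create_subset(self, key)       -- several rows
  elseif type(key) == "string" then
    -- Methods take precedence over column names in this sketch.
    local method = rawget(Frame, key)
    if method ~= nil then return method end
    return Frame.get_column(self, key)
  end
end

local df = Frame.new({ a = { 1, 2, 3 }, b = { "x", "y", "z" } })
local row = df[2]            -- frame with one row
local rows = df[{ 1, 3 }]    -- frame with two rows
local col = df["b"]          -- raw column table
```

One consequence of the string case is visible here: a column named like a method (for example get_column) cannot be reached through indexing, which is one reason to be cautious about it.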

__newindex__

Single number

A simple call to _update_single_row after asserting that the index exists.

If the index is self.n_rows + 1, then insert should be invoked instead.

String

If the column exists, a nil argument should drop the column; any other argument should throw an error.

If the string does not match an existing column, it should call add_column.
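The __newindex__ rules above could look roughly like this in plain Lua. Frame and write_row are hypothetical stand-ins; write_row plays the role of both _update_single_row and insert, and the string branch mimics dropping a column and add_column without calling the real methods.

```lua
-- Hypothetical stand-in for Dataframe; not the torch-dataframe implementation.
local Frame = {}
Frame.__index = Frame

function Frame.new(columns, n_rows)
  -- Fields are set before the metatable, so __newindex does not fire here.
  return setmetatable({ dataset = columns, n_rows = n_rows }, Frame)
end

-- Write one row given as { column_name = value, ... }.
local function write_row(self, index, row)
  for name, values in pairs(self.dataset) do
    values[index] = row[name]
  end
end

Frame.__newindex = function(self, key, value)
  if type(key) == "number" then
    if key == self.n_rows + 1 then
      write_row(self, key, value)        -- plays the role of insert
      self.n_rows = self.n_rows + 1
    else
      assert(key >= 1 and key <= self.n_rows, "row index out of range")
      write_row(self, key, value)        -- plays the role of _update_single_row
    end
  elseif type(key) == "string" then
    if self.dataset[key] ~= nil then
      -- Existing column: only nil (drop) is allowed.
      assert(value == nil, "column already exists: " .. key)
      self.dataset[key] = nil
    else
      self.dataset[key] = value          -- plays the role of add_column
    end
  end
end

local df = Frame.new({ a = { 1, 2 }, b = { "x", "y" } }, 2)
df[1] = { a = 10, b = "w" }     -- update an existing row
df[3] = { a = 3, b = "z" }      -- append, since 3 == n_rows + 1
df.c = { true, false, true }    -- add a new column
df.a = nil                      -- drop an existing column
```

Note that a raw Lua __newindex only fires for keys that are absent from the object itself, which is why the data lives in the dataset field here rather than directly on the frame.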

__len__

Returns self.n_rows.

size

Should return shape()["rows"] when called with size(1) and shape()["cols"] when called with size(2). An empty string returns an unnamed table {no_rows, no_cols}; possibly it should return a tensor instead, to stick with the Torch spirit.
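A rough sketch of the size behaviour, assuming shape() returns a table with rows and cols keys as described. The frame stub is hypothetical, and this sketch treats a missing argument (nil) as the "return both" case.

```lua
-- Hypothetical stub: only shape() matters here.
local frame = {
  shape = function(self) return { rows = 3, cols = 2 } end,
}

local function size(df, dim)
  local shape = df:shape()
  if dim == nil then
    -- Unnamed table {no_rows, no_cols}; a tensor would replace this branch.
    return { shape.rows, shape.cols }
  elseif dim == 1 then
    return shape.rows
  elseif dim == 2 then
    return shape.cols
  end
  error("dim must be 1, 2 or nil, got " .. tostring(dim))
end

local both = size(frame)
local n_rows = size(frame, 1)
local n_cols = size(frame, 2)
```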

copy

A call to clone and _copy_meta should do it.

AlexMili commented 8 years ago

Cool idea!

Though I am not sure about returning a Dataframe for a single row or for a string argument. Generally, when I use the __index equivalent in other languages, it is to get the raw data directly.

As for size, returning a tensor seems the best option; there is still the shape function anyway.

AlexMili commented 8 years ago

An __eq metafunction could be useful for testing purposes.
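For illustration, an __eq along these lines might compare row counts, column names and values. This is a plain Lua sketch with a hypothetical Frame stand-in, not the Dataframe class, and it assumes both operands share the same metatable.

```lua
-- Hypothetical stand-in for Dataframe; not the torch-dataframe implementation.
local Frame = {}
Frame.__index = Frame

function Frame.new(columns, n_rows)
  return setmetatable({ dataset = columns, n_rows = n_rows }, Frame)
end

Frame.__eq = function(a, b)
  if a.n_rows ~= b.n_rows then return false end
  -- Every column of a must exist in b with the same values.
  for name, values in pairs(a.dataset) do
    local other = b.dataset[name]
    if other == nil then return false end
    for i = 1, a.n_rows do
      -- NaN never equals NaN in Lua, so frames holding NaN compare unequal here.
      if values[i] ~= other[i] then return false end
    end
  end
  -- b must not have extra columns.
  for name in pairs(b.dataset) do
    if a.dataset[name] == nil then return false end
  end
  return true
end

local x = Frame.new({ a = { 1, 2 } }, 2)
local y = Frame.new({ a = { 1, 2 } }, 2)
local z = Frame.new({ a = { 1, 3 } }, 2)
local same = (x == y)
local different = (x ~= z)
```

In a test this would allow something like assert(result == expected) instead of comparing columns one by one.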