Closed gforge closed 8 years ago
Hey Max !
What an amazing job you did ! I am glad that my little piece of code helped you :) Your TestSuite class is wonderful, that is exactly what was missing to this class !
Your proposal are all welcomes and appreciated ;)
Concerning the unique
function counting frequencies, I am not decided if it's better to separate the frequencies count in another function or keep it that way.
About your commits I have some comments :
add_column
function, you could simplify the table test to one condition_to_html
function doesn’t worth it ^^. You have a typo in Dataframe:_update_single_row()
test, you should have df_tests._update_single_row()
. There was the same typo in _update_single_row
that in _extract_row
, I will fix it right away so the test will work. You also have a typo in the assert. You should have a:_extract_row(1),new
instead of a:_extract_row(1,new)
.The update
function browse the entire dataset applying condition function (first argument of update
) to each line. When a line meet the requirements you set in your custom condition function, the line is passed to the second function in update
arguments (update_function arg) and then use _updated_single_row
to update the specific line and so on.
Many thanks again for your contribution :)
Hi Alex!
Great to hear that you like my additions. I was afraid that you'd dropped the project and it will be great to have some input on my additions.
I agree about the unique function, I suggest adding a value_conts - this is intuitive and sticks with the original intention of keeping Pandas design principles. Compared to R's table()
value_counts makes much more sense.
I'm not sure exactly what you mean by the add_column
. As you can see the TestSuite has a learning curve, I found the assertTable a little late (first appears on row 342). Feel free to add/modify whatever you feel like.
I'll add a test for the update()
. One thing that I think should be fixed is the where
, it currently fetches the first match - the Pandas where returns all matching entries. One way would be to have a private _where
that returns the index numbers that the update()
also uses.
I will start issues for the suggested extensions so that we can have a closed discussion regarding each extension's implementation.
In fact, I don't use the lib everyday so I try to improve it when I have time but I am still in the run !
Anyway, thanks a lot for the new test ! :)
About add_column()
I was referring to the file Dataframe.lua
and the following code :
if (default_value ~= 0) then
if (type(default_value) == 'table') then
assert(#default_value == self.n_rows,
'The default values don\'t match the number of rows')
end
default_value = default_value
else
default_value = 0
end
Which could be simplified to :
if (type(default_value) == 'table') then
assert(#default_value == self.n_rows,
'The default values don\'t match the number of rows')
end
I totally agree with your suggestions. I am going to close this PR and open an issue for each modification/bug you mentioned to keep it clean. I let you create the issues about your three proposals to give more details about your vision.
Added the suggested simplification. Will you merge the PR or do you want something more?
It's perfect !
Hi Alex,
I love what you've done with the torch-dataframe. This was exactly the foundation that I was looking for. Since I want to add more functionality to it I've started with adding a TestSuite in order to make sure I don't introduce any changes. In the making I found a few bugs, most important the
to_tensor
didn't work with theipairs
. I've also changed started adding somdok.unpack
in order to make the parameter handling smoother. I'm uncertain of how theupdate
works so I haven't added any check there and I slightly changed theunique
- it now counts the frequencies of variables.My plan is to extend your package with:
Note that the current rockspec points to my repository and you need to change this if you want to use the luarocks installer.
Thanks! I hope you find the additions valuable. Max