This is an umbrella issue to make dplyr-spark tables more data-frameish. Standard procedure should be to open an issue for each of the specific points and mention this one.
2 sampling
3 slicing
nrow. Returns NA instead of the actual count, motivation being that
summary. There is no summary method in dplyr, so summary actually treats a table as a list. Sad.
create from file. Like a read.table or some such. Maybe an extension to copy_to, based on LOAD INPATH
dropping of rownames in copy_to. dplyr boycotts rownames (I understand that) but I'd prefer creating a col rather than dropping the information altogether. The party line is: don't use rownames, use a col. Well, we should lead by example and copy rownames to a col
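To make the rownames point concrete, here is a rough sketch in base R of what "copy rownames to a col" could look like before the data is handed to copy_to. The helper name rownames_to_col and the default column name "rowname" are made up for illustration; neither is part of dplyr or dplyr-spark.

```r
# Hypothetical helper (not part of dplyr or dplyr-spark): keep row names
# by moving them into an ordinary column before the data frame is copied.
rownames_to_col <- function(df, col = "rowname") {
  # Refuse to overwrite an existing column of the same name.
  stopifnot(is.data.frame(df), !(col %in% names(df)))
  out <- cbind(
    setNames(data.frame(rownames(df), stringsAsFactors = FALSE), col),
    df
  )
  rownames(out) <- NULL  # reset to the default 1..n row names
  out
}

# Example with a built-in data set whose row names carry information.
cars_with_names <- rownames_to_col(head(mtcars, 3))
print(cars_with_names[, 1:3])
```

A copy_to that applied something like this by default would keep the information in a regular column, which is what the party line recommends anyway.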