Gmousse / dataframe-js

No Maintenance Intended
https://gmousse.gitbooks.io/dataframe-js/
MIT License
460 stars 38 forks

[FEATURE] Implicit column for dataframe API #86

Open mbkupfer opened 5 years ago

mbkupfer commented 5 years ago

Is your feature request related to a problem? Please describe.
I find it redundant to keep repeating the column reference in API functions that act on a single-column dataframe.

Describe the solution you'd like
Implicitly determine the column for functions applied to dataframes with only one column. This could easily be done by checking the length of [...this[__columns__]]. It would be particularly useful for df.toArray(), since calling it on a single-column dataframe currently returns an array of single-element arrays, which I can't see any use case for. I believe many other functions could benefit from this feature as well, and it would cut down on a lot of repetition.

Describe alternatives you've considered
Perhaps a new Series object, similar to the one in the pandas library.

Additional context
N/A
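
For illustration, the single-column check described in this request could be sketched roughly as follows. This is not dataframe-js code: `TinyFrame` is a stand-in written only so the snippet runs on its own, and `resolveColumn` and `toFlatArray` are hypothetical helper names. The sketch assumes only that a dataframe exposes `listColumns()` and `toArray(columnName)`, which dataframe-js documents.

```javascript
// Stand-in for a dataframe-js DataFrame so the sketch runs without the library.
// It mimics only listColumns() and toArray(columnName).
class TinyFrame {
  constructor(rows, columns) {
    this.rows = rows;       // array of plain objects
    this.columns = columns; // ordered column names
  }

  listColumns() {
    return [...this.columns];
  }

  toArray(columnName) {
    // No column name: array of arrays. With a column name: flat array.
    return columnName === undefined
      ? this.rows.map((row) => this.columns.map((c) => row[c]))
      : this.rows.map((row) => row[columnName]);
  }
}

// Hypothetical helper: use the explicit column if one is given, otherwise
// fall back to the only column of a single-column frame.
function resolveColumn(df, columnName) {
  if (columnName !== undefined) return columnName;
  const columns = df.listColumns();
  if (columns.length !== 1) {
    throw new Error(`Column name required: frame has ${columns.length} columns`);
  }
  return columns[0];
}

// Hypothetical helper: flat array of values, column inferred when unambiguous.
function toFlatArray(df, columnName) {
  return df.toArray(resolveColumn(df, columnName));
}

const single = new TinyFrame([{ id: 1 }, { id: 2 }, { id: 3 }], ["id"]);
console.log(single.toArray());    // [ [ 1 ], [ 2 ], [ 3 ] ]
console.log(toFlatArray(single)); // [ 1, 2, 3 ]
```

Throwing when the frame has more than one column keeps the implicit behavior from silently picking the wrong column; an explicit name always wins.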

Gmousse commented 5 years ago

Hi, that's a good point. I will see what can be done! Thank you for your suggestion.

mbkupfer commented 5 years ago

Thank you for the reply and interest in this issue, @Gmousse!

Since I last posted, I have come up with a more interesting use case: this feature would make introspecting variables a much more pleasant experience. Take the two examples below: one is pandas, the other is dataframe-js.

# pandas
df[column].unique()

// dataframe-js
df.select(column).distinct(column).show(column)

You see, many of us come from a pandas background, so the verbosity required when composing functions is confusing.
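
Until something like this exists in the library, a thin Series-like wrapper in user code could get closer to the pandas call. The sketch below is only an illustration: `MiniFrame` is a stand-in so the snippet runs on its own, and `column` is a hypothetical helper, not part of dataframe-js. It assumes the dataframe provides `distinct(columnName)` and `toArray(columnName)`.

```javascript
// Stand-in mimicking dataframe-js distinct(columnName) and toArray(columnName).
class MiniFrame {
  constructor(rows) {
    this.rows = rows; // array of plain objects
  }

  // New frame holding the distinct values of one column, in first-seen order.
  distinct(columnName) {
    const seen = new Set(this.rows.map((row) => row[columnName]));
    return new MiniFrame([...seen].map((value) => ({ [columnName]: value })));
  }

  // Flat array of one column's values (this stand-in requires a column name).
  toArray(columnName) {
    return this.rows.map((row) => row[columnName]);
  }
}

// Hypothetical Series-like view: the column is named once, then reused.
function column(df, name) {
  return {
    values: () => df.toArray(name),
    unique: () => df.distinct(name).toArray(name),
  };
}

const df = new MiniFrame([{ city: "Paris" }, { city: "Lyon" }, { city: "Paris" }]);
console.log(column(df, "city").unique()); // [ 'Paris', 'Lyon' ]
```

With this, `column(df, name).unique()` reads much like `df[column].unique()`, and the column name is typed once instead of at every step of the chain.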

Gmousse commented 5 years ago

Yep, that's clear. I'm working on a new version on an experimental branch. I will try a few things there.

mbkupfer commented 5 years ago

@Gmousse, have you made any progress on this? I keep running into situations where this redundancy comes up so it would be a really nice feature to have. If this is an issue about time, then let me know and I'd be happy to submit a PR 😃

Gmousse commented 5 years ago

Hi, sorry, I have been a bit busy lately. I still need to work on the API proposal for this feature.

Gmousse commented 5 years ago

Hi @mbkupfer, I'm currently working on it. I will post an API proposal in this issue.

tony-bony commented 4 years ago

Hi, it would also be nice if df.drop('column2') could accept an array of column names instead of a single column name. Can this be included in a new release?
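
In the meantime, dropping several columns can be approximated in user code by applying the single-column `drop` repeatedly. The following is a sketch under assumptions: `SmallFrame` is a stand-in so the snippet runs without the library, and `dropColumns` is a hypothetical helper name. It relies only on `drop(columnName)` returning a new dataframe.

```javascript
// Stand-in mimicking dataframe-js drop(columnName) and listColumns().
class SmallFrame {
  constructor(rows) {
    this.rows = rows; // array of plain objects
  }

  listColumns() {
    return this.rows.length ? Object.keys(this.rows[0]) : [];
  }

  // New frame without the given column; the original frame is left untouched.
  drop(columnName) {
    return new SmallFrame(
      this.rows.map(({ [columnName]: _dropped, ...rest }) => rest)
    );
  }
}

// Hypothetical helper: accepts one column name or an array of names and
// applies the single-column drop once per name.
function dropColumns(df, columnNames) {
  const names = Array.isArray(columnNames) ? columnNames : [columnNames];
  return names.reduce((frame, name) => frame.drop(name), df);
}

const frame = new SmallFrame([{ column1: 1, column2: 2, column3: 3 }]);
console.log(dropColumns(frame, ["column2", "column3"]).listColumns()); // [ 'column1' ]
```

Accepting either a string or an array keeps existing single-column calls working unchanged.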