IntelPython / bearysta

Pandas-based statistics aggregation tool
Apache License 2.0

[Aggregator] add rename_col_values parameter #19

Closed amyskov closed 1 month ago

bibikar commented 3 years ago

This sort of thing is one reason that precompute columns were made in the first place. It should be possible to accomplish the same thing with precompute, unless it is a concern that

  1. the precompute lambda functions can get a bit "ugly" and hard to read, or
  2. it is not easy or straightforward to define the replacements outside the precompute lambda function.

If neither of these is a concern right now, you could probably work around this problem by using something like the following as the function for a precompute operation. It looks at my_column and replaces a with b, c with d, and e with f, leaving any other value unchanged.

lambda row: {'a': 'b', 'c': 'd', 'e': 'f'}.get(row['my_column'], row['my_column'])
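To see what that function does to a row, here is a minimal sketch that applies the same lookup with plain pandas. It does not use bearysta's precompute machinery; the DataFrame contents and the names `replacements` and `rename_value` are made up for illustration, and only `my_column` comes from the comment above.

```python
import pandas as pd

# Hypothetical input data; only the column name 'my_column' is taken
# from the comment above.
df = pd.DataFrame({"my_column": ["a", "c", "e", "z"], "value": [1, 2, 3, 4]})

# Defining the replacements outside the function keeps the function short.
replacements = {"a": "b", "c": "d", "e": "f"}

def rename_value(row):
    # Same logic as the lambda: look the value up, fall back to the original.
    return replacements.get(row["my_column"], row["my_column"])

# axis=1 calls the function once per row, which is how a row-wise
# function like the lambda would be evaluated in plain pandas.
df["my_column"] = df.apply(rename_value, axis=1)
result = df["my_column"].tolist()
print(result)
```

Values with no entry in the dictionary (here `z`) pass through unchanged because of the second argument to `dict.get`.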

It is also possible that there are use cases where precomputed columns are needed before certain filter operations. In those cases the fixed pipeline is not sufficient, and it may be worth investigating whether to abandon it. (I think this is what is discussed in #14.)

anton-malakhov commented 3 years ago

Right, let's think about how precompute can be used in a non-ugly way, especially if there are thousands of values to replace. E.g. add a read_csv function to the list of pre-defined functions for lambdas.
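As a rough sketch of that idea, the replacement table could live in a CSV file and be loaded once into a dictionary. This uses `pandas.read_csv` directly rather than any pre-defined function in bearysta, since none exists yet in this discussion. The `old`/`new` column names and the inline CSV text are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical replacement table; in practice this would be a file on disk
# with possibly thousands of rows. Column names 'old' and 'new' are assumed.
csv_text = "old,new\na,b\nc,d\ne,f\n"
table = pd.read_csv(io.StringIO(csv_text))

# Build the lookup once, outside any per-row function.
mapping = dict(zip(table["old"], table["new"]))

# Hypothetical data to rename.
df = pd.DataFrame({"my_column": ["a", "c", "e", "z"]})

# Series.map yields NaN for values missing from the mapping,
# so fall back to the original value for those.
df["my_column"] = df["my_column"].map(mapping).fillna(df["my_column"])
renamed = df["my_column"].tolist()
print(renamed)
```

Building the dictionary once keeps the per-row work to a single lookup, which matters when the replacement list is large.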