Crunch-io / scrunch

Pythonic scripting library for cleaning data in Crunch
GNU Lesser General Public License v3.0
5 stars 7 forks source link

(derived) Editing derived variables/categories #148

Open jamesrkg opened 7 years ago

jamesrkg commented 7 years ago

We'd like a way to edit the derivation expression of a derived variable and/or category, or more specifically to replace it with a new one, e.g.:

# edit a variable derivation (e.g. a derived numeric)
ds[alias].derivation = "q1 / 3"

# edit a category derivation
ds[alias].categories[977].derivation = "q1 in [1, 2, 3"

We'd also like a way to edit a derived category attributes, e.g.:

# edit a derived category
ds[alias].categories[977].edit(name="new name!", numeric_value=None, missing=True)

There appears to be some edge cases where editing numeric_value or missing is problematic because they are 1:1 derived from a source category (does it even make sense in that situation for these to be derived attributes?) but for new categories we should be able to edit anything we like after the fact.

jjdelc commented 7 years ago

The API allows to PATCH the derivation of a derived variable, although we have to check for some restrictions.

Our support may allow for some things to break, for example if you currently have a derivation that returns and array and change the derivation to something totally different. It will break any filters or decks you had pointing to that variable for sure.

Also, if anybody made a derivation with a derived variable as depdencency, the dependent variable is locked and cannot be changed.

For changing category names, it is just a matter to recreate the original derivation expression with the new name and PATCH it in place, this should be supported fine.

mathiasbc commented 7 years ago

Do we want to support all these verifications ? Is there a way to do it in Crunch ?

jamesrkg commented 6 years ago

Can I bump this one please @mathiasbc ? It's not a blocker but I would put it to use straight away if I had it. It seems like there are two parts to this:

It seems from what @jjdelc said that editing a derived expression should be possible (but with an internal re-create?). However, if editing the derived expression is problematic I can live without it for a while yet. Not being able to edit the other attributes (name, numeric_value, missing) of a derived category is the more frustrating of the two.

mathiasbc commented 6 years ago

@xbito: What are you thoughts ?

jjdelc commented 6 years ago

About the described scenarios. The Crunch API does not have specific support to change individual parameters of a derivation expression. You always PATCH an updated derivation. So if we'd like to support changing only the name of a category in a derived variable, it would have to be fully handled by Scrunch.

To change the name of a category, you'd need to go into the right section of the expression that generated it and change that single value and then PATCH that gain.

I think that it would be better code-wise to abstract the complexity of understanding each of the expressions and their functions into some classes that allow easier manipulation.

Something like CaseExpression(crunch_expr) and then you can do something like:

case_expr = CaseExpression({...crunch json expr ...})
case_expr.categories[1]['name'] = "new name"

ds['variable'].derivation = case_expr.expression

That way we can factor out much complexity of array functions, case expressions, copy variables or other expressions out of the Dataset class, Variable class, Category class and such. We'd now have encapsulated complexity into each of the functions we support... I'm sure that Arrays will want something similar.

My main concern is that the suggested syntax will bring heaps of code complexity to the existing classes.

jamesrkg commented 4 years ago

@xbito let's talk about this one again. I was reminded by the team that the simple task of editing the labels for derived variables can still only be done in the UI, which is obviously pretty frustrating for them. Patching the derived expression i can live without but editing name, description, notes should be possible.

jjdelc commented 4 years ago

We recently deployed a function that can aid with this, it's called alter_categories and allows to do the following:

Nothing else because we don't want this to be a schema breaking function.

The benefit of this function is that makes any client (Scrunch) that needs to edit category names on any kind of derived variable does NOT need to know about the internal details of how such category/subvariable name came to happen. All it needs to know is to wrap it with alter_categories and set the desired category names.

When editing a variable, we only need to sense if the current derivation of the variable is alter_categories, if so, the client needs to have knowledge only about this and alter it accordingly to obtain the desired results, if it is anything else then we just wrap it. This is to avoid wrapping multiple times nested levels of alter_categories to do things that can be collapsed in one.

jamesrkg commented 4 years ago

Thanks @jjdelc I'll discuss with @xbito later this week.