Crunch-io / scrunch

Pythonic scripting library for cleaning data in Crunch
GNU Lesser General Public License v3.0
5 stars 7 forks source link

`create_variable()` must include aliases for subreferences in payload #373

Closed alextanski closed 5 years ago

alextanski commented 5 years ago

Currently, when creating a variable that is built from subvariables (i.e. categorical arrays and multiple responses), the payload only requires that their name attribute to be provided:

if subvariables:
    payload['subreferences'] = [
        {'name': item['name']} for item in subvariables
    ]

This results in the server enumerating the subvariables, appending -number to the parent variable name, e.g.:

for s in wave_ds['R3_Inmind'].items():
    s
(u'R3_Inmind-16', <Variable: name='Eriksson Bil'; id='0017'>)
(u'R3_Inmind-4', <Variable: name='Bilbolaget'; id='0005'>)
(u'R3_Inmind-27', <Variable: name='Imexa'; id='0028'>)
(u'R3_Inmind-14', <Variable: name='Borås'; id='0015'>)
(u'R3_Inmind-6', <Variable: name='Bilia'; id='0007'>)
[...]

We are already passing id and alias for each of the items into the method, so it can be corrected by simply also getting those properties from the input if provided:

if subvariables:
    payload['subreferences'] = [
        {'name': item['name'],
         'alias': item['alias'],
         'id': item['id']
         } for item in subvariables
    ]

The current behavior is problematic because of two things:

  1. It is not consistent with the way the "reversed" way of the conversion from Crunch to QP is working, for example in Crex and general SSC DP.

  2. It is leading to metadata conflicts in tracker scenarios, because the way DP has set up the dataset pre-Crunch is not reflected when a variable is created based on the definition they are seeing in their QP file.

The problem cannot be worked around (especially for MR variables), as we are treating multiple responses questions as "normal" questions, not as an array, so DP is not able to control the subvariable names manually by renaming. They are part of the metadata conversion.

tagging @jamesrkg.

mathiasbc commented 5 years ago

@alextanski https://github.com/Crunch-io/scrunch/pull/374

jamesrkg commented 5 years ago

Thanks @mathiasbc !

alextanski commented 5 years ago

thanks @mathiasbc, looks good. We can, however, leave out the code for getting the id from the subvariable definition, as (of course) Crunch generates that one directly from the url created for the subreference, hence no point to pass/get() it.

Thanks for the quick fix! 👍

xbito commented 5 years ago

The pull request has been merged after the latest fix from Mathias. Closing.