This change takes a list of >2000 columns from 10s of minutes down to about 4 seconds:
# add the individual variable columns to the payload
_self = right_ds.resource.self
_variables = right_ds.resource.variables.by('alias')
var_urls = [
'{}variables/{}/'.format(_self, _variables[alias]['id'])
for alias in columns
]
for var_url in var_urls:
payload['body']['args'][0]['map'][var_url] = {'variable': var_url}
This code is terribly inefficient because every variable to be joined is instantiated only to get its url:
https://github.com/Crunch-io/scrunch/blob/7361a858c89fc12f7a15bd06d920c981798ad28d/scrunch/mutable_dataset.py#L102-L105
This change takes a list of >2000 columns from 10s of minutes down to about 4 seconds: