Open hokiegeek2 opened 4 years ago
I am looking into parallelizing a section of code in `detect_anoms` where the majority of the execution time is spent:
```python
if not one_tail:
    ares = abs(data - data.median())
elif upper_tail:
    ares = data - data.median()
else:
    ares = data.median() - data
ares = ares / data.mad()
tmp_anom_index = ares[ares.values == ares.max()].index
cand = pd.Series(data.loc[tmp_anom_index], index=tmp_anom_index)
data.drop(tmp_anom_index, inplace=True)
```
Is there a way to refactor the code so that the ordering enforced by the for loop around the `data.drop` invocations is no longer needed?
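For context on what I've tried: here is a minimal sketch (hypothetical function and parameter names, not the actual `detect_anoms` signature) that replaces the repeated `data.drop()` with a boolean mask. It doesn't remove the ordering dependency, since the median and MAD are recomputed over the remaining points each pass, but it at least avoids mutating the Series in place:

```python
import numpy as np
import pandas as pd

def detect_anoms_masked(data: pd.Series, max_anoms: int,
                        one_tail: bool = False,
                        upper_tail: bool = True) -> pd.Index:
    """Sketch: replace repeated data.drop() with a boolean mask.

    Each iteration still depends on the previous one because the median
    and MAD are recomputed over the remaining points, so the outer loop
    itself is not trivially parallelizable. Assumes a unique index.
    """
    keep = np.ones(len(data), dtype=bool)   # True = not yet removed
    anom_index = []
    for _ in range(max_anoms):
        remaining = data[keep]
        med = remaining.median()
        if not one_tail:
            ares = (remaining - med).abs()
        elif upper_tail:
            ares = remaining - med
        else:
            ares = med - remaining
        # Series.mad() is deprecated in recent pandas; compute the
        # median absolute deviation explicitly instead
        mad = (remaining - med).abs().median()
        if mad == 0:
            break
        ares = ares / mad
        pos = ares.values.argmax()          # position within `remaining`
        label = remaining.index[pos]
        anom_index.append(label)
        keep[data.index.get_loc(label)] = False
    return pd.Index(anom_index)
```

The drop-free version makes the state between iterations explicit (the `keep` mask), which I thought might be a first step toward splitting the work up, but the recomputed median/MAD still serializes the loop.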
Similar question here:
```python
for i in range(1, data.size + 1, num_obs_in_period):
    start_date = data.index[i]
    # if there is at least 14 days left, subset it,
    # otherwise subset last_date - 14 days
    end_date = start_date + datetime.timedelta(days=num_days_in_period)
    if end_date < data.index[-1]:
        all_data.append(
            data.loc[lambda x: (x.index >= start_date) & (x.index <= end_date)])
    else:
        all_data.append(
            data.loc[lambda x: x.index >= data.index[-1]
                     - datetime.timedelta(days=num_days_in_period)])
return all_data
```
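Unlike the first loop, each window here depends only on its own start index, so it seems like the iterations are independent. A sketch of how I imagine restructuring it (hypothetical function name and signature; note I use `range(1, data.size, ...)` rather than `data.size + 1` to avoid indexing past the end of the Series):

```python
import datetime
import pandas as pd

def split_into_periods(data: pd.Series, num_obs_in_period: int,
                       num_days_in_period: int) -> list:
    """Sketch: each window depends only on its start index, so the
    subsets can be computed independently (and thus in parallel)."""
    last = data.index[-1]
    delta = datetime.timedelta(days=num_days_in_period)

    def window(i: int) -> pd.Series:
        start_date = data.index[i]
        end_date = start_date + delta
        if end_date < last:
            return data[(data.index >= start_date) & (data.index <= end_date)]
        # fewer than num_days_in_period days remain: take the final window
        return data[data.index >= last - delta]

    # independent per start position; this list comprehension could be
    # swapped for e.g. multiprocessing.Pool.map over the same range
    return [window(i) for i in range(1, data.size, num_obs_in_period)]
```

Does pulling the loop body out into a pure function like this look like a reasonable way to make the subsetting parallel-friendly?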
I am a software engineer, not a data scientist, so this may be a very naive question. :)
--John