(a) Real-world data can occasionally have all data for a specific row/column missing.
(b) In processing time series, we know only about the past, not the future.
Example call:
impyute.imputation.ts.locf(p000008, axis=1, entire_set_nan_ok=True, no_look_forward=True)
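To make cases (a) and (b) concrete, here is a small plain-Python sketch that needs neither impyute nor NumPy. `describe_columns` is a hypothetical helper for illustration only, not part of impyute. It assumes time runs down the rows and reports which columns are entirely NaN (case a) and which start with NaN, so that filling them would require looking into the future (case b).

```python
import math

def describe_columns(rows):
    """Classify each column of a list-of-rows table (hypothetical helper)."""
    report = []
    for c in range(len(rows[0])):
        column = [row[c] for row in rows]
        report.append({
            # case (a): no observation at all in this column
            "all_nan": all(math.isnan(v) for v in column),
            # case (b): the first time step is missing, so only later
            # (future) rows could supply a value
            "leading_nan": math.isnan(column[0]),
        })
    return report

nan = math.nan
sample = [
    [1.0, nan, nan],
    [nan, nan, 5.0],
    [3.0, nan, nan],
]
report = describe_columns(sample)
for index, info in enumerate(report):
    print(index, info)
```

In this sample the middle column is case (a), and the last column is case (b) only.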
Example code after modification - apologies, I've not done pull requests before :-)
import numpy as np
from impyute.ops import matrix
from impyute.ops import wrapper
from impyute.ops import error

@wrapper.wrappers
@wrapper.checks
def locf(data, axis=0, no_look_forward=False, entire_set_nan_ok=False):
    """ Last Observation Carried Forward

    For each set of missing indices, use the value of one row before (same
    column). In the case that the missing value is the first row, look one
    row ahead instead. If this next row is also NaN, look to the next row.
    Repeat until you find a row in this column that's not NaN. All the rows
    before will be filled with this value.

    Parameters
    ----------
    data: numpy.ndarray
        Data to impute.
    axis: int (optional). Default=0
        0 if time series is in row format (Ex. data[0][:] is 1st data point).
        1 if time series is in col format (Ex. data[:][0] is 1st data point).
    no_look_forward: boolean (optional). Default=False
        False: if the first row is NaN, impute it by looking ahead to later rows.
        True: do not impute the first row, even if it is NaN.
              Result may contain NaN in first row.
    entire_set_nan_ok: boolean (optional). Default=False
        False: if an entire column is NaN, raise an exception.
        True: if an entire column is NaN, leave it as is.
              Result may contain NaN in entire column.

    Returns
    -------
    numpy.ndarray
        Imputed data.

    """
    if axis == 0:
        data = np.transpose(data)
    elif axis == 1:
        pass
    else:
        raise error.BadInputError("Error: Axis value is invalid, please use either 0 (row format) or 1 (column format)")

    nan_xy = matrix.nan_indices(data)
    for x_i, y_i in nan_xy:
        # no_look_forward=True means do not impute the first row with values
        # from farther down. Meant for situations where the index is time,
        # so we would not know what will happen in the future.
        # Simplest scenario, look one row back
        if x_i-1 > -1:
            data[x_i][y_i] = data[x_i-1][y_i]
        # Look n rows forward
        elif not no_look_forward:
            x_residuals = np.shape(data)[0]-x_i-1  # n datapoints left
            val_found = False
            # +1 so that the last row is checked too; range(1, x_residuals)
            # stops one row short of the end of the column
            for i in range(1, x_residuals + 1):
                if not np.isnan(data[x_i+i][y_i]):
                    val_found = True
                    break
            if val_found:
                # pylint: disable=undefined-loop-variable
                for x_nan in range(i):
                    data[x_i+x_nan][y_i] = data[x_i+i][y_i]
            elif not entire_set_nan_ok:
                raise Exception("Error: Entire Column is NaN")
    return data
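For anyone who wants to try the intended behaviour without installing impyute, below is a minimal plain-Python sketch of the same idea. It is not the library's implementation: `locf_sketch` is a hypothetical name, it works on a list of rows with time running down the rows, and it returns a copy instead of modifying its input. It follows the docstring, so an all-NaN column raises unless `entire_set_nan_ok=True`, whatever the value of `no_look_forward`. In the code above, that check is only reached when looking forward is allowed.

```python
import math

def locf_sketch(rows, no_look_forward=False, entire_set_nan_ok=False):
    """Last observation carried forward on a list of rows (illustrative only)."""
    filled = [list(row) for row in rows]  # work on a copy
    n_rows = len(filled)
    n_cols = len(filled[0]) if filled else 0
    for c in range(n_cols):
        # Row index of the first observed (non-NaN) value in this column
        first = next(
            (r for r in range(n_rows) if not math.isnan(filled[r][c])), None
        )
        if first is None:
            if not entire_set_nan_ok:
                raise ValueError("Entire column is NaN")
            continue  # leave the all-NaN column untouched
        if not no_look_forward:
            # Back-fill the leading NaNs from the first observed value
            for r in range(first):
                filled[r][c] = filled[first][c]
        # Carry the last observation forward through the rest of the column
        for r in range(first + 1, n_rows):
            if math.isnan(filled[r][c]):
                filled[r][c] = filled[r - 1][c]
    return filled

nan = math.nan
sample = [
    [nan, nan, 1.0],
    [2.0, nan, nan],
    [nan, nan, nan],
    [4.0, nan, 3.0],
]
# Time-series use: never borrow from later rows, tolerate the empty column
causal = locf_sketch(sample, no_look_forward=True, entire_set_nan_ok=True)
# Default look-ahead for the first row, still tolerating the empty column
backfilled = locf_sketch(sample, entire_set_nan_ok=True)
```

With `no_look_forward=True` the leading NaN in the first column stays NaN, while `backfilled` fills it with 2.0 from the row below. Calling `locf_sketch(sample)` with both defaults raises because the middle column has no observations.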