There can be None values in the data, and I thought of the following approach to handle them. Looking forward to hearing your comments on it.
We ignore all the Nones at the start and the end of the data. The method for sampling a point from a bucket can then be modified as follows (the bucketing method is the same as before):
// Pseudocode
Let a_avg be the average of all the areas calculated so far.
if (bucket is all Nones){
return None
}
if (left bucket is all Nones && right bucket is all Nones){
// Maybe there could be a criterion for choosing among the available non-None points?
return first not None element
}
if (left bucket is all Nones && right bucket is not all Nones){
// let r_avg[x], r_avg[y] be the average of the not Nones in right bucket
return the point (p[x], p[y]) having maximum area of the triangle formed by 0.5 * |r_avg[y] - p[y]| * |r_avg[x] - p[x]|
}
if (left bucket is not all Nones && right bucket is all Nones){
// let l_avg[x], l_avg[y] be the average of the not Nones in left bucket
return the point (p[x], p[y]) having maximum area of the triangle formed by 0.5 * |l_avg[y] - p[y]| * |l_avg[x] - p[x]|
}
Calculate the average using only the non-None values.
Compute the area for each point; let p_max be the point with the maximum area, and let a_max be that maximum area.
// Idea: None is the most significant sample if there are enough Nones in the bucket
// and the area of the triangle computed for the rest of the points is not significant enough
if (number of Nones in the bucket > bucket_size/2 && a_max < a_avg){
return None
}
return p_max
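To make the proposal easier to discuss, here is a minimal Python sketch of the pseudocode above. It is only an illustration under a few assumptions that the pseudocode does not spell out: each sample is either an (x, y) tuple or None; in the general case the triangle is formed by the left-bucket average, the candidate point, and the right-bucket average; and a_avg is passed in by the caller. The names trim_nones, mean_point, area_to, triangle_area and select_sample are hypothetical helpers, not part of any existing library.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Sample = Optional[Point]  # assumption: a missing sample is stored as None


def trim_nones(data: Sequence[Sample]) -> List[Sample]:
    """Drop the Nones at the start and the end of the data."""
    start, end = 0, len(data)
    while start < end and data[start] is None:
        start += 1
    while end > start and data[end - 1] is None:
        end -= 1
    return list(data[start:end])


def mean_point(bucket: Sequence[Sample]) -> Optional[Point]:
    """Average of the non-None points, or None if the bucket is all Nones."""
    pts = [p for p in bucket if p is not None]
    if not pts:
        return None
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def area_to(anchor: Point, p: Point) -> float:
    """One-sided area measure from the pseudocode: 0.5 * |dy| * |dx|."""
    return 0.5 * abs(anchor[1] - p[1]) * abs(anchor[0] - p[0])


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of the triangle a, b, c (assumed form for the general case)."""
    return 0.5 * abs((a[0] - c[0]) * (b[1] - a[1]) - (a[0] - b[0]) * (c[1] - a[1]))


def select_sample(left: Sequence[Sample], bucket: Sequence[Sample],
                  right: Sequence[Sample], a_avg: float) -> Sample:
    """Pick the representative sample of `bucket`, which may be None."""
    pts = [p for p in bucket if p is not None]
    if not pts:
        return None  # the bucket is all Nones
    l_avg, r_avg = mean_point(left), mean_point(right)
    if l_avg is None and r_avg is None:
        return pts[0]  # first non-None element
    if l_avg is None:
        return max(pts, key=lambda p: area_to(r_avg, p))
    if r_avg is None:
        return max(pts, key=lambda p: area_to(l_avg, p))
    # General case: both neighbouring buckets contain real points.
    areas = [triangle_area(l_avg, p, r_avg) for p in pts]
    a_max = max(areas)
    p_max = pts[areas.index(a_max)]
    none_count = len(bucket) - len(pts)
    # None wins when it dominates the bucket and no point stands out.
    if none_count > len(bucket) / 2 and a_max < a_avg:
        return None
    return p_max
```

A caller would first run trim_nones on the whole series, bucket the result as before, and then call select_sample for each bucket while keeping a running average of the selected areas to pass as a_avg.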