Closed sudiptoguha closed 10 months ago
For TimedRangeVector, package name 'com.amazon.randomcutforest.parkservices.returntypes' does not correspond to the file path 'com.amazon.randomcutforest.returntypes'.
Should we change the following code
public GenericAnomalyDescriptor(List<Weighted<P>> representative, double score, double threshold,
double anomalyGrade) {
this.representativeList = representativeList;
to
public GenericAnomalyDescriptor(List<Weighted<P>> representative, double score, double threshold,
double anomalyGrade) {
this.representativeList = representative;
?
In the constructor of ImputeVisitor, are we missing four fields including box, converged, pointIndex, and randomRank?
ImputeVisitor(ImputeVisitor original) {
int length = original.queryPoint.length;
this.queryPoint = Arrays.copyOf(original.queryPoint, length);
this.missing = Arrays.copyOf(original.missing, length);
this.dimensionsUsed = new int[original.dimensionsUsed.length];
this.randomSeed = new Random(original.randomSeed).nextLong();
this.centrality = original.centrality;
anomalyRank = DEFAULT_INIT_VALUE;
distance = DEFAULT_INIT_VALUE;
}
For TimedRangeVector, package name 'com.amazon.randomcutforest.parkservices.returntypes' does not correspond to the file path 'com.amazon.randomcutforest.returntypes'.
fixed. Thx.
Should we change the following code
public GenericAnomalyDescriptor(List<Weighted<P>> representative, double score, double threshold,
double anomalyGrade) {
this.representativeList = representativeList;
to
public GenericAnomalyDescriptor(List<Weighted<P>> representative, double score, double threshold,
double anomalyGrade) {
this.representativeList = representative;
?
Fixed. Thx.
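For readers wondering why the original line compiled at all, here is a minimal sketch of the same mistake. The class names and the List<String> element type are hypothetical stand-ins, not the real GenericAnomalyDescriptor. Because the parameter is called representative, the name representativeList on the right-hand side resolves to the field, so the assignment copies the field onto itself and the argument is dropped.

```java
import java.util.List;

// Hypothetical illustration; not the GenericAnomalyDescriptor class from the repository.
class SelfAssignmentSketch {

    static class Buggy {
        List<String> representativeList;

        Buggy(List<String> representative) {
            // No parameter is named representativeList, so the right-hand side
            // is the field itself: a self-assignment that leaves it null.
            this.representativeList = representativeList;
        }
    }

    static class Fixed {
        final List<String> representativeList;

        Fixed(List<String> representative) {
            // The parameter is actually stored.
            this.representativeList = representative;
        }
    }
}
```

Declaring the field final, as in Fixed, would likely have turned the original mistake into a compile-time error, since a final field cannot be read before it has been assigned.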
In the constructor of ImputeVisitor, are we missing four fields including box, converged, pointIndex, and randomRank?
ImputeVisitor(ImputeVisitor original) {
int length = original.queryPoint.length;
this.queryPoint = Arrays.copyOf(original.queryPoint, length);
this.missing = Arrays.copyOf(original.missing, length);
this.dimensionsUsed = new int[original.dimensionsUsed.length];
this.randomSeed = new Random(original.randomSeed).nextLong();
this.centrality = original.centrality;
anomalyRank = DEFAULT_INIT_VALUE;
distance = DEFAULT_INIT_VALUE;
}
We are missing them, and that is intentional :) But perhaps the naming is inappropriate: this is a private constructor to be invoked by copy(), and that name is the mistake. Renamed copy() to partialCopy(), which is the intention. The values that are copied are fixed for the query; the other values are provided by the leaves in different branches. The partial copy is triggered when the partitioning coordinate is a missing value.
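A minimal sketch of the partial-copy idea, for illustration only. PartialCopySketch, its field set, and the value chosen for DEFAULT_INIT_VALUE are assumptions made for this example and do not reproduce the actual ImputeVisitor. Query-level state is duplicated, while per-branch state starts fresh so that each branch can fill it in from the leaf it reaches.

```java
import java.util.Arrays;

// Hypothetical illustration; not the ImputeVisitor class from the repository.
class PartialCopySketch {
    // Assumed sentinel; the real constant's value is not shown in this thread.
    static final double DEFAULT_INIT_VALUE = Double.MAX_VALUE;

    // Fixed for the whole query: duplicated by partialCopy().
    final float[] queryPoint;
    final boolean[] missing;
    final double centrality;

    // Supplied by whichever leaf a branch reaches: reset on every copy.
    double anomalyRank = DEFAULT_INIT_VALUE;
    double distance = DEFAULT_INIT_VALUE;
    int pointIndex = -1;
    boolean converged = false;

    PartialCopySketch(float[] queryPoint, boolean[] missing, double centrality) {
        // Defensive copies so that branches cannot disturb each other's query.
        this.queryPoint = Arrays.copyOf(queryPoint, queryPoint.length);
        this.missing = Arrays.copyOf(missing, missing.length);
        this.centrality = centrality;
    }

    // Copies only the query-level fields; branch-level fields keep their defaults.
    PartialCopySketch partialCopy() {
        return new PartialCopySketch(queryPoint, missing, centrality);
    }
}
```

Naming the method partialCopy() rather than copy() tells callers that leaf-derived fields such as pointIndex are deliberately not carried over.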
read until Java/core/src/main/java/com/amazon/randomcutforest/preprocessor/ImputePreprocessor.java of commit https://github.com/aws/random-cut-forest-by-aws/pull/401/commits/c8721d4a2b5f7c513153fe1d9279ac68ff546c9d
Description of changes: This PR initiates RCF 4.0. The primary change stems from the realization that, although RCF has been built in layers over time, some of the streaming normalization is standard yet extremely useful. This preprocessing functionality existed in ParkServices, yet it is increasingly clear that using RCFs without these normalizations is not really helpful. Thus the entire preprocessing is now shifted to core, rationalizing the configs with the code. As an example benefit, we introduce a PredictiveRCF that updates on vectors over attribute dimensions A and B and, given values in dimensions A, provides a clustering over candidate values in dimensions B. This capability existed in imputeMissingValues(), but the addition of the preprocessing (and the inverse map) alongside exposure of the clustering would likely be useful. As a consequence, we can use this predictor to estimate the errors of forecasting in RCFCast, reducing the amount of state required for calibration of the output.
In addition, new tests have been added and the coverage of ParkServices is significantly higher.
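To make "streaming normalization" and its "inverse map" concrete, here is a small sketch under stated assumptions. The description does not say which normalization the preprocessor applies; this example assumes a running mean and population standard deviation maintained with Welford's update, and StreamingNormalizerSketch is a hypothetical name rather than a class from the library.

```java
// Hypothetical illustration; not the preprocessor shipped in the repository.
class StreamingNormalizerSketch {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    // Welford's single-pass update of the mean and squared deviations.
    void update(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    // Population standard deviation of the values seen so far.
    double stdDev() {
        return count > 1 ? Math.sqrt(m2 / count) : 0.0;
    }

    // Forward map: raw value to normalized value (0 when there is no spread yet).
    double normalize(double x) {
        double sd = stdDev();
        return sd > 0 ? (x - mean) / sd : 0.0;
    }

    // Inverse map: normalized value back to the raw scale.
    double invert(double z) {
        return z * stdDev() + mean;
    }
}
```

In a setup like this, the forest would see normalize(x), and any value it produces in normalized space would pass through invert() before being reported. The two maps are exact inverses only once the observed values have nonzero spread.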
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.