Closed Jwata closed 3 years ago
This is very curious. Are the reported runtimes the median of 5 runs, or some other number of runs? Were there other compute jobs running on the same machine? What tool do you use to profile?
Are the reported runtimes the median of 5 runs , or some other number of runs?
No, it's not median. But the observed runtimes are in a similar range.
Were there other compute jobs running on the same machine?
No other running jobs.
What tool do you use to profile?
I used https://github.com/jvm-profiling-tools/async-profiler. It took around 5% of total runtime https://gist.github.com/Jwata/6dfce63d5f1522adee8cb3b23d1bccb3#file-gistfile1-txt-L3258
I've tried your change. Some remarks:
It is indeed faster, but it is not working as expected. There is a difference of about 50% in terms of accuracy (using the BasicClassificationPerformanceEvaluator). Current InstanceImpl version: 93.183% acc // about 4min CPU time Your InstanceImpl version: 42.183% acc // about 18sec CPU time
Curiously, I couldn't get similar results with a real dataset (covertype). I tried 30 learners and 10 learners; both versions yield the same results for accuracy and similar results for CPU time.
Regarding the differences in terms of accuracy when using the synthetic dataset. The differences in accuracy may indicate that the generator is not creating the instances properly. I haven't looked into the details, but I guess you need to make changes to all the constructors in InstanceImpl. Inspect the RandomTreeGenerator as well as InstanceImpl during execution to verify what is going on.
Yes, that's right. The problem is that InstanceImpl has several constructors and the code to initialize classIndex is only used in one of the constructors. RandomTreeGenerator uses another constructor, the one with InstanceImpl(numberOfAttributes).
The solution could be for all constructors to call a method that setups the right value for classIndex.
@hmgomes, @abifet
Thank you for trying the change on your side, and giving your insights.
As you pointed out, it looks my change isn't sufficient. but your experiment on a synthetic dataset may indicate that the overhead on classIndex() could be reduced with appropriate changes. Will look into the other constructors as you suggested.
Could you let me know the task you tried with the synthetic dataset so that I can try it?
What about only changing classIndex() with something like this?
/**
* The instance class index.
*/
protected int classIndex = -1;
/**
* Class index.
*
* @return the int
*/
@Override
public int classIndex() {
if (this.classIndex == -1) {
this.classIndex = instanceHeader.classIndex();
// return ? classIndex : 0;
if (classIndex == Integer.MAX_VALUE)
if (this.instanceHeader.instanceInformation.range != null)
classIndex = instanceHeader.instanceInformation.range.getStart();
else
classIndex = 0;
}
return classIndex;
}
Hi @Jwata I've used the following:
EvaluatePrequential -l (meta.AdaptiveRandomForest -l (ARFHoeffdingTree -k 10 -e 2000000 -g 50 -c 0.01) -s 10 -m 10 -x (ADWINChangeDetector -a 0.001) -p (ADWINChangeDetector -a 0.01)) -e BasicClassificationPerformanceEvaluator -i 100000
@abifet @hmgomes
Hi, Thank you for your suggestion.
I moved the caching logic into classIndex()
as @abifet suggested. but there is still the accuracy gap.
Looking at the evaluation history, the accuracy is always around 42%, which indicates that it doesn’t learn anything from the data.
Will update if I make it work.
In my machine, the change I proposed to you is working. Can you send the link to your commit? Thanks!
Sorry for my late reply. Here it is. https://github.com/Jwata/moa/pull/1/commits/8c91ccf23934e276535b9688de3cce583d9d0315
this.classIndex = -1; has to be outside InstanceImpl(InstanceImpl inst).
Try using
/**
It worked indeed. But it didn't improve the runtime speed... Log: https://gist.github.com/Jwata/a97420c33d9b1ae79963930d2c7fbb1f
With my original change, the speed was 5x faster, but it didn't learn from the data. With @abifet's change, it learnt from the data properly, but the speed wasn't fast.
This indicates that my original change skipped some code executions incorrectly, which affected the runtime speed. I guess we can conclude that the proposal doesn't work.
Thank you for your helps.
While profiling performance of
AdaptiveRandomForest
learner, I noticed thatInstanceImpl.classIndex()
takes long time. I then modifiedInstanceImpl
to calculate and cache class index at initialization. The change looks like this https://github.com/Jwata/moa/pull/1.The speed increased 4x with the modification. 53.37s -> 12.77s (CPU time)
This is the task I used.
But unexpectedly, this change significantly slowed down
HoeffdingTree
learner. 22.02s -> 10m31s (CPU time)This is the task I used.
Can anybody give insight on these performance changes?
In any cases, I think classIndex() shouldn't affect overall runtime. how can we mitigate/resolve this issue? (I expected the change I shared works for all situations, but it didn't...)