Closed. as6520 closed this 3 years ago.
Merging #120 (2a77b01) into master (5d89b63) will increase coverage by 0.070%. The diff coverage is n/a.
```
@@            Coverage Diff             @@
##            master      #120     +/-   ##
=============================================
+ Coverage   93.135%   93.204%   +0.070%
=============================================
  Files           39        39
  Lines         2316      2325        +9
=============================================
+ Hits          2157      2167       +10
+ Misses         159       158        -1
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| sail-on-client/sail_on_client/protocol/condda.py | 85.124% <0.000%> (-1.317%) | :arrow_down: |
| ...-on-client/sail_on_client/protocol/ond_protocol.py | 72.222% <0.000%> (+2.682%) | :arrow_up: |
use_consolidated_features has been added in 2a77b01. Program metrics aren't defined for CONDDA since there wasn't any consensus on characterization. One problem with evaluate_round_wise would be that NMI requires significantly more samples than 32, or the scores are unstable. We should talk to Terry and Mohsen about it if we want to implement these measures round-wise.
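The small-sample instability mentioned above can be illustrated with a quick sketch. This is not the client's metric code: the pure-stdlib `nmi` helper and the trial counts are assumptions for illustration. For two *independent* random labelings the true NMI is 0, so any positive mean is small-sample bias — and at a 32-sample round it is far from 0:

```python
import math
import random
from collections import Counter

def nmi(a, b):
    """Normalized mutual information (sqrt normalization), stdlib only."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
             for (x, y), c in pab.items())
    ha = -sum((c / n) * math.log(c / n) for c in pa.values())
    hb = -sum((c / n) * math.log(c / n) for c in pb.values())
    return mi / math.sqrt(ha * hb) if ha and hb else 0.0

def mean_nmi_of_random_labels(n_samples, k=5, trials=200, seed=0):
    """Average NMI between two independent random k-way labelings.

    The true NMI is 0; anything above 0 is small-sample bias."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(n_samples)]
        b = [rng.randrange(k) for _ in range(n_samples)]
        total += nmi(a, b)
    return total / trials

small = mean_nmi_of_random_labels(32)    # one 32-sample round
large = mean_nmi_of_random_labels(2048)  # many rounds pooled
print(f"mean NMI of unrelated labels, n=32:   {small:.3f}")
print(f"mean NMI of unrelated labels, n=2048: {large:.3f}")
```

The n=32 mean lands well above zero while the pooled estimate is close to it, which is why computing NMI per round would be misleading.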
Why are you saving the features as a pickle for CONDDA and as a JSON for OND?
The OWL: Open World Learning paper from Terry's lab defines a metric that is more stable than NMI for low amounts of labels. Check Equation 12 in https://arxiv.org/pdf/2011.12906.pdf. We should open a separate issue about adding it, since it isn't necessary yet.
For now we can just use NMI, though (even though we won't show it). I think it's better to have it in here than to leave it out, to keep the two protocols more similar (rather than adding it to CONDDA in a later PR).
Does CONDDA not have a baseline? How can we calculate reaction performance? Can we calculate reaction performance for CONDDA?
> Why are you saving the features as a pickle for CONDDA and as a JSON for OND?

Features are saved as a pickle in both OND and CONDDA.
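The split between the two formats can be sketched as follows. This is an illustration of the pattern, not the client's actual save code; the file names and dictionaries are hypothetical. Pickle round-trips arbitrary Python objects (e.g. feature arrays) byte-exactly, while JSON keeps small scalar scores human-readable:

```python
import json
import pickle
from pathlib import Path

# Hypothetical per-test artifacts: dense feature vectors and scalar scores.
features = {"image_001.png": [0.12, -1.4, 3.3], "image_002.png": [0.9, 0.1, -2.2]}
scores = {"top1_accuracy": 0.91, "nmi": 0.42}

out_dir = Path("artifacts")
out_dir.mkdir(exist_ok=True)

# Features: pickle handles arbitrary Python objects without lossy conversion.
with open(out_dir / "features.pkl", "wb") as f:
    pickle.dump(features, f)

# Scores: JSON keeps small scalar results human-readable and diffable.
with open(out_dir / "scores.json", "w") as f:
    json.dump(scores, f, indent=2)

# Sanity check: both artifacts round-trip.
with open(out_dir / "features.pkl", "rb") as f:
    assert pickle.load(f) == features
print((out_dir / "scores.json").read_text())
```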
> The OWL: Open World Learning paper from Terry's lab defines a metric that is more stable than NMI for low amounts of labels. Check Equation 12 in https://arxiv.org/pdf/2011.12906.pdf. We should open a separate issue about adding it, since it isn't necessary yet.
>
> For now we can just use NMI, though (even though we won't show it).

I can add it in a different PR. As I said earlier, evaluation for CONDDA isn't defined; that is why the evaluate function isn't called in it.
> Does CONDDA not have a baseline? How can we calculate reaction performance? Can we calculate reaction performance for CONDDA?

Yes, CONDDA does not have a baseline or reaction performance computation. Since characterization isn't defined across domains, program metrics are not defined in CONDDA.
I thought they were saved here as a JSON: https://github.com/darpa-sail-on/sail-on-client/blob/master/sail_on_client/protocol/ond_protocol.py#L318
For CONDDA, would characterization matter for reaction performance, which is just on the known classes?
> I thought they were saved here as a JSON: https://github.com/darpa-sail-on/sail-on-client/blob/master/sail_on_client/protocol/ond_protocol.py#L318

The scores obtained from the metrics are saved as JSON; the features are saved in https://github.com/darpa-sail-on/sail-on-client/blob/master/sail_on_client/protocol/ond_protocol.py#L273
> For CONDDA, would characterization matter for reaction performance, which is just on the known classes?

I don't think it can be called characterization if only known classes are being considered. Since the clusters would have a 1-1 mapping, I think NMI would have the same interpretation as accuracy.
Are the CONDDA scores saved here?
> I don't think it can be called characterization if only known classes are being considered. Since the clusters would have a 1-1 mapping, I think NMI would have the same interpretation as accuracy.

But we don't have a reaction-to-novelty accuracy for OND. It's the known reaction-to-novelty score.
> But we don't have a reaction-to-novelty accuracy for OND. It's the known reaction-to-novelty score.

True, we can use NMI pre-novelty and post-novelty. However, it would just be accuracy, since we expect a 1-1 mapping for classes pre-novelty.
> True, we can use NMI pre-novelty and post-novelty. However, it would just be accuracy, since we expect a 1-1 mapping for classes pre-novelty.

That sounds fine to me.
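The "NMI reads as accuracy under a 1-1 mapping" point can be demonstrated with a toy example. The pure-stdlib `nmi` helper and the labelings here are illustrative assumptions, not the client's metric code. NMI is invariant to renaming clusters, so a prediction that is a pure relabeling of the ground truth scores NMI = 1 even though raw label-match accuracy is 0; once the 1-1 cluster-to-class mapping is applied, accuracy agrees with NMI:

```python
import math
from collections import Counter

def nmi(a, b):
    """Normalized mutual information (sqrt normalization), stdlib only."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
             for (x, y), c in pab.items())
    ha = -sum((c / n) * math.log(c / n) for c in pa.values())
    hb = -sum((c / n) * math.log(c / n) for c in pb.values())
    return mi / math.sqrt(ha * hb) if ha and hb else 0.0

truth = [0, 0, 1, 1, 2, 2]
pred  = [2, 2, 0, 0, 1, 1]  # identical partition, cluster IDs permuted

raw_accuracy = sum(t == p for t, p in zip(truth, pred)) / len(truth)
print(raw_accuracy)      # 0.0: no label matches literally
print(nmi(truth, pred))  # 1.0: the partitions are identical

# Applying the 1-1 cluster-to-class mapping recovers accuracy 1.0.
mapping = {2: 0, 0: 1, 1: 2}
mapped_accuracy = sum(t == mapping[p] for t, p in zip(truth, pred)) / len(truth)
print(mapped_accuracy)   # 1.0: agrees with NMI's interpretation
```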
> The scores obtained from the metrics are saved as JSON; the features are saved in https://github.com/darpa-sail-on/sail-on-client/blob/master/sail_on_client/protocol/ond_protocol.py#L273

Are the CONDDA scores written to a JSON?
> Are the CONDDA scores written to a JSON?

No, evaluate isn't called in CONDDA since evaluate isn't defined for CONDDA.
> No, evaluate isn't called in CONDDA since evaluate isn't defined for CONDDA.

So we don't have a self-eval for CONDDA.
This PR adds the code to update harness parameters in CONDDA and copies a test from OND to CONDDA for a dry run.
Depends on #119