Closed. as6520 closed this 3 years ago.
Merging #120 (2a77b01) into master (5d89b63) will increase coverage by 0.070%. The diff coverage is n/a.
```
@@            Coverage Diff             @@
##            master      #120     +/-   ##
=============================================
+ Coverage   93.135%   93.204%   +0.070%
=============================================
  Files           39        39
  Lines         2316      2325        +9
=============================================
+ Hits          2157      2167       +10
+ Misses         159       158        -1
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| sail-on-client/sail_on_client/protocol/condda.py | 85.124% <0.000%> (-1.317%) | :arrow_down: |
| ...-on-client/sail_on_client/protocol/ond_protocol.py | 72.222% <0.000%> (+2.682%) | :arrow_up: |
use_consolidated_features has been added in 2a77b01. Program metrics aren't defined for CONDDA since there wasn't any consensus on characterization. One problem with evaluate_round_wise would be that NMI requires significantly more samples than 32, or the scores are unstable. We should talk to Terry and Mohsen about it if we want to implement these measures round-wise.
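The small-sample instability mentioned above can be illustrated with a quick sketch. This is not the client's metric code: the pure-stdlib `nmi` helper and the trial counts are assumptions for illustration. For two *independent* random labelings the true NMI is 0, so any positive mean is small-sample bias — and at a 32-sample round it is far from 0:

```python
import math
import random
from collections import Counter

def nmi(a, b):
    """Normalized mutual information (sqrt normalization), stdlib only."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
             for (x, y), c in pab.items())
    ha = -sum((c / n) * math.log(c / n) for c in pa.values())
    hb = -sum((c / n) * math.log(c / n) for c in pb.values())
    return mi / math.sqrt(ha * hb) if ha and hb else 0.0

def mean_nmi_of_random_labels(n_samples, k=5, trials=200, seed=0):
    """Average NMI between two independent random k-way labelings.

    The true NMI is 0; anything above 0 is small-sample bias."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(n_samples)]
        b = [rng.randrange(k) for _ in range(n_samples)]
        total += nmi(a, b)
    return total / trials

small = mean_nmi_of_random_labels(32)    # one 32-sample round
large = mean_nmi_of_random_labels(2048)  # many rounds pooled
print(f"mean NMI of unrelated labels, n=32:   {small:.3f}")
print(f"mean NMI of unrelated labels, n=2048: {large:.3f}")
```

The n=32 mean lands well above zero while the pooled estimate is close to it, which is why computing NMI per round would be misleading.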
Why are you saving the features as a pickle for CONDDA and as a JSON for OND?
The OWL: Open World Learning paper from Terry's lab defines a metric that is more stable than NMI for low amounts of labels. Check Equation 12 in https://arxiv.org/pdf/2011.12906.pdf. We should open a separate issue about adding it, since it isn't necessary yet.
For now we can just use NMI, though (even though we won't show it). I think it's better to have it in here than to leave it out, to keep the two protocols more similar (rather than adding it to CONDDA in a later PR).
Does CONDDA not have a baseline? How can we calculate reaction performance? Can we calculate reaction performance for CONDDA?
> Why are you saving the features as a pickle for CONDDA and as a JSON for OND?

Features are saved as a pickle in both OND and CONDDA.
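The split between the two formats can be sketched as follows. This is an illustration of the pattern, not the client's actual save code; the file names and dictionaries are hypothetical. Pickle round-trips arbitrary Python objects (e.g. feature arrays) byte-exactly, while JSON keeps small scalar scores human-readable:

```python
import json
import pickle
from pathlib import Path

# Hypothetical per-test artifacts: dense feature vectors and scalar scores.
features = {"image_001.png": [0.12, -1.4, 3.3], "image_002.png": [0.9, 0.1, -2.2]}
scores = {"top1_accuracy": 0.91, "nmi": 0.42}

out_dir = Path("artifacts")
out_dir.mkdir(exist_ok=True)

# Features: pickle handles arbitrary Python objects without lossy conversion.
with open(out_dir / "features.pkl", "wb") as f:
    pickle.dump(features, f)

# Scores: JSON keeps small scalar results human-readable and diffable.
with open(out_dir / "scores.json", "w") as f:
    json.dump(scores, f, indent=2)

# Sanity check: both artifacts round-trip.
with open(out_dir / "features.pkl", "rb") as f:
    assert pickle.load(f) == features
print((out_dir / "scores.json").read_text())
```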
> The OWL: Open World Learning paper from Terry's lab defines a metric that is more stable than NMI for low amounts of labels. Check Equation 12 in https://arxiv.org/pdf/2011.12906.pdf. We should open a separate issue about adding it, since it isn't necessary yet.
>
> For now we can just use NMI, though (even though we won't show it).

I can add it in a different PR. As I said earlier, evaluation for CONDDA isn't defined; that is why the evaluate function isn't called in it.
> Does CONDDA not have a baseline? How can we calculate reaction performance? Can we calculate reaction performance for CONDDA?

Yes, CONDDA does not have a baseline or reaction performance computation. Since characterization isn't defined across domains, program metrics are not defined in CONDDA.
I thought they were saved here as a JSON: https://github.com/darpa-sail-on/sail-on-client/blob/master/sail_on_client/protocol/ond_protocol.py#L318
For CONDDA, would characterization matter for reaction performance, which is just on the known classes?
> I thought they were saved here as a JSON: https://github.com/darpa-sail-on/sail-on-client/blob/master/sail_on_client/protocol/ond_protocol.py#L318

The scores obtained from the metrics are saved as JSON; the features are saved in https://github.com/darpa-sail-on/sail-on-client/blob/master/sail_on_client/protocol/ond_protocol.py#L273
> For CONDDA, would characterization matter for reaction performance, which is just on the known classes?

I don't think it can be called characterization if only known classes are being considered. Since the clusters would have a 1-1 mapping, I think NMI would have the same interpretation as accuracy.
Are the CONDDA scores saved here?
> I don't think it can be called characterization if only known classes are being considered. Since the clusters would have a 1-1 mapping, I think NMI would have the same interpretation as accuracy.

But we don't have a reaction-to-novelty accuracy for OND. It's the known reaction-to-novelty score.
> But we don't have a reaction-to-novelty accuracy for OND. It's the known reaction-to-novelty score.

True, we can use NMI pre-novelty and post-novelty. However, it would just be accuracy, since we expect a 1-1 mapping for classes pre-novelty.
> True, we can use NMI pre-novelty and post-novelty. However, it would just be accuracy, since we expect a 1-1 mapping for classes pre-novelty.

That sounds fine to me.
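The "NMI reads as accuracy under a 1-1 mapping" point can be demonstrated with a toy example. The pure-stdlib `nmi` helper and the labelings here are illustrative assumptions, not the client's metric code. NMI is invariant to renaming clusters, so a prediction that is a pure relabeling of the ground truth scores NMI = 1 even though raw label-match accuracy is 0; once the 1-1 cluster-to-class mapping is applied, accuracy agrees with NMI:

```python
import math
from collections import Counter

def nmi(a, b):
    """Normalized mutual information (sqrt normalization), stdlib only."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
             for (x, y), c in pab.items())
    ha = -sum((c / n) * math.log(c / n) for c in pa.values())
    hb = -sum((c / n) * math.log(c / n) for c in pb.values())
    return mi / math.sqrt(ha * hb) if ha and hb else 0.0

truth = [0, 0, 1, 1, 2, 2]
pred  = [2, 2, 0, 0, 1, 1]  # identical partition, cluster IDs permuted

raw_accuracy = sum(t == p for t, p in zip(truth, pred)) / len(truth)
print(raw_accuracy)      # 0.0: no label matches literally
print(nmi(truth, pred))  # 1.0: the partitions are identical

# Applying the 1-1 cluster-to-class mapping recovers accuracy 1.0.
mapping = {2: 0, 0: 1, 1: 2}
mapped_accuracy = sum(t == mapping[p] for t, p in zip(truth, pred)) / len(truth)
print(mapped_accuracy)   # 1.0: agrees with NMI's interpretation
```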
> The scores obtained from the metrics are saved as JSON; the features are saved in https://github.com/darpa-sail-on/sail-on-client/blob/master/sail_on_client/protocol/ond_protocol.py#L273

Are the CONDDA scores written to a JSON?
> Are the CONDDA scores written to a JSON?

No, evaluate isn't called in CONDDA since evaluate isn't defined for CONDDA.
> No, evaluate isn't called in CONDDA since evaluate isn't defined for CONDDA.

So we don't have a self-eval for CONDDA.
This PR adds the code to update harness parameters in CONDDA and copies a test from OND to CONDDA for a dry run.
Depends on #119