abdel closed this 3 years ago
I forgot to add the following:
DG overlap, PR loss mismatch are now also being logged alongside other losses in TensorBoard during study/recall. However, they are not part of the 'metrics' class.
Fantastic work! This was a big one, and you did it elegantly.
@abdel I just realised that we've lost the summary images. i.e. the plots that we use to see the signals as they propagate throughout AHA, and that we used to report the results. I think we do need this. What was the thinking behind that?
Oh, accidental deletion.
I had commented out that section with the old code and then removed the entire block. I forgot the summary stuff was in there
@affogato I've reverted that change now in the main branch, so the completion summaries should be back.
@abdel thanks, and for doing that so quickly! Another thing: your code for the metrics is SO SO much simpler, more elegant, easier to work with ... than the old method. However, the old method has the advantage that it calculates a confusion matrix that can be visualised. Additionally, it allows arbitrary mapping of labels between train and test set. I'm very tempted to remove it though, and find a simpler way. For example, we could re-map train and test order before trying to compute metrics. And there must be simpler ways to calculate confusion matrices (or at least we could keep that method). What do you think?
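One "simpler way to calculate confusion matrices" could be a few lines of plain torch; this is only a sketch (the function name and arguments are hypothetical, assuming integer class labels in `[0, num_classes)`):

```python
import torch

def confusion_matrix(preds: torch.Tensor, targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Return a num_classes x num_classes matrix where entry [t, p]
    counts samples with true label t that were predicted as p."""
    # Flatten each (true, predicted) pair into a single index t * C + p,
    # then count occurrences of each index and reshape back into a matrix.
    indices = targets * num_classes + preds
    counts = torch.bincount(indices, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)
```

The resulting matrix can be plotted directly (e.g. as an image summary), which would recover the visualisation the old method offered without keeping the old code path.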
I'm not sure what you mean by arbitrary mapping of labels between train and test?
I don't believe any of the functionality actually changed in the refactoring. The old compute_matching method returned matching_matrices, matching_accuracies and sum_ambiguous. This is all still being calculated; it's just called similarity_matrix on line #34, but it's the same function if you compare it to the old compute_matching.
It may have been an option to visualise in the old code in TF, but it wasn't being visualised in the most recent PyTorch version.
I didn't think the functionality had changed. But I thought that you did add additional new metric calculations (pasted below). I think the new code is much cleaner, so I'm tempted to remove the older (albeit refactored) compute_matching stuff. I know it isn't being visualised in PyTorch, but we may want to visualise it again in the future. The 'arbitrary mapping' means that the label for train sample i does not have to be equal to the label for test sample i.
```python
elif comparison_type == 'accuracy':
    self.metrics[prefix + '_' + 'accuracy'] = torch.eq(primary_labels, secondary_labels).float().mean()
elif comparison_type == 'mse':
    self.metrics[prefix + '_' + 'mse'] = utils.mse(primary_features, secondary_features)
elif comparison_type == 'mismatch':
    self.metrics[prefix + '_' + 'mismatch'] = torch.sum(torch.abs(secondary_features - primary_features)) / primary_features.shape[0]
```
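As an aside, the 're-map train and test order before computing metrics' idea mentioned above could be as simple as an index lookup. A sketch, where `test_to_train` is a hypothetical mapping tensor (not from the actual codebase):

```python
import torch

# Hypothetical mapping: test class i corresponds to train class test_to_train[i].
test_to_train = torch.tensor([2, 0, 1])  # e.g. test class 0 maps to train class 2

test_labels = torch.tensor([0, 2, 1, 0])
# After the lookup, the labels are directly comparable to the train labels,
# so the simple torch.eq(...)-style accuracy above works unchanged.
remapped = test_to_train[test_labels]
```

This would let the simpler metrics code handle the arbitrary-mapping case without reintroducing the old compute_matching machinery.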
OK - I misunderstood. I think when you referred to the 'old method', I thought I may have removed functionality accidentally.. so you're just thinking about how to.. make it better basically?
Since we've just refactored it to 'fit in' rather than re-thinking it, happy to discuss more tomorrow morning if we want to re-design it.
- Renamed the `LakeOneShotMetrics` class to `OneShotMetrics`, as it is no longer just about 'lake'-specific metrics
- Refactored the `OneShotMetrics` class to allow the user to flexibly add individual (supported) metrics without underlying assumptions; see `aha_config_vgg.json` for an example
- The `OneShotMetrics` class supports the following 'comparison types':
  - `match_mse`: legacy metric; computes similarity based on MSE, reports accuracy and ambiguousness
  - `match_overlap`: legacy metric; computes similarity based on overlap, reports accuracy and ambiguousness
  - `mse`: standard MSE metric on the given feature pairs, e.g. `mse(primary_feature, secondary_feature)`
  - `accuracy`: standard mean accuracy calculation based on label pairs, e.g. `torch.eq(primary_label, secondary_label).float().mean()`
  - `mismatch`: sum of absolute differences between features, divided by the number of samples
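For concreteness, the `mismatch` comparison type described above reduces to a one-liner; a minimal standalone sketch (the function wrapper is illustrative, not the actual class method):

```python
import torch

def mismatch(primary_features: torch.Tensor, secondary_features: torch.Tensor) -> torch.Tensor:
    # Sum of absolute element-wise differences, normalised by the
    # batch size (dimension 0), per the description above.
    return torch.sum(torch.abs(secondary_features - primary_features)) / primary_features.shape[0]

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[1.0, 0.0], [0.0, 4.0]])
# mismatch(a, b) → (|0| + |-2| + |-3| + |0|) / 2 = 2.5
```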