06. Final model iterations

EchteRobert commented 2 years ago

Two cluster training data (T: S3+S4)

Some final tweaks to training the model will be made in this issue. All of these tweaks will be made with Stain2, Stain3, and Stain4 in mind at the same time, instead of one at a time. The first model is trained on 3 plates from Stain3 and Stain4 at the same time and evaluated on Stain2, Stain3, and Stain4.

Main takeaways

It's possible to generalize to clusters outside of the trained clusters by using training data from at least two clusters at the same time.
This actually also improves overall performance on validation mAP for plates within the training cluster. It is slightly worse than the best model trained on the Stain2 cluster specifically.

Table Stain4

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00116625highexp | **0.74** | 0.32 | **0.36** | 0.28 | 98.9 | 61.1 |
| BR00116628highexp | **0.73** | 0.32 | **0.32** | 0.31 | 98.9 | 57.8 |
| BR00116629highexp | **0.78** | 0.29 | **0.35** | 0.29 | 100 | 52.2 |
| _Validation plates_ | | | | | | |
| BR00116631highexp | **0.47** | 0.28 | 0.27 | **0.3** | 93.3 | 53.3 |
| BR00116625 | **0.6** | 0.31 | **0.35** | 0.29 | 98.9 | 58.9 |
| BR00116630highexp | **0.52** | 0.29 | **0.3** | 0.3 | 97.8 | 58.9 |
| BR00116631 | **0.5** | 0.3 | 0.26 | **0.28** | 94.4 | 57.8 |
| BR00116627highexp | **0.55** | 0.31 | **0.38** | 0.27 | 98.9 | 56.7 |
| BR00116627 | **0.55** | 0.3 | **0.36** | 0.29 | 96.7 | 56.7 |
| BR00116629 | **0.61** | 0.3 | **0.32** | 0.29 | 98.9 | 52.2 |
| BR00116628 | **0.58** | 0.32 | 0.28 | **0.29** | 98.9 | 58.9 |

Table Stain3

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00115134 | **0.75** | 0.37 | **0.42** | 0.33 | 98.9 | 58.9 |
| BR00115125 | **0.75** | 0.36 | **0.44** | 0.29 | 98.9 | 54.4 |
| BR00115133highexp | **0.76** | 0.38 | **0.38** | 0.31 | 97.8 | 60 |
| _Validation plates_ | | | | | | |
| BR00115128highexp | **0.52** | 0.4 | **0.42** | 0.33 | 97.8 | 58.9 |
| BR00115125highexp | **0.58** | 0.37 | **0.41** | 0.31 | 98.9 | 55.6 |
| BR00115131 | **0.54** | 0.38 | **0.44** | 0.29 | 98.9 | 58.9 |
| BR00115126 | **0.34** | 0.32 | **0.33** | 0.28 | 57.8 | 53.3 |
| BR00115133 | **0.58** | 0.38 | **0.4** | 0.3 | 96.7 | 62.2 |
| BR00115127 | **0.56** | 0.38 | **0.47** | 0.31 | 98.9 | 58.9 |
| BR00115128 | **0.53** | 0.39 | **0.42** | 0.32 | 96.7 | 61.1 |
| BR00115129 | **0.57** | 0.38 | **0.45** | 0.32 | 98.9 | 52.2 |

Table Stain2

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| BR00112202 | **0.43** | 0.34 | **0.38** | 0.3 | 88.9 | 54.4 |
| BR00112197standard | **0.45** | 0.4 | **0.41** | 0.28 | 85.6 | 56.7 |
| BR00112198 | **0.43** | 0.35 | **0.4** | 0.3 | 91.1 | 56.7 |
| BR00112197repeat | **0.43** | 0.41 | **0.37** | 0.31 | 81.1 | 63.3 |
| BR00112204 | **0.4** | 0.35 | **0.46** | 0.29 | 82.2 | 58.9 |
| BR00112197binned | **0.43** | 0.41 | **0.39** | 0.3 | 86.7 | 58.9 |
| BR00112201 | **0.47** | 0.4 | **0.41** | 0.32 | 91.1 | 66.7 |

EchteRobert commented 2 years ago

Two cluster training data (T: S2+S4)

This model is trained on 3 plates from Stain2 and Stain4 at the same time and evaluated on Stain2, Stain3, and Stain4.

Main takeaways

Training on Stain2 and Stain4 yields similar results to the previous model: it still generalizes to Stain3. However, one of the plates outside of the Stain3 cluster (BR00115126) did not perform as well, showing that there are still some plate effects that are being learned.

Table Stain4

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00116625highexp | **0.84** | 0.32 | **0.38** | 0.28 | 98.9 | 61.1 |
| BR00116628highexp | **0.83** | 0.32 | **0.34** | 0.31 | 100 | 57.8 |
| BR00116629highexp | **0.83** | 0.29 | **0.32** | 0.29 | 98.9 | 52.2 |
| _Validation plates_ | | | | | | |
| BR00116631highexp | **0.49** | 0.28 | 0.28 | **0.3** | 92.2 | 53.3 |
| BR00116625 | **0.62** | 0.31 | **0.35** | 0.29 | 98.9 | 58.9 |
| BR00116630highexp | **0.54** | 0.29 | **0.33** | 0.3 | 92.2 | 58.9 |
| BR00116631 | **0.51** | 0.3 | 0.26 | **0.28** | 94.4 | 57.8 |
| BR00116627highexp | **0.54** | 0.31 | **0.37** | 0.27 | 97.8 | 56.7 |
| BR00116627 | **0.54** | 0.3 | **0.35** | 0.29 | 97.8 | 56.7 |
| BR00116629 | **0.61** | 0.3 | **0.35** | 0.29 | 98.9 | 52.2 |
| BR00116628 | **0.62** | 0.32 | **0.31** | 0.29 | 97.8 | 58.9 |

Table Stain3

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| BR00115128highexp | **0.5** | 0.4 | **0.46** | 0.33 | 98.9 | 58.9 |
| BR00115125highexp | **0.42** | 0.37 | **0.33** | 0.31 | 86.7 | 55.6 |
| BR00115134 | **0.47** | 0.37 | **0.36** | 0.33 | 87.8 | 58.9 |
| BR00115125 | **0.43** | 0.36 | **0.33** | 0.29 | 85.6 | 54.4 |
| BR00115131 | **0.48** | 0.38 | **0.46** | 0.29 | 93.3 | 58.9 |
| BR00115133 | **0.43** | 0.38 | **0.32** | 0.3 | 83.3 | 62.2 |
| BR00115127 | **0.5** | 0.38 | **0.43** | 0.31 | 94.4 | 58.9 |
| BR00115133highexp | **0.45** | 0.38 | **0.38** | 0.31 | 88.9 | 60 |
| BR00115128 | **0.5** | 0.39 | **0.47** | 0.32 | 94.4 | 61.1 |
| BR00115129 | **0.49** | 0.38 | **0.45** | 0.32 | 97.8 | 52.2 |
| BR00115126 | 0.3 | **0.32** | **0.29** | 0.28 | 48.9 | **53.3** |

Table Stain2

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00112201 | **0.8** | 0.4 | **0.58** | 0.32 | 100 | 66.7 |
| BR00112198 | **0.77** | 0.35 | **0.55** | 0.3 | 100 | 56.7 |
| BR00112204 | **0.8** | 0.35 | **0.53** | 0.29 | 100 | 58.9 |
| _Validation plates_ | | | | | | |
| BR00112202 | **0.59** | 0.34 | **0.49** | 0.3 | 100 | 54.4 |
| BR00112197standard | **0.58** | 0.4 | **0.49** | 0.28 | 97.8 | 56.7 |
| BR00112197binned | **0.49** | 0.41 | **0.43** | 0.3 | 87.8 | 58.9 |
| BR00112197repeat | **0.57** | 0.41 | **0.5** | 0.31 | 95.6 | 63.3 |

EchteRobert commented 2 years ago

Two cluster training data (T: S2+S3)

This model is trained on 3 plates from Stain2 and Stain3 at the same time and evaluated on Stain2, Stain3, and Stain4.

Main takeaways

Training on Stain2 and Stain3 does not generalize to Stain4. Based on the previous two results, it appears that Stain4 is the hardest to learn and is thus the most suited for use as training data. Stain2 is the easiest and thus results in faster overfitting (perhaps there are stronger plate effects in Stain2).
These past three experiments show that training on plates within a cluster does increase performance on unseen plates within that cluster, while generally performance is lower when evaluating a model on plates outside of the training clusters.

Table Stain4

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| BR00116631highexp | **0.29** | 0.28 | 0.17 | **0.3** | 68.9 | 53.3 |
| BR00116625highexp | **0.37** | 0.32 | 0.26 | **0.28** | 76.7 | 61.1 |
| BR00116628highexp | **0.34** | 0.32 | 0.22 | **0.31** | 80 | 57.8 |
| BR00116625 | **0.36** | 0.31 | 0.27 | **0.29** | 76.7 | 58.9 |
| BR00116630highexp | **0.36** | 0.29 | 0.23 | **0.3** | 78.9 | 58.9 |
| BR00116631 | **0.32** | 0.3 | 0.17 | **0.28** | 65.6 | 57.8 |
| BR00116629highexp | **0.36** | 0.29 | 0.26 | **0.29** | 81.1 | 52.2 |
| BR00116627highexp | **0.36** | 0.31 | 0.25 | **0.27** | 78.9 | 56.7 |
| BR00116627 | **0.35** | 0.3 | 0.26 | **0.29** | 75.6 | 56.7 |
| BR00116629 | **0.36** | 0.3 | 0.21 | **0.29** | 74.4 | 52.2 |
| BR00116628 | **0.33** | 0.32 | 0.19 | **0.29** | 72.2 | 58.9 |

Table Stain3

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00115134 | **0.79** | 0.37 | **0.4** | 0.33 | 98.9 | 58.9 |
| BR00115125 | **0.73** | 0.36 | **0.42** | 0.29 | 98.9 | 54.4 |
| BR00115133highexp | **0.8** | 0.38 | **0.37** | 0.31 | 97.8 | 60 |
| _Validation plates_ | | | | | | |
| BR00115131 | **0.54** | 0.38 | **0.43** | 0.29 | 97.8 | 58.9 |
| BR00115126 | **0.34** | 0.32 | **0.3** | 0.28 | 57.8 | 53.3 |
| BR00115133 | **0.56** | 0.38 | **0.33** | 0.3 | 97.8 | 62.2 |
| BR00115127 | **0.58** | 0.38 | **0.45** | 0.31 | 96.7 | 58.9 |
| BR00115128 | **0.53** | 0.39 | **0.49** | 0.32 | 97.8 | 61.1 |
| BR00115129 | **0.55** | 0.38 | **0.45** | 0.32 | 98.9 | 52.2 |
| BR00115128highexp | **0.52** | 0.4 | **0.46** | 0.33 | 100 | 58.9 |
| BR00115125highexp | **0.54** | 0.37 | **0.33** | 0.31 | 95.6 | 55.6 |

Table Stain2

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00112198 | **0.74** | 0.35 | **0.54** | 0.3 | 100 | 56.7 |
| BR00112204 | **0.75** | 0.35 | **0.51** | 0.29 | 100 | 58.9 |
| BR00112201 | **0.75** | 0.4 | **0.51** | 0.32 | 100 | 66.7 |
| _Validation plates_ | | | | | | |
| BR00112202 | **0.55** | 0.34 | **0.42** | 0.3 | 97.8 | 54.4 |
| BR00112197standard | **0.59** | 0.4 | **0.5** | 0.28 | 95.6 | 56.7 |
| BR00112197repeat | **0.58** | 0.41 | **0.55** | 0.31 | 96.7 | 63.3 |
| BR00112197binned | **0.55** | 0.41 | **0.51** | 0.3 | 93.3 | 58.9 |

EchteRobert commented 2 years ago

Three cluster training data (6 plates)

This model is trained on 2 plates from Stain2, Stain3, and Stain4 and evaluated on all the remaining plates within their clusters.

Main takeaways

This model does generalize decently to all datasets.
Interestingly, training on 3 plates from Stain2 and Stain4 resulted in better performance on Stain3 and Stain4 than training on two plates from all three datasets. Because the total number of plates is the same, a possible explanation could be that certain training plates introduce a bias in the model's solution.

Table Stain4

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00116625highexp | **0.82** | 0.32 | **0.36** | 0.28 | 97.8 | 61.1 |
| BR00116628highexp | **0.85** | 0.32 | **0.3** | 0.31 | 98.9 | 57.8 |
| _Validation plates_ | | | | | | |
| BR00116625 | **0.59** | 0.31 | **0.31** | 0.29 | 96.7 | 58.9 |
| BR00116630highexp | **0.48** | 0.29 | 0.28 | **0.3** | 91.1 | 58.9 |
| BR00116629highexp | **0.5** | 0.29 | **0.31** | 0.29 | 95.6 | 52.2 |
| BR00116627highexp | **0.54** | 0.31 | **0.36** | 0.27 | 97.8 | 56.7 |
| BR00116627 | **0.51** | 0.3 | **0.34** | 0.29 | 95.6 | 56.7 |
| BR00116629 | **0.49** | 0.3 | **0.31** | 0.29 | 94.4 | 52.2 |
| BR00116628 | **0.55** | 0.32 | 0.24 | **0.29** | 96.7 | 58.9 |
| BR00116631highexp | **0.41** | 0.28 | 0.24 | **0.3** | 86.7 | 53.3 |
| BR00116631 | **0.45** | 0.3 | 0.24 | **0.28** | 93.3 | 57.8 |

Table Stain3

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00115134 | **0.88** | 0.37 | **0.42** | 0.33 | 98.9 | 58.9 |
| BR00115125 | **0.83** | 0.36 | **0.43** | 0.29 | 98.9 | 54.4 |
| _Validation plates_ | | | | | | |
| BR00115128highexp | **0.53** | 0.4 | **0.45** | 0.33 | 94.4 | 58.9 |
| BR00115125highexp | **0.57** | 0.37 | **0.36** | 0.31 | 98.9 | 55.6 |
| BR00115131 | **0.53** | 0.38 | **0.44** | 0.29 | 97.8 | 58.9 |
| BR00115126 | **0.35** | 0.32 | **0.34** | 0.28 | 64.4 | 53.3 |
| BR00115133 | **0.45** | 0.38 | **0.32** | 0.3 | 82.2 | 62.2 |
| BR00115127 | **0.57** | 0.38 | **0.47** | 0.31 | 96.7 | 58.9 |
| BR00115133highexp | **0.46** | 0.38 | 0.3 | **0.31** | 87.8 | 60 |
| BR00115128 | **0.54** | 0.39 | **0.42** | 0.32 | 95.6 | 61.1 |
| BR00115129 | **0.56** | 0.38 | **0.44** | 0.32 | 94.4 | 52.2 |

Table Stain2

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00112198 | **0.82** | 0.35 | **0.5** | 0.3 | 100 | 56.7 |
| BR00112201 | **0.82** | 0.4 | **0.53** | 0.32 | 100 | 66.7 |
| _Validation plates_ | | | | | | |
| BR00112202 | **0.55** | 0.34 | **0.46** | 0.3 | 97.8 | 54.4 |
| BR00112197standard | **0.57** | 0.4 | **0.45** | 0.28 | 95.6 | 56.7 |
| BR00112197repeat | **0.57** | 0.41 | **0.49** | 0.31 | 94.4 | 63.3 |
| BR00112204 | **0.55** | 0.35 | **0.48** | 0.29 | 98.9 | 58.9 |
| BR00112197binned | **0.52** | 0.41 | **0.45** | 0.3 | 93.3 | 58.9 |

EchteRobert commented 2 years ago

Three cluster training data (9 plates)

This model is trained on 3 plates from Stain2, Stain3, and Stain4 and evaluated on all the remaining plates within their clusters.

Main takeaways

For a complete discussion of all trained models, see the comment below.

Table Stain4

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00116625highexp | **0.71** | 0.32 | **0.37** | 0.28 | 98.9 | 61.1 |
| BR00116628highexp | **0.72** | 0.32 | **0.35** | 0.31 | 98.9 | 57.8 |
| BR00116629highexp | **0.7** | 0.29 | **0.34** | 0.29 | 98.9 | 52.2 |
| _Validation plates_ | | | | | | |
| BR00116625 | **0.58** | 0.31 | **0.37** | 0.29 | 96.7 | 58.9 |
| BR00116630highexp | **0.53** | 0.29 | **0.32** | 0.3 | 97.8 | 58.9 |
| BR00116627highexp | **0.54** | 0.31 | **0.37** | 0.27 | 97.8 | 56.7 |
| BR00116627 | **0.53** | 0.3 | **0.34** | 0.29 | 96.7 | 56.7 |
| BR00116629 | **0.57** | 0.3 | **0.33** | 0.29 | 97.8 | 52.2 |
| BR00116628 | **0.57** | 0.32 | **0.3** | 0.29 | 97.8 | 58.9 |
| BR00116631highexp | **0.45** | 0.28 | 0.26 | **0.3** | 92.2 | 53.3 |
| BR00116631 | **0.48** | 0.3 | 0.26 | **0.28** | 95.6 | 57.8 |

Table Stain3

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00115134 | **0.73** | 0.37 | **0.44** | 0.33 | 98.9 | 58.9 |
| BR00115125 | **0.69** | 0.36 | **0.44** | 0.29 | 98.9 | 54.4 |
| BR00115133highexp | **0.72** | 0.38 | **0.41** | 0.31 | 100 | 60 |
| _Validation plates_ | | | | | | |
| BR00115128highexp | **0.58** | 0.4 | **0.49** | 0.33 | 100 | 58.9 |
| BR00115125highexp | **0.58** | 0.37 | **0.38** | 0.31 | 98.9 | 55.6 |
| BR00115131 | **0.56** | 0.38 | **0.5** | 0.29 | 98.9 | 58.9 |
| BR00115126 | **0.33** | 0.32 | **0.32** | 0.28 | 57.8 | 53.3 |
| BR00115133 | **0.56** | 0.38 | **0.39** | 0.3 | 97.8 | 62.2 |
| BR00115127 | **0.59** | 0.38 | **0.49** | 0.31 | 98.9 | 58.9 |
| BR00115128 | **0.57** | 0.39 | **0.53** | 0.32 | 100 | 61.1 |
| BR00115129 | **0.58** | 0.38 | **0.5** | 0.32 | 100 | 52.2 |

Table Stain2

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00112204 | **0.69** | 0.35 | **0.54** | 0.29 | 100 | 58.9 |
| BR00112201 | **0.72** | 0.4 | **0.52** | 0.32 | 100 | 66.7 |
| BR00112198 | **0.68** | 0.35 | **0.52** | 0.3 | 100 | 56.7 |
| _Validation plates_ | | | | | | |
| BR00112197repeat | **0.58** | 0.41 | **0.52** | 0.31 | 97.8 | 63.3 |
| BR00112197binned | **0.53** | 0.41 | **0.48** | 0.3 | 93.3 | 58.9 |
| BR00112202 | **0.57** | 0.34 | **0.48** | 0.3 | 98.9 | 54.4 |
| BR00112197standard | **0.6** | 0.4 | **0.47** | 0.28 | 96.7 | 56.7 |

EchteRobert commented 2 years ago

Model cross analysis

Here I compare all trained models described in the previous comments.

Main takeaways

This analysis shows that Stain4 is the hardest to generalize to, as we need examples of it in the training set in order to perform well on it. Unfortunately, it is not immediately clear why this is the case.
Based on the rank order analysis, the model trained using 3 plates from each cluster performed the best overall. Best performing models (based on mean/median) are highlighted in bold, worst in italic.

| Model name | Average rank across metrics |
|---|---|
| S3+S4 | 2.92 |
| S2+S4 | 2.83 |
| S2+S3 | 3.67 |
| S2+S3+S4 (6 plates) | 3.58 |
| S2+S3+S4 (9plates) | 1.92 |
| Individual cluster | 4.92 |

Stain2 validation mAP evaluation

| **Model name** | **Mean** | **Median** | **Min** | **Max** | **Mean rank** | **Median rank** | **Min rank** | **Max rank** |
|---|---|---|---|---|---|---|---|---|
| _S3+S4_ | 0.40 | 0.40 | 0.37 | 0.46 | 6.00 | 6.00 | 5.00 | 6.00 |
| **S2+S4** | 0.51 | 0.50 | 0.43 | 0.58 | 1.00 | 3.00 | 3.00 | 1.00 |
| S2+S3 | 0.51 | 0.51 | 0.42 | 0.55 | 2.00 | 2.00 | 4.00 | 2.00 |
| S2+S3+S4 (6 plates) | 0.48 | 0.48 | 0.45 | 0.53 | 4.00 | 4.00 | 2.00 | 4.00 |
| **S2+S3+S4 (9plates)** | 0.50 | 0.52 | 0.47 | 0.54 | 3.00 | 1.00 | 1.00 | 3.00 |
| S2 | 0.44 | 0.45 | 0.37 | 0.49 | 5.00 | 5.00 | 5.00 | 5.00 |

Stain3 validation mAP evaluation

| **Model name** | **Mean** | **Median** | **Min** | **Max** | **Mean rank** | **Median rank** | **Min rank** | **Max rank** |
|---|---|---|---|---|---|---|---|---|
| S3+S4 | 0.42 | 0.42 | 0.33 | 0.47 | 2.00 | 2.00 | 1.00 | 3.00 |
| S2+S4 | 0.39 | 0.38 | 0.29 | 0.47 | 5.00 | 5.00 | 5.00 | 3.00 |
| S2+S3 | 0.40 | 0.42 | 0.30 | 0.49 | 3.00 | 2.00 | 3.00 | 2.00 |
| S2+S3+S4 (6 plates) | 0.40 | 0.42 | 0.30 | 0.47 | 4.00 | 2.00 | 3.00 | 3.00 |
| **S2+S3+S4 (9plates)** | 0.45 | 0.44 | 0.32 | 0.53 | 1.00 | 1.00 | 2.00 | 1.00 |
| _S3_ | 0.37 | 0.38 | 0.29 | 0.44 | 6.00 | 5.00 | 5.00 | 6.00 |

Stain4 validation mAP evaluation

| Model name | Mean | Median | Min | Max | Mean rank | Median rank | Min rank | Max rank |
|---|---|---|---|---|---|---|---|---|
| **S3+S4** | 0.46 | 0.45 | 0.38 | 0.52 | 1.00 | 1.00 | 1.00 | 1.00 |
| S2+S4 | 0.33 | 0.34 | 0.26 | 0.38 | 2.00 | 2.00 | 2.00 | 2.00 |
| _S2+S3_ | 0.23 | 0.23 | 0.17 | 0.27 | 6.00 | 6.00 | 6.00 | 6.00 |
| S2+S3+S4 (6 plates) | 0.30 | 0.31 | 0.24 | 0.36 | 5.00 | 4.00 | 4.00 | 4.00 |
| S2+S3+S4 (9plates) | 0.33 | 0.34 | 0.26 | 0.37 | 3.00 | 2.00 | 2.00 | 3.00 |
| S4 | 0.30 | 0.31 | 0.21 | 0.36 | 4.00 | 4.00 | 5.00 | 4.00 |
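
As a sanity check on how the summary rows relate to the per-plate tables, the sketch below recomputes the `S2+S3` row of the Stain4 evaluation from the eleven per-plate validation mAP values reported for that model in its own comment. It assumes the summary is taken over all plates of the dataset (training and validation plates alike); the function name `summarize` is only illustrative and not taken from the project code.

```python
from statistics import mean, median

# Validation mAP (model) per Stain4 plate for the S2+S3 model,
# copied from the per-plate Stain4 table of that model.
s2s3_on_stain4 = [0.17, 0.26, 0.22, 0.27, 0.23, 0.17, 0.26, 0.25, 0.26, 0.21, 0.19]


def summarize(values):
    """Return mean, median, min and max of the per-plate scores (two decimals)."""
    return {
        "mean": round(mean(values), 2),
        "median": round(median(values), 2),
        "min": min(values),
        "max": max(values),
    }


summary = summarize(s2s3_on_stain4)
print(summary)
```

The rounded values agree with the `S2+S3` row above (0.23, 0.23, 0.17, 0.27). The rank columns cannot be reproduced reliably from the rounded means, because ties at two decimals would have to be broken by the unrounded values.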

Rank order analysis

| Model name | Average mean rank | Average median rank | Average min rank | Average max rank |
|---|---|---|---|---|
| S3+S4 | 3.00 | 3.00 | 2.33 | 3.33 |
| S2+S4 | 2.67 | 3.33 | 3.33 | 2.00 |
| S2+S3 | 3.67 | 3.33 | 4.33 | 3.33 |
| S2+S3+S4 (6 plates) | 4.33 | 3.33 | 3.00 | 3.67 |
| **S2+S3+S4 (9plates)** | 2.33 | 1.33 | 1.67 | 2.33 |
| Individual cluster | 5.00 | 4.67 | 5.00 | 5.00 |
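
The "Average rank across metrics" listed under the main takeaways appears to be the plain average of the four columns in this table. A minimal sketch of that arithmetic, using the rounded values from the table (the helper name `overall_rank` is hypothetical):

```python
# Per-metric average ranks copied from the rank order analysis table.
# Column order: mean rank, median rank, min rank, max rank.
average_ranks = {
    "S3+S4": [3.00, 3.00, 2.33, 3.33],
    "S2+S4": [2.67, 3.33, 3.33, 2.00],
    "S2+S3": [3.67, 3.33, 4.33, 3.33],
    "S2+S3+S4 (6 plates)": [4.33, 3.33, 3.00, 3.67],
    "S2+S3+S4 (9plates)": [2.33, 1.33, 1.67, 2.33],
    "Individual cluster": [5.00, 4.67, 5.00, 5.00],
}


def overall_rank(ranks):
    """Average the per-metric ranks into a single score (lower is better)."""
    return sum(ranks) / len(ranks)


overall = {name: overall_rank(ranks) for name, ranks in average_ranks.items()}
best = min(overall, key=overall.get)

for name, value in sorted(overall.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.3f}")
print("best:", best)
```

Because the inputs are already rounded to two decimals, the results match the reported averages only to within about 0.01. The lowest score belongs to the model trained on 3 plates from each cluster, as stated in the takeaways.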

Extra plate analysis (2 from Stain2 and 1 from Stain4)

_S2+S3+S4 (9plates)_

| plate | Training mAP model | Validation mAP model | PR model |
|:---|---:|---:|---:|
| BR00116634bin1 | 0.34 | 0.18 | 71.1 |
| BR00113818 | 0.44 | 0.31 | 87.8 |
| BR00113820 | 0.39 | 0.34 | 78.9 |

_S2+S3+S4 (6 plates)_

| plate | Training mAP model | Validation mAP model | PR model |
|:---|---:|---:|---:|
| BR00116634bin1 | 0.3 | 0.2 | 71.1 |
| BR00113818 | 0.4 | 0.29 | 90 |
| BR00113820 | 0.36 | 0.33 | 77.8 |

_S2+S3_

| plate | Training mAP model | Validation mAP model | PR model |
|:---|---:|---:|---:|
| BR00116634bin1 | 0.26 | 0.13 | 66.7 |
| BR00113818 | 0.42 | 0.29 | 87.8 |
| BR00113820 | 0.35 | 0.31 | 74.4 |

_S2+S4_

| plate | Training mAP model | Validation mAP model | PR model |
|:---|---:|---:|---:|
| BR00116634bin1 | 0.35 | 0.19 | 70 |
| BR00113818 | 0.4 | 0.28 | 82.2 |
| BR00113820 | 0.35 | 0.29 | 67.8 |

_S3+S4_

| plate | Training mAP model | Validation mAP model | PR model |
|:---|---:|---:|---:|
| BR00116634bin1 | 0.32 | 0.17 | 67.8 |
| BR00113818 | 0.36 | 0.25 | 81.1 |
| BR00113820 | 0.33 | 0.31 | 81.1 |

EchteRobert commented 2 years ago

Three cluster training data (12 plates)

As a final test, to see if increasing the number of training plates increases performance on validation compounds and plates, I train a model with 4 plates from Stain2, Stain3, and Stain4.

Main takeaways

Adding this model to the rank analysis from the previous comment, we see that increasing the number of plates indeed increases the average validation mAP. There is a bias, however: as the number of training plates increases, more of the plates that enter these calculations are training plates, because their validation mAP is also included. The model is even starting to generalize to the outlier plates in Stain4.

| Model name | Average mean rank | Average median rank | Average min rank | Average max rank | Average |
|---|---|---|---|---|---|
| S3+S4 | 3.67 | 3.67 | 2.67 | 4.00 | 3.50 |
| S2+S4 | 3.67 | 4.33 | 4.33 | 2.67 | 3.75 |
| S2+S3 | 4.67 | 4.33 | 5.33 | 4.00 | 4.58 |
| S2+S3+S4 (6 plates) | 5.33 | 4.33 | 3.67 | 4.67 | 4.50 |
| S2+S3+S4 (9plates) | 3.33 | 2.00 | 2.00 | 3.00 | 2.58 |
| Individual cluster | 6.00 | 5.67 | 6.00 | 6.00 | 5.92 |
| S2+S3+S4 (12 plates) | 1.33 | 1.33 | 2.00 | 2.00 | 1.67 |

Table Stain4

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00116628 | **0.81** | 0.32 | **0.33** | 0.29 | 98.9 | 58.9 |
| BR00116625highexp | **0.76** | 0.32 | **0.38** | 0.28 | 98.9 | 61.1 |
| BR00116628highexp | **0.8** | 0.32 | **0.38** | 0.31 | 98.9 | 57.8 |
| BR00116629highexp | **0.74** | 0.29 | **0.39** | 0.29 | 100 | 52.2 |
| _Validation plates_ | | | | | | |
| BR00116625 | **0.63** | 0.31 | **0.4** | 0.29 | 98.9 | 58.9 |
| BR00116630highexp | **0.56** | 0.29 | **0.34** | 0.3 | 96.7 | 58.9 |
| BR00116627highexp | **0.58** | 0.31 | **0.38** | 0.27 | 96.7 | 56.7 |
| BR00116627 | **0.56** | 0.3 | **0.38** | 0.29 | 97.8 | 56.7 |
| BR00116629 | **0.64** | 0.3 | **0.36** | 0.29 | 100 | 52.2 |
| BR00116631highexp | **0.5** | 0.28 | 0.27 | **0.3** | 91.1 | 53.3 |
| BR00116631 | **0.52** | 0.3 | **0.29** | 0.28 | 94.4 | 57.8 |

Table Stain3

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00115134 | **0.75** | 0.37 | **0.46** | 0.33 | 98.9 | 58.9 |
| BR00115125 | **0.68** | 0.36 | **0.48** | 0.29 | 98.9 | 54.4 |
| BR00115133highexp | **0.79** | 0.38 | **0.41** | 0.31 | 97.8 | 60 |
| BR00115133 | **0.79** | 0.38 | **0.43** | 0.3 | 97.8 | 62.2 |
| _Validation plates_ | | | | | | |
| BR00115131 | **0.58** | 0.38 | **0.48** | 0.29 | 97.8 | 58.9 |
| BR00115126 | **0.36** | 0.32 | **0.32** | 0.28 | 58.9 | 53.3 |
| BR00115127 | **0.6** | 0.38 | **0.51** | 0.31 | 98.9 | 58.9 |
| BR00115128 | **0.57** | 0.39 | **0.54** | 0.32 | 100 | 61.1 |
| BR00115129 | **0.59** | 0.38 | **0.5** | 0.32 | 98.9 | 52.2 |
| BR00115128highexp | **0.57** | 0.4 | **0.49** | 0.33 | 98.9 | 58.9 |
| BR00115125highexp | **0.55** | 0.37 | **0.41** | 0.31 | 98.9 | 55.6 |

Table Stain2

| | Average mean rank | Average median rank | Average min rank | Average max rank | | |
|---|---|---|---|---|---|---|
| S3+S4 | 4.67 | 4.67 | 3.00 | 4.33 | | 4.17 |
| S2+S4 | 3.33 | 4.00 | 4.00 | 2.33 | | 3.42 |
| S2+S3 | 4.67 | 4.33 | 5.33 | 4.00 | | 4.58 |
| S2+S3+S4 (6 plates) | 5.33 | 4.33 | 3.67 | 4.67 | | 4.50 |
| S2+S3+S4 (9plates) | 3.00 | 1.67 | 1.67 | 3.00 | | 2.33 |
| Individual cluster | 6.00 | 5.67 | 6.00 | 6.00 | | 5.92 |
| **S2+S3+S4 (12 plates)** | 1.00 | 1.00 | 1.67 | 1.67 | | **1.33** |

Number of training plates versus mean validation mAP

_We see some saturation in validation mAP for Stain2 and Stain3, which reinforces the higher validation mAP I have been getting for these datasets. Stain4 can still improve, which is also in line with what I have been observing: Stain4 seems to be a more difficult dataset. What "difficult" means exactly remains to be answered._

_The error bars in the plot indicate the minimum and maximum validation mAP observed for a plate, so they are not very robust to outliers._

![NrTrainingPlatesVSvalidationmAP](https://user-images.githubusercontent.com/62173977/166962095-3ce525e1-de7a-415c-b8ce-14c52e93e1a0.png)
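
Error bars that span the minimum and maximum are asymmetric around the mean, so a plotting routine needs separate lower and upper offsets. The sketch below derives them for Stain4 from the mean, min and max values in the Stain4 cross-analysis table for the 6-plate and 9-plate models. It is an illustration of the arithmetic only, not the script that produced the figure, and `to_errorbar_series` is a hypothetical helper name.

```python
# (number of training plates, mean, min, max) of Stain4 validation mAP,
# copied from the Stain4 cross-analysis table for the three-cluster models.
stain4_summary = [
    (6, 0.30, 0.24, 0.36),
    (9, 0.33, 0.26, 0.37),
]


def to_errorbar_series(rows):
    """Split summary rows into x, y and asymmetric (lower, upper) offsets."""
    x = [n_plates for n_plates, _, _, _ in rows]
    y = [avg for _, avg, _, _ in rows]
    lower = [avg - low for _, avg, low, _ in rows]    # distance from mean down to min
    upper = [high - avg for _, avg, _, high in rows]  # distance from mean up to max
    return x, y, lower, upper


x, y, lower, upper = to_errorbar_series(stain4_summary)
print(x, y, lower, upper)
```

With matplotlib, these lists could be passed as `plt.errorbar(x, y, yerr=[lower, upper])`. Since a single extreme plate sets the whole bar, a quantile-based range would be a more outlier-resistant alternative.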

EchteRobert commented 2 years ago

Training plate influence

To test the influence of which training plates are used on model generalization, I switched up all the training plates and added 3 outlier plates (according to the PC1 loading correlations) as well. I then trained the model in the same way as previous models. Note that comparing the performance of the models is now even harder as the validation plates are completely different.

Main takeaway

It appears that, as long as enough training plates are used (i.e. at least 12 here), the model is able to learn a general method of aggregation for different types of analysis pipelines, no matter which training plates are used. I do think, however, that using plates from different Stains (which differ quite a lot in terms of feature importances) is beneficial to generalization.

Results

Stain2 table

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00112202 | **0.7** | 0.34 | **0.56** | 0.3 | 100 | 54.4 |
| BR00112197binned | **0.73** | 0.41 | **0.56** | 0.3 | 98.9 | 58.9 |
| _Validation plates_ | | | | | | |
| BR00112197standard | **0.65** | 0.4 | **0.53** | 0.28 | 97.8 | 56.7 |
| BR00112198 | **0.63** | 0.35 | **0.54** | 0.3 | 100 | 56.7 |
| BR00112197repeat | **0.61** | 0.41 | **0.53** | 0.31 | 98.9 | 63.3 |
| BR00112204 | **0.62** | 0.35 | **0.53** | 0.29 | 98.9 | 58.9 |
| BR00112201 | **0.68** | 0.4 | **0.54** | 0.32 | 100 | 66.7 |

Stain3 table

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00115128 | **0.69** | 0.39 | **0.56** | 0.32 | 100 | 61.1 |
| BR00115125highexp | **0.68** | 0.37 | **0.41** | 0.31 | 98.9 | 55.6 |
| BR00115133highexp | **0.75** | 0.38 | **0.47** | 0.31 | 98.9 | 60 |
| BR00115131 | **0.68** | 0.38 | **0.54** | 0.29 | 100 | 58.9 |
| _Validation plates_ | | | | | | |
| BR00115128highexp | **0.64** | 0.4 | **0.59** | 0.33 | 98.9 | 58.9 |
| BR00115134 | **0.62** | 0.37 | **0.48** | 0.33 | 97.8 | 58.9 |
| BR00115125 | **0.61** | 0.36 | **0.46** | 0.29 | 100 | 54.4 |
| BR00115126 | **0.38** | 0.32 | **0.36** | 0.28 | 68.9 | 53.3 |
| BR00115133 | **0.65** | 0.38 | **0.42** | 0.3 | 97.8 | 62.2 |
| BR00115127 | **0.63** | 0.38 | **0.52** | 0.31 | 100 | 58.9 |
| BR00115129 | **0.59** | 0.38 | **0.55** | 0.32 | 100 | 52.2 |

Stain4 table

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00116631 | **0.65** | 0.3 | **0.32** | 0.28 | 96.7 | 57.8 |
| BR00116627 | **0.69** | 0.3 | **0.39** | 0.29 | 97.8 | 56.7 |
| BR00116630highexp | **0.69** | 0.29 | **0.4** | 0.3 | 96.7 | 58.9 |
| _Validation plates_ | | | | | | |
| BR00116631highexp | **0.6** | 0.28 | **0.3** | 0.3 | 95.6 | 53.3 |
| BR00116625highexp | **0.61** | 0.32 | **0.42** | 0.28 | 98.9 | 61.1 |
| BR00116628highexp | **0.64** | 0.32 | **0.37** | 0.31 | 97.8 | 57.8 |
| BR00116625 | **0.58** | 0.31 | **0.39** | 0.29 | 97.8 | 58.9 |
| BR00116629highexp | **0.64** | 0.29 | **0.41** | 0.29 | 97.8 | 52.2 |
| BR00116627highexp | **0.62** | 0.31 | **0.48** | 0.27 | 98.9 | 56.7 |
| BR00116629 | **0.62** | 0.3 | **0.37** | 0.29 | 97.8 | 52.2 |
| BR00116628 | **0.62** | 0.32 | **0.31** | 0.29 | 96.7 | 58.9 |

Outlier plates table

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:---|---:|---:|---:|---:|---:|---:|
| _Training plates_ | | | | | | |
| BR00116634bin1 | **0.59** | 0.24 | **0.31** | 0.18 | 96.7 | 53.3 |
| BR00113818 | **0.69** | 0.28 | **0.45** | 0.29 | 96.7 | 52.2 |
| BR00113820 | **0.67** | 0.3 | **0.45** | 0.3 | 96.7 | 55.6 |

EchteRobert commented 2 years ago

Aggregated profile UMAP analysis

Now that the model is getting consistent results on Stain2, Stain3, and Stain4, I want to do some qualitative analyses to investigate what the model is learning and what it outputs. First up is UMAPs of the model aggregated well profiles of the validation compounds for Stain2, Stain3, and Stain4.

Main takeaways

The model ignores batch effects for strong signal compounds and clusters them nicely. Mean aggregation also performs decent clustering for strong signal compounds while ignoring plate effects, however the clusters are much less separated than the model clusters.

UMAPs Stain2

_Mean aggregation_

![Screen Shot 2022-05-11 at 3 52 31 PM](https://user-images.githubusercontent.com/62173977/167934120-c90cbb4f-eda0-4692-b579-f1383c7dbf3a.png)
![Screen Shot 2022-05-11 at 3 52 43 PM](https://user-images.githubusercontent.com/62173977/167934151-ca65819a-939e-426b-b8c5-1494be2a4436.png)

_Model aggregation_

![Screen Shot 2022-05-11 at 3 52 59 PM](https://user-images.githubusercontent.com/62173977/167934192-f9e3f42e-78c4-4b99-b124-9b9d23065255.png)
![Screen Shot 2022-05-11 at 3 53 13 PM](https://user-images.githubusercontent.com/62173977/167934226-b5159a7b-6459-495d-b683-9788cc387f20.png)

UMAPs Stain3

_Mean aggregation_ ![Screen Shot 2022-05-11 at 3 50 39 PM](https://user-images.githubusercontent.com/62173977/167933818-1f3f467e-4aa4-41f7-8541-c14050a2cf1f.png) ![Screen Shot 2022-05-11 at 3 50 51 PM](https://user-images.githubusercontent.com/62173977/167933853-28f08017-86e8-4cff-9afd-b58e99e2377f.png) _Model aggregation_ ![Screen Shot 2022-05-11 at 3 51 06 PM](https://user-images.githubusercontent.com/62173977/167933886-fdb7afed-5e9b-4022-bd16-8809d2dc5c56.png) ![Screen Shot 2022-05-11 at 3 51 17 PM](https://user-images.githubusercontent.com/62173977/167933909-89624194-b9e9-493c-a54c-7ee2d1d32ab6.png)

UMAPs Stain4

_Mean aggregation_ ![Screen Shot 2022-05-11 at 3 48 25 PM](https://user-images.githubusercontent.com/62173977/167933473-0db1f10e-68ac-4eae-83e6-7afde91d42c8.png) ![Screen Shot 2022-05-11 at 3 48 48 PM](https://user-images.githubusercontent.com/62173977/167933522-85b23c26-2c14-445b-af25-5a7e6a84a895.png) _Model aggregation_ ![Screen Shot 2022-05-11 at 3 49 00 PM](https://user-images.githubusercontent.com/62173977/167933561-afc808b5-3a9c-445c-8e89-9bdb4c955a58.png) ![Screen Shot 2022-05-11 at 3 49 11 PM](https://user-images.githubusercontent.com/62173977/167933589-187d0097-9347-40c8-9eec-3de9423ec176.png)

EchteRobert commented 2 years ago

Cell saliency analysis

In continuation of the previous experiment, I visualized the saliency of cells (i.e. the summed gradients over all features with respect to the SupConLoss over all wells). With this visualization I attempt to show how the model selects certain cells over others. I visualize 3 compounds here that are poorly profiled by the mean (~0.3 mAP) but strongly profiled by the model (~0.9 mAP): sirolimus (red), skepinone-l (green), and purmorphamine (viridis). From each compound I take two wells to visualize.

Main takeaways

The saliency map of sirolimus shows how giving more weight to some cells over others can improve the profile signal of the perturbation. It shows how one half of the cells is more important for discerning it from other compounds than the other half, which is something the mean cannot capture. This is shown in the last plot, where I threshold the saliency so that only the most salient cells are visualized. In that plot, we can easily distinguish sirolimus from the other two compounds.
The other two compounds are much harder to discern with this visualization. Not only do most cells overlap, the areas where the cells have higher saliency values also mostly overlap. We can still see that there is again some division in two parts, which may indicate which cells are more important than the others and provide some insight into what the model uses to create these profiles.
Note that the model not only aggregates cells, but also features, which is left out in this analysis.

Next up

Perhaps tracing back these cells to the images will give us more insight into what the model is learning.

Cell saliency visualization

_Brighter colours indicate more saliency according to the model; darker colours mean less saliency._ ![Screen Shot 2022-05-11 at 4 02 28 PM](https://user-images.githubusercontent.com/62173977/167936407-c81a71a9-592c-4d9f-96d4-516395b906d6.png) _sirolimus_ ![Screen Shot 2022-05-11 at 4 23 56 PM](https://user-images.githubusercontent.com/62173977/167941331-478d0d79-bd7b-4db5-8838-423931505ad8.png) _skepinone-l_ ![Screen Shot 2022-05-11 at 4 24 07 PM](https://user-images.githubusercontent.com/62173977/167941357-11d3efba-acfc-468e-9065-aae48b19f377.png) _purmorphamine_ ![Screen Shot 2022-05-11 at 4 24 52 PM](https://user-images.githubusercontent.com/62173977/167941468-c9fb2221-d834-4c3c-9f17-d9f1642a7b8e.png)

Saliency threshold

_The saliency values are normalized per well and thresholded to be above 0.8_ ![Screen Shot 2022-05-11 at 4 33 42 PM](https://user-images.githubusercontent.com/62173977/167942790-37b17ec7-8e41-4f3b-9c5a-ad7a9e467525.png)
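The thresholding step can be sketched as follows. This is a minimal illustration, not the code used for the figure: the caption does not say how the per-well normalization is done, so min-max scaling to [0, 1] is an assumption here, and `salient_mask` and the toy well names are hypothetical.

```python
import numpy as np

def salient_mask(saliency, threshold=0.8):
    """Min-max normalize one well's saliency scores and flag cells above threshold.

    Assumption: 'normalized per well' means min-max scaling to [0, 1].
    """
    s = np.asarray(saliency, dtype=float)
    span = s.max() - s.min()
    if span == 0:
        # All cells have the same saliency, so none stands out.
        return np.zeros(s.shape, dtype=bool)
    normalized = (s - s.min()) / span
    return normalized > threshold

# Toy saliency values for two hypothetical wells with different scales.
wells = {"well_a": [0.1, 0.5, 2.0, 1.9], "well_b": [10.0, 30.0, 50.0]}
masks = {name: salient_mask(values) for name, values in wells.items()}
```

Normalizing within each well keeps wells with very different raw saliency scales comparable, so the same 0.8 cut-off selects the top of each well's own range.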

EchteRobert commented 2 years ago

Cell saliencies overlay over complete FOV

Here, I show the raw images of a purmorphamine well (M08) in plate BR00112197binned (Stain2). Stain2 only contains 4 images so the FOV is larger than for the other Stain datasets. I use green and red boxes to denote high (>0.8) and low (<0.2) saliency cells. Perhaps in the future I will find a better way of visualizing these cells as the overlay impedes the visual analysis of those cells. I am only showing one FOV here, but it's split into 4 sections for inspection purposes.

The following was outlined by Mehrtash:

To gain more insight into what the model is doing, it would be very useful to "color" several complete FOVs based on the saliency scores and visually inspect them (to begin with). In the least interesting scenario, I suspect that the model might have learned to become a really good QC filter + mean aggregation over the passing cells -- which is still quite interesting, remarkable, and explains why it generalizes to new compounds. Another possibility is that the model might have further learned to pick divergent morphologies (in relevant directions) from the given bunch, come up with a consensus over those, and output the consensus features.

Main takeaways

It seems like the model is mostly looking at cells that are clearly separated, while giving less attention to cells in very crowded spaces. This can be seen in all four sections of the FOV shown below. These images are taken from only one well and one compound though, so I will need to check other wells and plates to see if this trend persists.

Images here!

![image](https://user-images.githubusercontent.com/62173977/170127521-6819810b-f82e-4bb6-9eee-7f6108275294.png) ![image](https://user-images.githubusercontent.com/62173977/170127765-b12fd163-3086-429a-b499-ba2a8806ddf5.png) ![image](https://user-images.githubusercontent.com/62173977/170127785-4ebfc81c-4a6b-4bd1-bdca-cdeac54a509a.png) ![image](https://user-images.githubusercontent.com/62173977/170127804-d22460d3-f362-426e-8988-226c65af6ad9.png)

EchteRobert commented 2 years ago

Admixing Experiment

The following experiment was outlined by Mehrtash:

Here's a useful experiment to gain more insight about what the network is doing: take a large number of cells from the same compound (and across several plates) and classify them according to saliency score into two groups -- high: top 20% in saliency, and low: bottom 20% in saliency; throw the middle away. Now, make synthetic inputs to your network with different admixtures of high and low saliency cells, e.g. 0 high + 500 low, 1 high + 499 low, 2 high + 498 low, ..., 499 high + 1 low, 500 high + 0 low, in a deterministic way (e.g. add one high, remove one low, rinse and repeat). Take a PCA of the network output over these 500 inputs and plot the first few PCs vs. admixing fraction, with 0 meaning 0 high + 500 low, and 1 meaning 500 high + 0 low. If you see a "gating" behavior w.r.t. admixing fraction, i.e. the PCs jumping up sharply after a threshold of high saliency cells and quickly stabilizing, then the network has definitely learned to ignore low saliency cells. The noise of the output further sheds light on what the network is doing to the high saliency cells: if the network is simply averaging high saliency cells, you'd expect ~ 1/sqrt(N) noise in the network output, where N is the number of high saliency cells in the input. If the network is doing feature learning and gating, you'd see a faster scaling, e.g. 1/N or faster.

I performed this experiment for multiple saliency cut-offs (5, 10, 20, and 40%) and tried different numbers of cells for the admixtures. I eventually settled on using 1000 cells (instead of the 500 mentioned above). Using more cells simply increases the 'resolution' of the figures by creating more datapoints. Note that for this experiment I am using 4 wells from a single plate (instead of multiple). I calculate the X% most salient cells per well and then merge them in one big pool to sample from during the experiment.

I use three types of saliency: gradient, distance (in loss space), and hold one cell out based saliencies, named V1, V2, and V3 respectively. V1 is considered to be more noisy and this measure does not necessarily point to cells that are the most or least representative of a certain profile. I think it rather points to cells whose features are most influential on creating an aggregated profile that is best positioned in the loss space. The exact definition remains hard to interpret and explain. V2 provides a distance measure of how far each single cell in a set is from the aggregated profile (using all cells in a set). Cells further away are considered less salient and cells close by are considered more salient. V3 computes the profile for a well and iteratively leaves one cell out of the set, until you have N profiles for a given well with N cells. Then the supervised contrastive loss is calculated for each of these profiles with respect to the aggregated profiles of all other wells in the plate. This means it has 3 positive pairs and 380 negative pairs. The profiles for which the loss is higher are given a higher saliency and vice versa.
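To make the V2 and V3 definitions concrete, here is a rough sketch on toy data. It is not the code used in these experiments: `aggregate` is a plain mean standing in for the trained model, the cosine similarity used for V2 is an assumed choice of distance measure, and `toy_loss` is a hypothetical stand-in for the supervised contrastive loss against the other wells.

```python
import numpy as np

def aggregate(cells):
    # Hypothetical stand-in for the trained model: a plain mean over cells.
    return cells.mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def saliency_v2(cells):
    """V2-style score: similarity of each cell to the profile of the full set.

    Cells close to the aggregated profile get a high score.
    """
    profile = aggregate(cells)
    return np.array([cosine(cell, profile) for cell in cells])

def saliency_v3(cells, loss_fn):
    """V3-style score: loss of the profile computed with one cell left out.

    A higher loss for the held-out profile means a higher saliency.
    """
    scores = np.empty(len(cells))
    for i in range(len(cells)):
        held_out = np.delete(cells, i, axis=0)
        scores[i] = loss_fn(aggregate(held_out))
    return scores

# Toy well with three cells and two features.
cells = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
replicate_profile = np.array([1.0, 0.0])  # hypothetical replicate well

def toy_loss(profile):
    # Stand-in for SupConLoss: distance to the replicate profile.
    return 1.0 - cosine(profile, replicate_profile)

v2 = saliency_v2(cells)
v3 = saliency_v3(cells, toy_loss)
```

In this toy well the third cell points away from the other two, so it gets the lowest V2 score, and removing it barely hurts the profile, so it also gets the lowest V3 score.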

As a sanity check I also performed this experiment using a cut-off of 100%, i.e. just randomly selecting cells. This last experiment should show no changes as a function of the admixing fraction, because there should be little variance captured in the first few PCs (as all profiles should be more similar).
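The admixing procedure described above can be sketched roughly like this. It is an illustration under stated assumptions rather than the actual experiment code: the cell pools are random toy data, `mean_aggregate` is a hypothetical stand-in for the trained model, and the helper names are made up for this example.

```python
import numpy as np

def admixture_profiles(high, low, n_cells, aggregate):
    """Aggregate k high + (n_cells - k) low cells for k = 0..n_cells.

    Deterministic: each step adds the next high cell and drops the last low cell.
    """
    profiles = []
    for k in range(n_cells + 1):
        mix = np.vstack([high[:k], low[: n_cells - k]])
        profiles.append(aggregate(mix))
    return np.array(profiles)

def pca_scores(x, n_components=2):
    """Project the rows of x onto the leading principal components via SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
n_cells = 50
high = rng.normal(loc=1.0, size=(n_cells, 8))   # toy "high saliency" pool
low = rng.normal(loc=-1.0, size=(n_cells, 8))   # toy "low saliency" pool

def mean_aggregate(cells):
    # Hypothetical stand-in for the model; the real model would be called here.
    return cells.mean(axis=0)

profiles = admixture_profiles(high, low, n_cells, mean_aggregate)
fraction = np.arange(n_cells + 1) / n_cells      # admixing fraction, 0 to 1
pcs = pca_scores(profiles)                       # plot pcs[:, 0] against fraction
```

With plain mean aggregation the first PC moves roughly linearly with the admixing fraction. A gating model would instead show a sharp change around some fraction followed by a plateau, which is the behaviour the experiment looks for.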

Main takeaways

All three saliency methods show a similar pattern: the first PC decreases quickly at some admixing fraction value and then tapers off to a more stable value. This shows that the least salient cells and most salient cells (calculated with different methods) result in different profiles when aggregated by the model. Moreover, it shows how the model is indeed selecting the higher saliency cells after a certain threshold of admixture fraction until the point of saturation (at around 0.6 although this depends on multiple factors).
I think the saliency V2 gives the best signal/cleanest separation between cells that influence the profile. This is because the PCs are less noisy than the PCs shown for saliency method V1 and V3, where V3 is the worst.
As the cut-off percentage increases (from 0.05 to 1.0), we see that the PCs start flattening out more towards zero. This is expected behavior: the two groups (most salient and least salient cells) become more and more similar as this fraction increases and thus the aggregated profiles have lower variance. This is confirmed by looking at the 1.0 saliency cut-off plots, which show no pattern.
The first PC of V1 and V3 saliencies immediately starts decreasing as the admixture fraction increases, while that of V2 is flat at first. I currently think that this is because the lowest saliency cells in V1 and V3 are more similar to the highest saliency cells than is the case for V2. This would again indicate that the V2 saliency gives a cleaner separation between the lowest and highest saliency cells with respect to their influence on the aggregated profile. However, I am not certain this is right just yet.

Experiment results here (activation layer L1 norm - V0)!

![SaliencyV0_thres05](https://user-images.githubusercontent.com/62173977/172893820-d3b6448c-c750-48c8-aedb-c3c1f306a95e.png) ![SaliencyV0_thres10](https://user-images.githubusercontent.com/62173977/172893824-69336fff-a8bb-4be2-ac56-eba91969ed9e.png) ![SaliencyV0_thres20](https://user-images.githubusercontent.com/62173977/172893825-8b828d91-c16e-44aa-b873-76e228f19d20.png) ![SaliencyV0_thres40](https://user-images.githubusercontent.com/62173977/172893826-7e8efb5a-fcc6-4143-a510-f3a0f4455e9c.png)

Experiment results here (gradient saliency - V1)!

![image](https://user-images.githubusercontent.com/62173977/170350552-dd73def1-5f13-4de4-89aa-7ab2b85ea590.png) ![image](https://user-images.githubusercontent.com/62173977/170350569-1cce4360-1298-4e06-916a-5e46afca1740.png) ![image](https://user-images.githubusercontent.com/62173977/170350577-3f335ec4-4764-4636-ba3a-b1911bb681a9.png) ![image](https://user-images.githubusercontent.com/62173977/170350589-e2919326-0726-4074-9a0e-97e37d91c969.png) ![image](https://user-images.githubusercontent.com/62173977/170350619-8d1d8394-9306-4747-ac72-69b0f43e8fd2.png)

Experiment results here (V0 + V1)!

![SaliencyV0_V1_thres05](https://user-images.githubusercontent.com/62173977/172914064-1bf3d456-4146-4d2a-a300-de125bf8c757.png) ![SaliencyV0_V1_thres10](https://user-images.githubusercontent.com/62173977/172914066-a9fdc143-3e62-4892-a898-eee09d106e57.png) ![SaliencyV0_V1_thres20](https://user-images.githubusercontent.com/62173977/172914069-3ba3cabc-a856-4b64-a227-1312278b22ed.png) ![SaliencyV0_V1_thres40](https://user-images.githubusercontent.com/62173977/172914070-f3cb149b-c537-4789-99e2-cd2b90f8ca92.png)

Experiment results here (distance saliency - V2)!

![image](https://user-images.githubusercontent.com/62173977/170350638-9b8451f6-3c7b-4f32-bd23-7a2adc999489.png) ![image](https://user-images.githubusercontent.com/62173977/170350657-71712c2d-5541-43ab-b338-d14fc452670f.png) ![image](https://user-images.githubusercontent.com/62173977/170350677-c3ab4786-c930-4ef5-bec8-de0c6c5b87d5.png) ![image](https://user-images.githubusercontent.com/62173977/170350702-28c507f0-64bd-4393-ae72-f7cb31a5f41d.png) ![image](https://user-images.githubusercontent.com/62173977/170350721-3eadaaa8-6a25-4248-8309-75152fee90fe.png)

Experiment results here (leave one out saliency - V3)!

![image](https://user-images.githubusercontent.com/62173977/170350748-eea18ed1-ca42-4143-931b-982beab78739.png) ![image](https://user-images.githubusercontent.com/62173977/170350768-d1bb787b-7a97-46f7-8b52-2c52883880ea.png) ![image](https://user-images.githubusercontent.com/62173977/170350787-6bc07b11-dbc9-4901-bc7a-4af8b3bc9551.png) ![image](https://user-images.githubusercontent.com/62173977/170350802-a5025d90-8d78-4d71-822f-d2ce2b327485.png) ![image](https://user-images.githubusercontent.com/62173977/170350821-7474e6e3-e921-4722-8f0e-3a32c60286c2.png)

Updated figure (no random sampling for each fraction) Saliency V0 + V1

![Screen Shot 2022-06-09 at 2 43 06 PM](https://user-images.githubusercontent.com/62173977/172921121-231f2eef-da75-449b-8ed7-eca8557c011d.png)

EchteRobert commented 2 years ago

Inspecting correlation between saliency and CellProfiler features

_All of the results below are calculated with 'run-20220505221947-1m1zas58' aka the 'Stain234 12 plates outliers' model._

I have updated the saliency-based cell image outlines; they now use square boxes instead of coloring the entire cell. I use either V0 (L1 norm of first activation layer) or V1 (L1 norm of the back-propagated gradient by SupConLoss) saliency for the image boxes. I calculated the Pearson correlation between the various saliencies and the CellProfiler features of the input cells. The main idea is to figure out what the saliencies indicate. From visual inspection of the full FOVs with V1 saliency overlay, we can see that higher saliency cells tend to be isolated while lower saliency cells tend to lie on top of each other or are in a more crowded space. If this is what the model is generally doing, the features corresponding to isolation should be highly correlated with the V1 saliency.
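The correlation analysis can be sketched as below. This is a minimal example on toy data, assuming a cells-by-features matrix and one saliency value per cell; the function names and the three toy feature names are hypothetical, not taken from the actual pipeline.

```python
import numpy as np

def feature_saliency_correlations(features, saliency):
    """Pearson r between a per-cell saliency vector and every feature column."""
    x = np.asarray(features, dtype=float)
    s = np.asarray(saliency, dtype=float)
    xc = x - x.mean(axis=0)
    sc = s - s.mean()
    denom = np.sqrt((xc ** 2).sum(axis=0) * (sc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        # Constant feature columns have zero variance and come out as NaN.
        return (xc * sc[:, None]).sum(axis=0) / denom

def top_k(r, names, k=20, positive=True):
    """Return (index, name, r) for the k strongest positive or negative correlations."""
    valid = np.flatnonzero(np.isfinite(r))
    order = valid[np.argsort(r[valid])]
    if positive:
        order = order[::-1]
    return [(int(i), names[i], float(r[i])) for i in order[:k]]

# Toy data: 4 cells, 3 features (names are illustrative only).
names = ["area", "dna_intensity", "constant"]
features = np.array([
    [1.0, 4.0, 5.0],
    [2.0, 3.0, 5.0],
    [3.0, 2.0, 5.0],
    [4.0, 1.0, 5.0],
])
saliency = np.array([0.1, 0.2, 0.3, 0.4])

r = feature_saliency_correlations(features, saliency)
top_positive = top_k(r, names, k=1, positive=True)
top_negative = top_k(r, names, k=1, positive=False)
```

Summing the correlation vectors of two saliency versions (for example V0 and V1) before calling `top_k` would give a combined ranking of the kind shown in the summed table further down.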

Main takeaways

V0 and V1 saliency have the highest correlations with CellProfiler features in general.
The top 20 features of saliency V0 are similar to those of saliency V1. The cosine similarity distance based saliency, V2 (where I calculate the distance from each projected cell to the aggregated profile), shows mostly similar negatively correlated features in the top 20 to V0 and V1, but not so much in positively correlated features. The 'iteratively leaving one cell out and calculating the differences in aggregated profiles' based saliency (V3) is dissimilar to all others. I think that this last result makes sense as a single cell should not be able to influence the profile too much.
For V0 and V1 positively correlated features are generally ‘AreaShape’ features. For V0, V1, and V2 negatively correlated features are generally ‘Intensity’ based features.
The top 20 features of V2 based saliency contain 'Intensity' based features as well as 'RadialDistribution' features.
Summing the correlations of V0 and V1 shows that highly positively correlated features are related to cell size and distance to nearby cells. Highly negatively correlated features are related to DNA and RNA intensity.

Conclusion

The model likely gives higher weight to cells which are more isolated, defined by AreaShape, IntegratedIntensity (sum over intensity pixels), and nearest neighbor distances. It also gives more weight to cells with low DNA, RNA, and Mito intensities. In general, these correlations indicate a quality control filter. Isolated cells give better resolution of the cells, while high DNA, RNA and Mito intensities indicate cells that are in the process of cell division.

Full FOVs with Saliency V0 overlay

![BR00112197binned_M08_f1c30sV0](https://user-images.githubusercontent.com/62173977/172668013-fea4d178-8f20-47fd-9b9f-716a84f84089.png) ![BR00112197binned_M08_f2c30sV0](https://user-images.githubusercontent.com/62173977/172668015-e3082a09-e324-4e2d-a40c-61a3902dfcde.png) ![BR00112197binned_M08_f3c30sV0](https://user-images.githubusercontent.com/62173977/172668017-a9df9e5e-8756-4ac4-87f1-4e8a5c3218d2.png) ![BR00112197binned_M08_f4c30sV0](https://user-images.githubusercontent.com/62173977/172668018-cf062ba2-36c1-4b0e-a99c-fa834ee31d60.png)

Full FOVs with Saliency V1 overlay

![BR00112197binned_M08_f1c30sV1](https://user-images.githubusercontent.com/62173977/172668121-b0b1bc9a-0e65-4fc1-925c-a77bd19dd6e3.png) ![BR00112197binned_M08_f2c30sV1](https://user-images.githubusercontent.com/62173977/172668123-1156e48b-3978-47bf-8e7f-91e620e8ef69.png) ![BR00112197binned_M08_f3c30sV1](https://user-images.githubusercontent.com/62173977/172668124-31318e4a-5b30-49e8-8c9c-7be4c1e83cdc.png) ![BR00112197binned_M08_f4c30sV1](https://user-images.githubusercontent.com/62173977/172668126-55bac078-398d-42f3-9817-5df4f0f627ef.png)

Top20 positive Pearson correlations

| Saliency V0 | | |
|------------|---------------------------------------------------|-------|
| | Features | Correlation |
| 535 | Cytoplasm.Cytoplasm_AreaShape_Area | 0.646 |
| 874 | Cells.Cells_AreaShape_Area | 0.640 |
| 515 | Cells.Cells_AreaShape_MeanRadius | 0.634 |
| 484 | Cells.Cells_AreaShape_MedianRadius | 0.632 |
| 651 | Cytoplasm.Cytoplasm_Intensity_IntegratedIntens... | 0.628 |
| 295 | Cells.Cells_Intensity_IntegratedIntensity_Brig... | 0.621 |
| 187 | Cells.Cells_AreaShape_MaximumRadius | 0.619 |
| 140 | Cytoplasm.Cytoplasm_Intensity_IntegratedIntens... | 0.603 |
| 898 | Cytoplasm.Cytoplasm_Intensity_IntegratedIntens... | 0.592 |
| 459 | Cells.Cells_AreaShape_MinorAxisLength | 0.591 |
| 311 | Cytoplasm.Cytoplasm_AreaShape_MedianRadius | 0.589 |
| 1061 | Cells.Cells_Intensity_IntegratedIntensity_Mito | 0.588 |
| 130 | Cytoplasm.Cytoplasm_AreaShape_MinFeretDiameter | 0.587 |
| 1048 | Cells.Cells_AreaShape_MinFeretDiameter | 0.587 |
| 1094 | Cytoplasm.Cytoplasm_AreaShape_MinorAxisLength | 0.579 |
| 1306 | Cells.Cells_AreaShape_Perimeter | 0.571 |
| 633 | Cells.Cells_Intensity_IntegratedIntensity_AGP | 0.566 |
| 565 | Cytoplasm.Cytoplasm_AreaShape_Perimeter | 0.562 |
| 41 | Cytoplasm.Cytoplasm_AreaShape_MeanRadius | 0.561 |
| 752 | Cytoplasm.Cytoplasm_Intensity_IntegratedIntens... | 0.558 |

| Saliency V1 | | |
|------------|---------------------------------------------------|-------|
| | Features | Correlation |
| 338 | Cytoplasm.Cytoplasm_Correlation_K_DNA_Brightfield | 0.572 |
| 770 | Nuclei.Nuclei_AreaShape_MeanRadius | 0.556 |
| 515 | Cells.Cells_AreaShape_MeanRadius | 0.555 |
| 187 | Cells.Cells_AreaShape_MaximumRadius | 0.553 |
| 851 | Nuclei.Nuclei_AreaShape_MedianRadius | 0.542 |
| 484 | Cells.Cells_AreaShape_MedianRadius | 0.541 |
| 1227 | Nuclei.Nuclei_Correlation_Overlap_DNA_RNA | 0.532 |
| 849 | Nuclei.Nuclei_AreaShape_MaximumRadius | 0.528 |
| 459 | Cells.Cells_AreaShape_MinorAxisLength | 0.525 |
| 1094 | Cytoplasm.Cytoplasm_AreaShape_MinorAxisLength | 0.521 |
| 130 | Cytoplasm.Cytoplasm_AreaShape_MinFeretDiameter | 0.510 |
| 1048 | Cells.Cells_AreaShape_MinFeretDiameter | 0.510 |
| 208 | Cells.Cells_Neighbors_FirstClosestDistance_Adj... | 0.498 |
| 874 | Cells.Cells_AreaShape_Area | 0.490 |
| 985 | Cells.Cells_Neighbors_SecondClosestDistance_Ad... | 0.488 |
| 1134 | Cells.Cells_Correlation_RWC_Brightfield_RNA | 0.488 |
| 999 | Cells.Cells_Correlation_RWC_RNA_Brightfield | 0.477 |
| 1117 | Nuclei.Nuclei_Correlation_K_ER_Brightfield | 0.470 |
| 535 | Cytoplasm.Cytoplasm_AreaShape_Area | 0.468 |
| 145 | Nuclei.Nuclei_AreaShape_MinorAxisLength | 0.463 |

| Saliency V2 | | |
|------------|---------------------------------------------------|-------|
| | Features | Correlation |
| 1117 | Nuclei.Nuclei_Correlation_K_ER_Brightfield | 0.458 |
| 652 | Nuclei.Nuclei_Correlation_K_RNA_Brightfield | 0.433 |
| 1227 | Nuclei.Nuclei_Correlation_Overlap_DNA_RNA | 0.411 |
| 432 | Cells.Cells_Correlation_K_ER_Brightfield | 0.406 |
| 485 | Cells.Cells_Correlation_K_RNA_Brightfield | 0.396 |
| 720 | Nuclei.Nuclei_Correlation_Overlap_DNA_ER | 0.380 |
| 1206 | Cytoplasm.Cytoplasm_RadialDistribution_FracAtD... | 0.375 |
| 1198 | Cytoplasm.Cytoplasm_RadialDistribution_FracAtD... | 0.374 |
| 76 | Nuclei.Nuclei_Correlation_K_AGP_Brightfield | 0.371 |
| 1233 | Nuclei.Nuclei_Correlation_K_ER_AGP | 0.369 |
| 1270 | Cytoplasm.Cytoplasm_Correlation_K_ER_Brightfield | 0.366 |
| 408 | Cytoplasm.Cytoplasm_RadialDistribution_FracAtD... | 0.366 |
| 913 | Cytoplasm.Cytoplasm_RadialDistribution_FracAtD... | 0.365 |
| 949 | Cytoplasm.Cytoplasm_Correlation_K_RNA_Brightfield | 0.364 |
| 726 | Nuclei.Nuclei_Correlation_K_Mito_Brightfield | 0.361 |
| 384 | Cytoplasm.Cytoplasm_RadialDistribution_FracAtD... | 0.341 |
| 255 | Nuclei.Nuclei_Correlation_K_RNA_DNA | 0.339 |
| 964 | Nuclei.Nuclei_Granularity_1_Mito | 0.338 |
| 1141 | Cytoplasm.Cytoplasm_RadialDistribution_FracAtD... | 0.335 |
| 134 | Cells.Cells_Granularity_1_ER | 0.325 |

| Saliency V3 | | |
|------------|---------------------------------------------------|-------|
| | Features | Correlation |
| 356 | Cytoplasm.Cytoplasm_Intensity_StdIntensity_Bri... | 0.552 |
| 704 | Cells.Cells_Intensity_StdIntensity_Brightfield | 0.485 |
| 662 | Cells.Cells_Intensity_StdIntensityEdge_Brightf... | 0.448 |
| 30 | Cytoplasm.Cytoplasm_Intensity_StdIntensityEdge... | 0.447 |
| 1004 | Cytoplasm.Cytoplasm_RadialDistribution_RadialC... | 0.325 |
| 1079 | Cytoplasm.Cytoplasm_Intensity_MaxIntensity_Bri... | 0.304 |
| 68 | Cytoplasm.Cytoplasm_RadialDistribution_RadialC... | 0.299 |
| 74 | Cytoplasm.Cytoplasm_Intensity_MADIntensity_Bri... | 0.279 |
| 1169 | Cells.Cells_Intensity_MaxIntensity_Brightfield | 0.277 |
| 982 | Cells.Cells_Correlation_Correlation_AGP_Bright... | 0.272 |
| 617 | Nuclei.Nuclei_Intensity_StdIntensityEdge_Brigh... | 0.266 |
| 111 | Cytoplasm.Cytoplasm_Correlation_Correlation_AG... | 0.263 |
| 294 | Cells.Cells_Granularity_14_Brightfield | 0.263 |
| 817 | Cytoplasm.Cytoplasm_Granularity_14_Brightfield | 0.262 |
| 1093 | Nuclei.Nuclei_Granularity_14_Brightfield | 0.258 |
| 842 | Cytoplasm.Cytoplasm_Granularity_15_Brightfield | 0.252 |
| 540 | Cells.Cells_Granularity_15_Brightfield | 0.252 |
| 639 | Nuclei.Nuclei_Granularity_15_Brightfield | 0.246 |
| 415 | Nuclei.Nuclei_Correlation_K_Mito_ER | 0.241 |
| 892 | Cytoplasm.Cytoplasm_Intensity_MaxIntensityEdge... | 0.233 |

| Saliency V4 | | |
|------------|---------------------------------------------------|-------|
| | Features | Correlation |
| 1227 | Nuclei.Nuclei_Correlation_Overlap_DNA_RNA | 0.545 |
| 1117 | Nuclei.Nuclei_Correlation_K_ER_Brightfield | 0.525 |
| 726 | Nuclei.Nuclei_Correlation_K_Mito_Brightfield | 0.511 |
| 652 | Nuclei.Nuclei_Correlation_K_RNA_Brightfield | 0.498 |
| 338 | Cytoplasm.Cytoplasm_Correlation_K_DNA_Brightfield | 0.489 |
| 720 | Nuclei.Nuclei_Correlation_Overlap_DNA_ER | 0.483 |
| 485 | Cells.Cells_Correlation_K_RNA_Brightfield | 0.471 |
| 1149 | Nuclei.Nuclei_Correlation_Overlap_DNA_Mito | 0.471 |
| 964 | Nuclei.Nuclei_Granularity_1_Mito | 0.442 |
| 1233 | Nuclei.Nuclei_Correlation_K_ER_AGP | 0.440 |
| 999 | Cells.Cells_Correlation_RWC_RNA_Brightfield | 0.437 |
| 80 | Cytoplasm.Cytoplasm_Correlation_RWC_Brightfiel... | 0.437 |
| 432 | Cells.Cells_Correlation_K_ER_Brightfield | 0.435 |
| 949 | Cytoplasm.Cytoplasm_Correlation_K_RNA_Brightfield | 0.433 |
| 1134 | Cells.Cells_Correlation_RWC_Brightfield_RNA | 0.432 |
| 76 | Nuclei.Nuclei_Correlation_K_AGP_Brightfield | 0.422 |
| 616 | Nuclei.Nuclei_AreaShape_Solidity | 0.415 |
| 770 | Nuclei.Nuclei_AreaShape_MeanRadius | 0.411 |
| 515 | Cells.Cells_AreaShape_MeanRadius | 0.406 |
| 991 | Cells.Cells_Correlation_K_AGP_Brightfield | 0.405 |

Top20 negative Pearson correlations

| Saliency V0 | | |
|-------------|---------------------------------------------------|--------|
| | Features | Correlation |
| 587 | Cytoplasm.Cytoplasm_Correlation_K_Mito_RNA | -0.620 |
| 1040 | Cells.Cells_Correlation_K_Mito_RNA | -0.606 |
| 930 | Cytoplasm.Cytoplasm_Correlation_K_Mito_DNA | -0.593 |
| 828 | Cytoplasm.Cytoplasm_Intensity_MeanIntensity_DNA | -0.586 |
| 72 | Cytoplasm.Cytoplasm_Intensity_UpperQuartileInt... | -0.562 |
| 1309 | Cytoplasm.Cytoplasm_Intensity_MeanIntensityEdg... | -0.552 |
| 439 | Cells.Cells_Intensity_MeanIntensityEdge_DNA | -0.539 |
| 231 | Cells.Cells_Intensity_MinIntensity_DNA | -0.539 |
| 918 | Cytoplasm.Cytoplasm_Intensity_MinIntensity_DNA | -0.539 |
| 451 | Cytoplasm.Cytoplasm_Intensity_MinIntensityEdge... | -0.534 |
| 499 | Cells.Cells_Intensity_MinIntensityEdge_DNA | -0.534 |
| 186 | Cells.Cells_Correlation_K_Mito_AGP | -0.531 |
| 813 | Cytoplasm.Cytoplasm_Correlation_K_Mito_AGP | -0.524 |
| 1243 | Cells.Cells_Intensity_MeanIntensityEdge_RNA | -0.506 |
| 438 | Cells.Cells_Intensity_MedianIntensity_DNA | -0.499 |
| 52 | Cytoplasm.Cytoplasm_Intensity_LowerQuartileInt... | -0.495 |
| 733 | Cytoplasm.Cytoplasm_Intensity_StdIntensity_DNA | -0.493 |
| 725 | Cytoplasm.Cytoplasm_Intensity_MedianIntensity_DNA | -0.489 |
| 829 | Cells.Cells_Correlation_Overlap_Mito_RNA | -0.488 |
| 716 | Cells.Cells_Intensity_MinIntensityEdge_RNA | -0.486 |

| Saliency V1 | | |
|-------------|---------------------------------------------------|--------|
| | Features | Correlation |
| 439 | Cells.Cells_Intensity_MeanIntensityEdge_DNA | -0.671 |
| 389 | Cells.Cells_Intensity_StdIntensityEdge_DNA | -0.653 |
| 1165 | Nuclei.Nuclei_RadialDistribution_RadialCV_DNA_... | -0.648 |
| 1309 | Cytoplasm.Cytoplasm_Intensity_MeanIntensityEdg... | -0.636 |
| 733 | Cytoplasm.Cytoplasm_Intensity_StdIntensity_DNA | -0.635 |
| 81 | Cells.Cells_Intensity_MaxIntensityEdge_RNA | -0.635 |
| 289 | Cytoplasm.Cytoplasm_Correlation_K_Brightfield_DNA | -0.629 |
| 986 | Cytoplasm.Cytoplasm_Intensity_MeanIntensityEdg... | -0.619 |
| 663 | Cells.Cells_Intensity_MaxIntensityEdge_DNA | -0.617 |
| 0 | Cells.Cells_Intensity_StdIntensityEdge_RNA | -0.610 |
| 87 | Cytoplasm.Cytoplasm_Intensity_MaxIntensityEdge... | -0.608 |
| 1070 | Nuclei.Nuclei_Intensity_MaxIntensityEdge_RNA | -0.608 |
| 343 | Cytoplasm.Cytoplasm_Intensity_MaxIntensity_RNA | -0.606 |
| 449 | Nuclei.Nuclei_Intensity_StdIntensityEdge_DNA | -0.606 |
| 1243 | Cells.Cells_Intensity_MeanIntensityEdge_RNA | -0.603 |
| 631 | Nuclei.Nuclei_Intensity_MaxIntensityEdge_DNA | -0.596 |
| 72 | Cytoplasm.Cytoplasm_Intensity_UpperQuartileInt... | -0.594 |
| 828 | Cytoplasm.Cytoplasm_Intensity_MeanIntensity_DNA | -0.588 |
| 86 | Nuclei.Nuclei_Intensity_MeanIntensityEdge_RNA | -0.587 |
| 103 | Cytoplasm.Cytoplasm_Intensity_MaxIntensityEdge... | -0.583 |

| Saliency V2 | | |
|-------------|---------------------------------------------------|--------|
| | Features | Correlation |
| 343 | Cytoplasm.Cytoplasm_Intensity_MaxIntensity_RNA | -0.460 |
| 1070 | Nuclei.Nuclei_Intensity_MaxIntensityEdge_RNA | -0.458 |
| 87 | Cytoplasm.Cytoplasm_Intensity_MaxIntensityEdge... | -0.458 |
| 809 | Cytoplasm.Cytoplasm_Intensity_StdIntensity_RNA | -0.455 |
| 463 | Nuclei.Nuclei_Intensity_StdIntensityEdge_RNA | -0.453 |
| 1009 | Nuclei.Nuclei_RadialDistribution_RadialCV_ER_4of4 | -0.450 |
| 811 | Nuclei.Nuclei_RadialDistribution_RadialCV_RNA_... | -0.446 |
| 402 | Nuclei.Nuclei_Intensity_MaxIntensityEdge_ER | -0.442 |
| 1135 | Cytoplasm.Cytoplasm_Intensity_MaxIntensityEdge_ER | -0.441 |
| 625 | Nuclei.Nuclei_Intensity_StdIntensityEdge_ER | -0.438 |
| 151 | Nuclei.Nuclei_RadialDistribution_RadialCV_ER_3of4 | -0.435 |
| 257 | Nuclei.Nuclei_Intensity_MaxIntensity_ER | -0.435 |
| 86 | Nuclei.Nuclei_Intensity_MeanIntensityEdge_RNA | -0.430 |
| 172 | Cytoplasm.Cytoplasm_Intensity_StdIntensityEdge... | -0.429 |
| 1283 | Nuclei.Nuclei_Intensity_StdIntensity_ER | -0.424 |
| 17 | Cytoplasm.Cytoplasm_Intensity_MADIntensity_RNA | -0.421 |
| 191 | Nuclei.Nuclei_Intensity_UpperQuartileIntensity... | -0.420 |
| 567 | Nuclei.Nuclei_Intensity_MeanIntensity_RNA | -0.418 |
| 1121 | Cells.Cells_Intensity_StdIntensity_ER | -0.417 |
| 803 | Cytoplasm.Cytoplasm_Intensity_StdIntensity_ER | -0.417 |

| Saliency V3 | | |
|-------------|---------------------------------------------------|--------|
| | Features | Correlation |
| 526 | Cytoplasm.Cytoplasm_Intensity_MinIntensity_Bri... | -0.493 |
| 1126 | Cells.Cells_Intensity_MinIntensity_Brightfield | -0.464 |
| 1296 | Cytoplasm.Cytoplasm_Granularity_1_Brightfield | -0.459 |
| 1140 | Cells.Cells_Granularity_1_Brightfield | -0.458 |
| 1002 | Nuclei.Nuclei_Granularity_1_Brightfield | -0.434 |
| 148 | Cytoplasm.Cytoplasm_Intensity_MinIntensityEdge... | -0.387 |
| 412 | Cells.Cells_Intensity_MinIntensityEdge_Brightf... | -0.377 |
| 913 | Cytoplasm.Cytoplasm_RadialDistribution_FracAtD... | -0.194 |
| 1198 | Cytoplasm.Cytoplasm_RadialDistribution_FracAtD... | -0.192 |
| 1219 | Cytoplasm.Cytoplasm_RadialDistribution_RadialC... | -0.181 |
| 1226 | Cytoplasm.Cytoplasm_RadialDistribution_MeanFra... | -0.176 |
| 1206 | Cytoplasm.Cytoplasm_RadialDistribution_FracAtD... | -0.175 |
| 950 | Cytoplasm.Cytoplasm_RadialDistribution_MeanFra... | -0.169 |
| 1071 | Nuclei.Nuclei_Correlation_K_ER_Mito | -0.168 |
| 348 | Nuclei.Nuclei_Intensity_MassDisplacement_Mito | -0.163 |
| 277 | Nuclei.Nuclei_RadialDistribution_RadialCV_Mito... | -0.162 |
| 384 | Cytoplasm.Cytoplasm_RadialDistribution_FracAtD... | -0.160 |
| 545 | Cytoplasm.Cytoplasm_Intensity_MassDisplacement... | -0.158 |
| 44 | Cytoplasm.Cytoplasm_Correlation_K_RNA_AGP | -0.154 |
| 1175 | Nuclei.Nuclei_Intensity_MinIntensityEdge_Brigh... | -0.152 |

| Saliency V4 | | |
|-------------|---------------------------------------------------|--------|
| | Features | Correlation |
| 343 | Cytoplasm.Cytoplasm_Intensity_MaxIntensity_RNA | -0.614 |
| 1070 | Nuclei.Nuclei_Intensity_MaxIntensityEdge_RNA | -0.612 |
| 87 | Cytoplasm.Cytoplasm_Intensity_MaxIntensityEdge... | -0.611 |
| 463 | Nuclei.Nuclei_Intensity_StdIntensityEdge_RNA | -0.607 |
| 811 | Nuclei.Nuclei_RadialDistribution_RadialCV_RNA_... | -0.583 |
| 81 | Cells.Cells_Intensity_MaxIntensityEdge_RNA | -0.579 |
| 809 | Cytoplasm.Cytoplasm_Intensity_StdIntensity_RNA | -0.575 |
| 1165 | Nuclei.Nuclei_RadialDistribution_RadialCV_DNA_... | -0.572 |
| 0 | Cells.Cells_Intensity_StdIntensityEdge_RNA | -0.571 |
| 389 | Cells.Cells_Intensity_StdIntensityEdge_DNA | -0.560 |
| 172 | Cytoplasm.Cytoplasm_Intensity_StdIntensityEdge... | -0.557 |
| 692 | Cells.Cells_Intensity_MaxIntensity_RNA | -0.552 |
| 663 | Cells.Cells_Intensity_MaxIntensityEdge_DNA | -0.552 |
| 793 | Nuclei.Nuclei_Intensity_MaxIntensity_RNA | -0.549 |
| 86 | Nuclei.Nuclei_Intensity_MeanIntensityEdge_RNA | -0.549 |
| 1135 | Cytoplasm.Cytoplasm_Intensity_MaxIntensityEdge_ER | -0.547 |
| 191 | Nuclei.Nuclei_Intensity_UpperQuartileIntensity... | -0.547 |
| 1283 | Nuclei.Nuclei_Intensity_StdIntensity_ER | -0.546 |
| 469 | Nuclei.Nuclei_Intensity_StdIntensity_RNA | -0.537 |
| 151 | Nuclei.Nuclei_RadialDistribution_RadialCV_ER_3of4 | -0.536 |

Summing the correlations of V0 and V1

| Saliency V0 + Saliency V1 | | |
|---------------------------|-------------------------------------------------------------------|-------------|
| | Features | Corr. sum |
| 515 | Cells.Cells_AreaShape_MeanRadius | 1.189 |
| 484 | Cells.Cells_AreaShape_MedianRadius | 1.173 |
| 187 | Cells.Cells_AreaShape_MaximumRadius | 1.172 |
| 874 | Cells.Cells_AreaShape_Area | 1.130 |
| 459 | Cells.Cells_AreaShape_MinorAxisLength | 1.116 |
| 535 | Cytoplasm.Cytoplasm_AreaShape_Area | 1.114 |
| 1094 | Cytoplasm.Cytoplasm_AreaShape_MinorAxisLength | 1.100 |
| 130 | Cytoplasm.Cytoplasm_AreaShape_MinFeretDiameter | 1.097 |
| 1048 | Cells.Cells_AreaShape_MinFeretDiameter | 1.097 |
| 295 | Cells.Cells_Intensity_IntegratedIntensity_Brightfield | 1.082 |
| 338 | Cytoplasm.Cytoplasm_Correlation_K_DNA_Brightfield | 1.072 |
| 651 | Cytoplasm.Cytoplasm_Intensity_IntegratedIntensity_Brightfield | 1.070 |
| 770 | Nuclei.Nuclei_AreaShape_MeanRadius | 1.047 |
| 985 | Cells.Cells_Neighbors_SecondClosestDistance_Adjacent | 1.042 |
| 208 | Cells.Cells_Neighbors_FirstClosestDistance_Adjacent | 1.038 |
| 851 | Nuclei.Nuclei_AreaShape_MedianRadius | 1.028 |
| 565 | Cytoplasm.Cytoplasm_AreaShape_Perimeter | 1.018 |
| 1306 | Cells.Cells_AreaShape_Perimeter | 1.017 |
| 311 | Cytoplasm.Cytoplasm_AreaShape_MedianRadius | 0.994 |
| 825 | Cytoplasm.Cytoplasm_Intensity_IntegratedIntensityEdge_Brightfield | 0.984 |

| Saliency V0 + Saliency V1 | | |
|---------------------------|----------------------------------------------------------|-------------|
| | Features | Corr. sum |
| 439 | Cells.Cells_Intensity_MeanIntensityEdge_DNA | -1.210 |
| 1309 | Cytoplasm.Cytoplasm_Intensity_MeanIntensityEdge_DNA | -1.188 |
| 828 | Cytoplasm.Cytoplasm_Intensity_MeanIntensity_DNA | -1.174 |
| 72 | Cytoplasm.Cytoplasm_Intensity_UpperQuartileIntensity_DNA | -1.156 |
| 733 | Cytoplasm.Cytoplasm_Intensity_StdIntensity_DNA | -1.128 |
| 1243 | Cells.Cells_Intensity_MeanIntensityEdge_RNA | -1.109 |
| 986 | Cytoplasm.Cytoplasm_Intensity_MeanIntensityEdge_RNA | -1.076 |
| 389 | Cells.Cells_Intensity_StdIntensityEdge_DNA | -1.064 |
| 930 | Cytoplasm.Cytoplasm_Correlation_K_Mito_DNA | -1.060 |
| 303 | Nuclei.Nuclei_Intensity_MeanIntensityEdge_DNA | -1.015 |
| 438 | Cells.Cells_Intensity_MedianIntensity_DNA | -1.006 |
| 289 | Cytoplasm.Cytoplasm_Correlation_K_Brightfield_DNA | -0.999 |
| 725 | Cytoplasm.Cytoplasm_Intensity_MedianIntensity_DNA | -0.998 |
| 663 | Cells.Cells_Intensity_MaxIntensityEdge_DNA | -0.993 |
| 81 | Cells.Cells_Intensity_MaxIntensityEdge_RNA | -0.992 |
| 327 | Cytoplasm.Cytoplasm_Intensity_MADIntensity_DNA | -0.970 |
| 158 | Cytoplasm.Cytoplasm_Intensity_MeanIntensity_RNA | -0.965 |
| 84 | Cells.Cells_Intensity_MeanIntensityEdge_AGP | -0.965 |
| 1298 | Cells.Cells_Intensity_MeanIntensity_DNA | -0.963 |
| 231 | Cells.Cells_Intensity_MinIntensity_DNA | -0.958 |
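
The summation behind these two tables can be sketched roughly as follows. This is only an illustration, not the code used in this thread: the feature values are made up, and `summed_correlations` is a hypothetical helper name. It adds two per-feature correlation mappings and sorts the result, so the head and tail of the list correspond to the most positive and most negative sums.

```python
# Toy per-feature correlations for two saliency variants.
# All numbers here are invented for illustration only.
corr_v0 = {
    "Cells_AreaShape_MeanRadius": 0.60,
    "Cells_AreaShape_Area": 0.55,
    "Cells_Intensity_MeanIntensityEdge_DNA": -0.62,
    "Nuclei_Intensity_StdIntensity_ER": 0.05,
}
corr_v1 = {
    "Cells_AreaShape_MeanRadius": 0.58,
    "Cells_AreaShape_Area": 0.57,
    "Cells_Intensity_MeanIntensityEdge_DNA": -0.59,
    "Nuclei_Intensity_StdIntensity_ER": -0.10,
}


def summed_correlations(a, b):
    """Sum two correlation mappings over their shared features and
    return (feature, sum) pairs sorted from most positive to most negative."""
    shared = a.keys() & b.keys()
    sums = {name: a[name] + b[name] for name in shared}
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)


ranked = summed_correlations(corr_v0, corr_v1)
most_positive = ranked[:2]   # analogous to the first table
most_negative = ranked[-2:]  # analogous to the second table
```

Features with consistent sign in both variants end up at the extremes, while features whose correlations disagree (like the ER feature in the toy data) cancel towards zero.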

EchteRobert commented 2 years ago

MOA matching results (preliminary)

Below are the mean average precision values for matching sister compounds using the model, baseline, or random shuffling.

Stain2

![Screen Shot 2022-06-09 at 2 59 30 PM](https://user-images.githubusercontent.com/62173977/172923788-5e80fb99-bdfd-49ef-adfc-5eca3b8df76e.png)

Stain3

![Screen Shot 2022-06-09 at 3 00 03 PM](https://user-images.githubusercontent.com/62173977/172923880-a99d42e5-7974-49f7-8256-a12eaae4a465.png)

Stain4

![Screen Shot 2022-06-09 at 3 00 24 PM](https://user-images.githubusercontent.com/62173977/172923937-66e764e4-1c44-4522-9a74-dd18165f2fc4.png)
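
For readers less familiar with the metric, here is a minimal sketch of a retrieval-style mean average precision over compound profiles. The exact implementation used for the plots above is not shown in this thread, so everything below is an assumption: cosine similarity as the ranking score, "same label" as the definition of a match, and the function names themselves.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def average_precision(ranked_hits):
    """AP for one query; ranked_hits[i] is True if the i-th ranked item matches."""
    hits, total = 0, 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(profiles, labels):
    """Use each profile as a query, rank all others by similarity, average the APs."""
    aps = []
    for q, (query, q_label) in enumerate(zip(profiles, labels)):
        others = [i for i in range(len(profiles)) if i != q]
        others.sort(key=lambda i: cosine(query, profiles[i]), reverse=True)
        aps.append(average_precision([labels[i] == q_label for i in others]))
    return sum(aps) / len(aps)


# Toy profiles: two compounds per (hypothetical) MoA label.
profiles = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
map_true = mean_average_precision(profiles, ["a", "a", "b", "b"])
map_shuffled = mean_average_precision(profiles, ["a", "b", "a", "b"])
```

In this toy case the correct labels give a perfect score while the shuffled labels score much lower, which is the kind of gap the "random shuffling" reference in the plots is meant to expose.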

EchteRobert commented 2 years ago

One more experiment... (ellipsoid prediction)

As a last test, I evaluated the model (trained on 15 plates) on the generated ellipsoid data. For a refresher on the experimental setup, see https://github.com/broadinstitute/FeatureAggregation_single_cell/issues/3#issuecomment-1098357632. I still use 2 dimensions to describe the ellipsoids, but I added 1322 empty dimensions so that the input fits the model. This should be a trivial experiment: the model has already shown that it can beat the baseline, so it is able to learn more than the mean. However, the current theory is that it applies some form of quality control. If that means it selects cells that accurately describe the second moments of the cell set distribution, then this task should always be completed perfectly. If it also selects cells whose profile is close to the mean, it will not. It is also possible that the model actually generates higher-order moments from the input data and builds a profile from that information.
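
To make the "same mean, different second moments" idea concrete, here is a small sketch. It is not the ellipsoid generator from the linked issue, which is not reproduced here; it assumes zero-mean, axis-aligned 2-D Gaussians, and the function names are hypothetical. It shows two cell sets that a mean profile cannot tell apart but a variance profile can.

```python
import random
import statistics


def sample_cloud(n, std_x, std_y, rng):
    """Draw n 2-D points from a zero-mean, axis-aligned Gaussian (toy assumption)."""
    return [(rng.gauss(0.0, std_x), rng.gauss(0.0, std_y)) for _ in range(n)]


def mean_profile(points):
    """Per-dimension mean, i.e. what a mean-aggregation baseline would see."""
    return tuple(statistics.fmean(axis) for axis in zip(*points))


def variance_profile(points):
    """Per-dimension variance, a simple second-moment summary."""
    return tuple(statistics.pvariance(axis) for axis in zip(*points))


rng = random.Random(0)
wide = sample_cloud(5000, 3.0, 1.0, rng)  # ellipse stretched along x
tall = sample_cloud(5000, 1.0, 3.0, rng)  # ellipse stretched along y

mean_wide, mean_tall = mean_profile(wide), mean_profile(tall)
var_wide, var_tall = variance_profile(wide), variance_profile(tall)
```

Both mean profiles sit near the origin, so any method that only averages cells cannot separate the two sets, whereas the variance profiles differ clearly between the x and y dimensions.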

Because I am using only 2 dimensions, I roll them over the 1324 available positions to see whether this influences the model's output, and I plot the mAP as a function of the rolled position. Although not exact, this indicates which features (by position) the model relies on more than others: low scores correspond to feature positions that receive little attention, and high scores to positions that receive a lot. Moreover, this means that the AreaShape, IntegratedIntensity and Neighbors features are unavailable in some cases.
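
The rolling described above can be sketched as follows. This is an illustration under assumptions, not the original code: "empty" dimensions are taken to be zeros, positions are assumed to wrap around at the end (as `np.roll` would), and `embed_at`, `rolled_inputs`, and the commented-out `evaluate_map` are hypothetical names.

```python
N_DIMS = 1324  # 2 ellipsoid dimensions + 1322 empty ones, as stated above


def embed_at(point, position, n_dims=N_DIMS):
    """Place a low-dimensional point into an all-zero vector, starting at
    `position` and wrapping around at the end (wrap-around is an assumption)."""
    vec = [0.0] * n_dims
    for offset, value in enumerate(point):
        vec[(position + offset) % n_dims] = value
    return vec


def rolled_inputs(cells, position):
    """Embed every 2-D cell of one set at the same feature position."""
    return [embed_at(cell, position) for cell in cells]


cells = [(0.5, -1.2), (2.0, 0.3)]          # toy 2-D "cells"
for position in (0, 662, N_DIMS - 1):      # a few of the 1324 positions
    batch = rolled_inputs(cells, position)
    # score = evaluate_map(model, batch)   # hypothetical evaluation step
```

Sweeping `position` over all 1324 values and recording the score at each one would give a curve of mAP against feature position.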

Main takeaways

The model has learned to distinguish distributions that have the same mean, meaning it has some way of using higher-order moments of the data. Whether this is by actually generating the higher-order moments or by selecting cells which distinguish the set is uncertain.
Changing the dimensional position of the features can have drastic effects on model performance. The mAP values range from 0.5 to 1.0. This means that some features are much more important to the model than others (which is to be expected).
The overlay reveals no structure in the quality-control type of features (as deduced from the saliency analysis). That means the model does not actually require these features to create the profiles, so it might just be a form of moment-generating function after all.

@shntnu @johnarevalo I wonder what your thoughts are on this. Does this make sense or did I miss something?

mAP versus feature dimension position

_On the x-axis: feature dimension position (leftmost is 0, rightmost is the last position, 1323). On the y-axis: the mAP over all 10 classes, averaged over 4 samples._
![Figure_3](https://user-images.githubusercontent.com/62173977/173402083-68b7c97c-3523-4062-8299-9b4c4e302839.png)

mAP versus feature dimensions position with feature overlay

_All AreaShape, IntegratedIntensity, and Neighbors features highlighted in yellow_
![AllfeatsTrue](https://user-images.githubusercontent.com/62173977/173416034-9cf55789-d3de-4aaa-85e9-827ad336f33f.png)

_All AreaShape and Neighbors features highlighted in yellow_
![AreaShapeandNeighbors](https://user-images.githubusercontent.com/62173977/173416041-782c2585-fa07-4577-90f0-a7a71027d007.png)

(I have also analyzed all the other features independently, but since they revealed no structure I am leaving them out of the analysis here.)
