carpenter-singh-lab / 2024_vanDijk_PLoS_CytoSummaryNet


03. Model for Stain2 #5

EchteRobert opened 2 years ago

EchteRobert commented 2 years ago

It is now clear that this feature aggregation model will only serve a certain feature set (i.e., a particular dataset), and is not designed to aggregate arbitrary feature sets (it is only invariant to the number of cells per well). I will start by creating a model that can beat the 'mean aggregation' baselines of the Stain2 batches, then move on to Stain3 and Stain4, and finally use Stain5 as the final test set.

Given that, it would be ideal if all features were the same across the Stain datasets. This is (somewhat) the case across Stain2, Stain3, and Stain4. However, Stain5 has a slightly different CellProfiler pipeline, resulting in a different and larger feature set. During preprocessing I found that the pipeline from raw single-cell features to data that can be fed directly to the model is quite slow. This is especially true when all features are used (4295 for Stain2-4 and 5794 for Stain5). Model inference and training also become increasingly slow as the number of features increases. The initial experiments on CPJUMP1 showed that not all features are needed to create a better profile than the baseline (https://github.com/broadinstitute/FeatureAggregation_single_cell/issues/1). I have therefore chosen to use only the features common to Stain2-5. This has the advantage of speed, both in preprocessing and inference, and of compatibility, as no separate model will have to be trained to use Stain5 as the test set.

Assuming that the features are consistent within each experiment, there are 1324 features measured in all of Stain2, Stain3, Stain4, and Stain5. The features are well distributed across categories: Cells: 441 features, Cytoplasm: 433 features, and Nuclei: 450 features. Of these, 1124 are reasonably uncorrelated (absolute Pearson correlation < 0.5) [one plate tested]. From here on these are the features that will be used to train the model. A minimal sketch of this selection step is shown below.
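A sketch of that selection, assuming each Stain dataset's single-cell features are loaded as a pandas DataFrame with CellProfiler feature names as columns; function names and data layout are illustrative, not the actual pipeline code:

```python
# Sketch: intersect feature sets across Stain datasets, then drop features
# that correlate strongly (|Pearson r| > 0.5) with an already-retained one.
import numpy as np
import pandas as pd

def common_features(stain_dfs):
    """Intersect the feature (column) sets across all Stain datasets."""
    return sorted(set.intersection(*(set(df.columns) for df in stain_dfs)))

def drop_correlated(df, threshold=0.5):
    """Greedily drop features whose absolute Pearson correlation with a
    retained feature exceeds the threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)

# features = common_features([stain2, stain3, stain4, stain5])  # -> 1324 features
# reduced = drop_correlated(one_plate_df[features])             # -> ~1124 features
```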

EchteRobert commented 2 years ago

The Stain2 experiment (https://github.com/jump-cellpainting/pilot-analysis/issues/15) contains 14 batches, of which only one will not be used to train the model: BR00112200 (Confocal), which contains fewer features than the other batches because it is missing the RNA channel. All other batches will be used to train or validate the model. See the overview below:

Beautiful colours here!

_Note that the Percent Strong shown here is calculated with an additional sphering operation_

[screenshot: Screen Shot 2022-02-28 at 2 20 31 PM]

_The Percent Strong/Replicating with feature-selected features - no sphering_

| Description | Percent_Replicating |
|:-----------------------|----------------------:|
| BR00113818.csv | 51.1 |
| BR00113819.csv | 51.1 |
| BR00113821.csv | 51.1 |
| BR00113820.csv | 56.7 |
| BR00112198.csv | 55.6 |
| BR00112204.csv | 63.3 |
| BR00112199.csv | 58.9 |
| BR00112200.csv | 63.3 |
| BR00112201.csv | 70 |
| BR00112197repeat.csv | 63.3 |
| BR00112203.csv | 52.2 |
| BR00112202.csv | 56.7 |
| BR00112197binned.csv | 58.9 |
| BR00112197standard.csv | 66.7 |

_The Percent Strong/Replicating with the 1324 features as used by the model - **I will use this as the reference BM**_

| Description | Percent_Replicating |
|:-----------------------|----------------------:|
| BR00113818.csv | 52.2 |
| BR00113819.csv | 48.9 |
| BR00113821.csv | 47.8 |
| BR00113820.csv | 55.6 |
| BR00112198.csv | 56.7 |
| BR00112204.csv | 58.9 |
| BR00112199.csv | 57.8 |
| BR00112201.csv | 66.7 |
| BR00112197repeat.csv | 63.3 |
| BR00112203.csv | 56.7 |
| BR00112202.csv | 54.4 |
| BR00112197binned.csv | 58.9 |
| BR00112197standard.csv | 56.7 |
EchteRobert commented 2 years ago

Experiment 1

The first model is trained on BR00112197 binned, BR00112199 multiplane, and BR00112203 MitoCompare. These are the most distinct batches that could have been chosen; the feature distributions of all other batches are more similar to one another. The training and validation loss curves indicate slow but steady learning, and the model has not converged after 50 epochs. The PR is calculated for each batch as a whole, without the negative controls. The training data consists of 80% of each batch, so the model has not seen the remaining 20% during training; a rough sketch of the split is shown below. The model will also be tested on a completely unseen batch.
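A rough sketch of that per-batch 80/20 split; splitting at the well level is an assumption, since the exact unit of the split is not stated here:

```python
# Minimal sketch: hold out 20% of each batch so the model never sees those
# wells during training. Names are illustrative.
import numpy as np

def split_batch(well_ids, train_frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    well_ids = np.array(well_ids)
    rng.shuffle(well_ids)
    cut = int(train_frac * len(well_ids))
    return well_ids[:cut], well_ids[cut:]  # (training wells, held-out wells)
```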

Main Takeaways

Conclusion

The model shows promise in learning general aggregation methods that are applicable to unseen data, as long as the features remain constant. However, something unexpected is going on with the BR00112199 MultiPlane and BR00112197 binned batches. I will investigate whether these results are due to chance or something else.

Results! Wooh!

[screenshot: Screen Shot 2022-02-28 at 2 24 06 PM]

_BR00112203 MitoCompare - training data_
![Stain2_BR00112203_MitoCompare_PR](https://user-images.githubusercontent.com/62173977/156059037-cf34c8bb-472b-48a1-a041-8fccdeb3668b.png)

_BR00112203 MitoCompare RobustMAD-normalized features_
![Stain2_BR00112203_MitoCompare_normalized_PR](https://user-images.githubusercontent.com/62173977/156059021-071ca8aa-58db-42eb-a89d-474c4c8baed1.png)

_BR00112199 MultiPlane - training data_
![Stain2_BR00112199_MultiPlane_PR](https://user-images.githubusercontent.com/62173977/156059012-7e5d3a03-3858-4bae-9c7a-958b79c3a739.png)

_BR00112197 binned - training data_
![Stain2_BR00112197binned_PR](https://user-images.githubusercontent.com/62173977/156059006-52028fb2-c620-45dd-8c4c-099a060622ae.png)

_BR00113818 Redone - **not in training set**_
![Stain2_BR00113818_Redone_PR](https://user-images.githubusercontent.com/62173977/156058989-e1ba9812-87ff-4729-9945-49d249a4b3ad.png)
EchteRobert commented 2 years ago

While trying to find the cause of the possible issue described in https://github.com/broadinstitute/FeatureAggregation_single_cell/issues/5#issuecomment-1054601450, I found that the model creates a feature space that places profiles from the same batch closer together than the mean aggregation method does. Whether this is a good thing is not obvious to me. Note that BR00113818 is not in the training set of the MLP.

Look at these patterns!

![UMAP_MLP](https://user-images.githubusercontent.com/62173977/156071350-cdbf6a72-907a-4ad5-b4f0-294f3ab4a337.png)
![UMAP_BM](https://user-images.githubusercontent.com/62173977/156071369-70ad2541-a4ed-45c3-a75b-0848ad16afab.png)
EchteRobert commented 2 years ago

Experiment 1 (continued)

As the model improved the PS over the baseline on all of the previous plates, I will now test it on 5 more plates from the Stain2 dataset: BR00113818Redone, BR00113819Redone, BR00113820Redone, BR00113821Redone, and BR00112197repeat. The PR/PS is reported below. I also plotted histograms of the number of cells per well for each plate.

Main takeaways

The model performs similarly to or better than the average aggregation method on 3 out of 5 plates. However, it significantly underperformed on the remaining two. I expected this to be related to the number of cells present in the plates. Looking at the histograms of these two plates (BR00113820Redone and BR00113821Redone), this may indeed be the cause, as they have a different distribution of cells per well and fewer cells overall.

Later addition: As discussed with @shntnu, I calculated the PC1 loadings per plate and the correlations between these loadings; see below, and the sketch after this paragraph. It shows that BR00112203 (training), BR00113819, BR00113820, and BR00113821 in particular do not correlate well with the other plates in terms of PC1 loadings, i.e. different features are more important for describing the profiles of these plates. Note also that BR00112203 and BR00112199 are used as 2 of the 3 training plates, while these correlate especially poorly with the two poorly performing plates. In particular, because BR00112203 (training) has the highest PR while its PC1-loadings correlation with all other plates is relatively low, the model is expected to perform worse on all other plates.
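A sketch of that computation, assuming `plate_profiles` maps plate names to well-by-feature matrices with identical feature ordering (illustrative, not the actual analysis code):

```python
# Compute the first principal component's loadings per plate, then correlate
# the loading vectors between plates. The sign of a PC is arbitrary, so the
# absolute correlation is what matters.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def pc1_loadings(X):
    """One weight per feature: the leading PCA component of plate matrix X."""
    return PCA(n_components=1).fit(X).components_[0]

def pc1_loading_correlations(plate_profiles):
    loadings = pd.DataFrame({name: pc1_loadings(X)
                             for name, X in plate_profiles.items()})
    return loadings.corr().abs()  # plates x plates similarity matrix
```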

Conclusion: the plates used during training probably steer the model to pay more attention to a specific set of features, which are not as relevant for the poorly performing plates.

Are you ready for this?

_BR00112197_repeat_
![Stain2_BR00112197repeat_PR](https://user-images.githubusercontent.com/62173977/156247147-6dbcff65-1dd9-4a27-aa86-4f08d192a93c.png)

_BR00113818_Redone_
![Stain2_BR00113818_Redone_PR](https://user-images.githubusercontent.com/62173977/156247159-bc14fbb1-73c2-46bd-bd19-f2df2e16ec36.png)

_BR00113819_Redone_
![Stain2_BR00113819_Redone_PR](https://user-images.githubusercontent.com/62173977/156247171-9a06c2b5-d9a4-4e91-8347-23b4beb28253.png)

_BR00113820_Redone_
![Stain2_BR00113820_Redone_PR](https://user-images.githubusercontent.com/62173977/156247179-46e736ce-b8f9-42c7-b86a-965706042598.png)

_BR00113821_Redone_
![Stain2_BR00113821_Redone_PR](https://user-images.githubusercontent.com/62173977/156247194-890db46f-fcad-4321-b9fb-c1e406fade07.png)
Don't forget to look at these!

![BR00112197binned_hist](https://user-images.githubusercontent.com/62173977/156247475-bee185af-e3ce-4083-bf85-56990c7bc626.png)
![BR00113820_hist](https://user-images.githubusercontent.com/62173977/156247445-d7f23d68-8bfa-4143-9be1-b46ff047a564.png)
![BR00113821_hist](https://user-images.githubusercontent.com/62173977/156247454-fc44f452-b8b1-41ad-9bdf-e5c634aeefff.png)
This is additional stuff. Perhaps not as interesting as the first bit? You decide.

![BR00112197repeat_hist](https://user-images.githubusercontent.com/62173977/156247675-37cb712d-50bb-48e4-9e25-493522005112.png)
![BR00112199_hist](https://user-images.githubusercontent.com/62173977/156247693-936146ce-8816-4cde-ae5d-7fe2fa939fb1.png)
![BR00112203_hist](https://user-images.githubusercontent.com/62173977/156247698-b5b3d920-c64b-4b4f-8c3c-8611d41fe6d0.png)
![BR00113818_hist](https://user-images.githubusercontent.com/62173977/156247708-70ff09fb-5291-4f59-a051-0a69cc5bb711.png)
![BR00113819_hist](https://user-images.githubusercontent.com/62173977/156247716-71ab1618-c7c7-4c67-8f6a-d0e64f74fd34.png)
PC1 loadings per plate ![PC1_loadings_Stain2](https://user-images.githubusercontent.com/62173977/156655488-0e66f027-8618-4e31-9d53-990c93d01a6e.png)
Number of cells per well per plate summary ![Stain2_cells](https://user-images.githubusercontent.com/62173977/172942967-53123370-e221-4bb2-b37a-344a796ec044.png)
niranjchandrasekaran commented 2 years ago

> The model performs similar to or better than the average aggregation method for 3 out of 5 plates. For the remaining two it significantly underperformed however.

@EchteRobert Quick question - did you recompute Percent Replicating for the baseline using the 1324 features or are these values from the original baseline in https://github.com/jump-cellpainting/pilot-analysis/issues/15#issuecomment-670640802? If it is the latter, I would recommend doing the former so that we are comparing apples to apples.

Also, the cell count histograms surprised me. Given that the only difference between the plates is the dye concentration, I did not expect to see such a huge difference in the number of cells between plates.

EchteRobert commented 2 years ago

I did not, @niranjchandrasekaran. Good point. I will recalculate the baseline with the 1324 features.

Yes, it surprised me a bit too, although I cannot explain why this would be the case. In fact, in these two plates I encountered the first well that did not contain any cells at all.

niranjchandrasekaran commented 2 years ago

On checking the table in https://github.com/broadinstitute/FeatureAggregation_single_cell/issues/5#issuecomment-1054585913, I just realized that the two plates BR00113820_Redone and BR00113821_Redone have a different cell seeding density compared to the other plates, so they are expected to have a different number of cells.

EchteRobert commented 2 years ago

Experiment (intermediate)

The previous results showed a high non-replicate correlation. Although the replicate correlation was even higher, we would prefer a lower non-replicate correlation, which would represent a cleaner profile, i.e. a sharper contrast between replicates and non-replicates. To test this, John proposed changing my current feature normalization method (zero mean, unit standard deviation) to RobustMAD; a sketch is shown below. Secondly, I doubled the batch size during training. This means there are more negative pairs per batch (the number of negative pairs grows quadratically with batch size), which may push the learned profiles further apart.
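For reference, a minimal RobustMAD sketch; the epsilon guard against a zero MAD is my assumption, and the actual implementation used may differ:

```python
# RobustMAD: per feature, subtract the median and divide by the median
# absolute deviation scaled to be consistent with the standard deviation
# under normality (factor 1.4826).
import numpy as np
from scipy.stats import median_abs_deviation

def robust_mad(X, eps=1e-18):
    med = np.median(X, axis=0)
    mad = median_abs_deviation(X, axis=0, scale="normal")  # MAD * 1.4826
    return (X - med) / (mad + eps)
```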

Main takeaways

With the increased batch size in combination with RobustMAD normalization, the model has an extremely hard time learning. Upon inspecting the model's gradients, I saw that they vanished within the first few epochs. Returning to the original normalization removed this effect and allowed for better training.

Click here!

[screenshot: Screen Shot 2022-03-02 at 3 01 07 PM]

_BR00112203 plate (previously highest PR)_
![Stain2_BR00112203_exp2_BS128_PR](https://user-images.githubusercontent.com/62173977/156440285-3028b11b-5126-4a88-874f-b085cc7d80a5.png)
EchteRobert commented 2 years ago

Experiment 2

As RobustMAD did not do what was expected and the non-replicate correlation did not decrease either, likely because the model was not learning at all, I trained another model with the previous normalization and a batch size that is still larger than the original (80, compared to the 128 used in the previous post). I also moved to 'cleaner' data (all 'green' plates as indicated in the table here: https://github.com/broadinstitute/FeatureAggregation_single_cell/issues/5#issuecomment-1054585913), which may cause the model to perform worse on the 'non-green' plates.

Main takeaways

The model is able to push the non-replicate correlation down somewhat, but this comes at the cost of overfitting: it achieves this on the training plates, but not on the validation plates. I expect that more data will be needed to achieve the best of both worlds.

Losses and PRs!

[screenshot: Screen Shot 2022-03-02 at 4 17 39 PM]

_BR00112197 standard - training data_
![Stain2_BR00112197standard_exp2_PR](https://user-images.githubusercontent.com/62173977/156450257-223b35a5-724d-4508-bc44-4dd75a7d0fd3.png)

_BR00113818 - non-training data_
![Stain2_BR00113818_PR](https://user-images.githubusercontent.com/62173977/156451037-419b0ac8-0f9d-424e-82fa-ad625a697793.png)
EchteRobert commented 2 years ago

Experiment 3

In https://github.com/broadinstitute/FeatureAggregation_single_cell/issues/5#issuecomment-1054752037 I showed that the model learns to amplify the plate-specific signal in the cell profiles. To counteract this, I trained a model that also learns from across-plate replicates. Additionally, one possible reason why the non-replicate correlation has been so high so far may be that the model learns to separate plates: by doing so, it automatically pushes all same-plate profiles together, and non-replicate profile correlation becomes higher in general. Including across-plate replicates may reduce this effect by utilizing the full latent loss space. A sketch of the pair construction is shown below.
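An illustrative sketch of the across-plate pair construction; the data layout of (plate, compound, cells) tuples is assumed, not taken from the actual training code:

```python
# Treat wells that received the same compound as replicate (positive) pairs,
# even when they come from different plates.
import itertools
from collections import defaultdict

def across_plate_pairs(wells):
    """wells: iterable of (plate_id, compound, single_cell_features) tuples."""
    by_compound = defaultdict(list)
    for plate_id, compound, features in wells:
        by_compound[compound].append((plate_id, features))
    pairs = []
    for replicates in by_compound.values():
        for (p1, f1), (p2, f2) in itertools.combinations(replicates, 2):
            if p1 != p2:  # keep only the across-plate positives
                pairs.append((f1, f2))
    return pairs
```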

Main takeaways

Non-replicate correlation does indeed appear to decrease somewhat, as expected, at least for the training plates. However, the model is overfitting very clearly, and its overall performance is much lower than the previous model's. Decreasing the batch size and increasing the number of plates used for training does not solve this problem. I suspect the model is memorizing specific compounds rather than learning an aggregation method.

UMAP patterns here!

_UMAP BM, same plates as in https://github.com/broadinstitute/FeatureAggregation_single_cell/issues/5#issuecomment-1054752037_
![UMAP_BM](https://user-images.githubusercontent.com/62173977/156588903-9a33d2e3-11b1-4bdc-b3ad-d035ae24b306.png)

_UMAP MLP_
![UMAP_MLP](https://user-images.githubusercontent.com/62173977/156589035-1172b03d-9fa1-4c00-bed4-242e77f9b889.png)

_UMAP BM training plates_ ['BR00112197standard': 0, 'BR00112199': 1, 'BR00112197repeat': 2]
![UMAP_BM_train](https://user-images.githubusercontent.com/62173977/156589076-87e718c0-9a2f-4c6a-9d8a-54d336e7dab9.png)

_UMAP MLP training plates_
![UMAP_MLP_train](https://user-images.githubusercontent.com/62173977/156589134-32f76c83-cd9b-4ff7-9210-eb9b85ce6ac3.png)
Percent histograms here!

**_Training plates_**
![Stain2_BR00112197standard_PR](https://user-images.githubusercontent.com/62173977/156589374-bffdd16d-34b4-4993-a2f0-7a1c06ae9007.png)
![Stain2_BR00112197repeat_PR](https://user-images.githubusercontent.com/62173977/156590301-72bd43d1-f614-4120-8e62-f78a9f96cdbf.png)
![Stain2_BR00112199_PR copy](https://user-images.githubusercontent.com/62173977/156590024-1d65c2d7-ab35-4137-8243-372020659ade.png)

**_Test plate_**
![Stain2_BR00113818_PR](https://user-images.githubusercontent.com/62173977/156590085-20dcb957-0cb1-4d7c-b375-c3a7f572e060.png)
shntnu commented 2 years ago

> As discussed with @shntnu I calculated the PC1 loadings per plate and the correlation between these loadings.

@EchteRobert Awesome! What you essentially did here was measure the distribution similarity between all pairs of plates. The first PC is a quick way to do that.

Comparing the PC1 loadings of two multivariate distributions is a shortcut for comparing their covariance matrices. If the distributions are truly multivariate Gaussian (good luck with that, haha!), then it's actually a very good approximation (to the extent that PC1 explains a large fraction of the variance).
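In symbols, just restating the point above: the PC1 loadings are the leading eigenvectors of the per-plate covariance matrices, so correlating them compares a rank-1 summary of those covariances.

```latex
% v_1 is the leading eigenvector (PC1 loadings) of each plate's covariance.
\Sigma_A\, v_1^{(A)} = \lambda_1^{(A)} v_1^{(A)}, \qquad
\Sigma_B\, v_1^{(B)} = \lambda_1^{(B)} v_1^{(B)}, \qquad
\text{similarity}(A,B) = \left|\operatorname{corr}\!\left(v_1^{(A)}, v_1^{(B)}\right)\right|
```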

If you really want to go down this rabbit hole (⚠️ stop, don't! ⚠️), read up

EchteRobert commented 2 years ago

Experiment 3V2

Learning from previous experiments, I used the following experiment setup:

Below I will show:

Main takeaways

PC1 loadings of the model profiles ![PC1_loadings_MLP_Stain2exp3V2](https://user-images.githubusercontent.com/62173977/157282958-c919924f-b15e-49df-81aa-3c8e4f54fc5f.png)
PR but in a new latent loss space!

| **Plate** | **Percent Replicating** |
|--------------------|-------------------------|
| _Training_ | |
| BR00112197binned | 88.9 |
| BR00112199 | 91.1 |
| BR00112203 | 88.9 |
| BR00113818 | 84.4 |
| BR00113820 | 97.8 |
| _Validation_ | |
| BR00112197repeat | 72.2 |
| BR00112197standard | 72.2 |
| BR00112198 | 63.3 |
| BR00112201 | 72.2 |
| BR00112202 | 56.7 |
| BR00112204 | 61.1 |
| BR00113819 | 67.8 |
| BR00113821 | 50.0 |

![Stain2_BR00113820_PR](https://user-images.githubusercontent.com/62173977/157284712-32963efe-7c06-42a0-83a9-a7bdd0409561.png)
![Stain2_BR00113821_PR](https://user-images.githubusercontent.com/62173977/157284732-4aa314f3-d240-49e2-aa84-9a7de1f8b261.png)
A new metric approaches!

_5 plates are used to train the model (as shown in the 'Plate' column). During training, 80% of the compounds are used to train the model and 20% of the compounds (the same ones for each plate) are used as a hold-out or validation set._

| **Plate** | **training compounds MLP** | **training compounds BM** | **validation compounds MLP** | **validation compounds BM** |
|--------------------|----------------------------|---------------------------|------------------------------|-----------------------------|
| _Training_ | | | | |
| BR00112197binned | **0.44** | 0.41 | 0.20 | **0.30** |
| BR00112199 | **0.38** | 0.32 | 0.20 | **0.28** |
| BR00112203 | **0.49** | 0.30 | 0.16 | **0.27** |
| BR00113818 | **0.43** | 0.28 | 0.17 | **0.30** |
| BR00113820 | **0.59** | 0.30 | 0.18 | **0.30** |
| _Validation_ | | | | |
| BR00112197repeat | 0.29 | **0.41** | 0.25 | **0.31** |
| BR00112197standard | 0.32 | **0.40** | 0.27 | **0.28** |
| BR00112198 | 0.27 | **0.35** | 0.26 | **0.30** |
| BR00112201 | 0.26 | **0.40** | 0.22 | **0.32** |
| BR00112202 | 0.25 | **0.34** | 0.24 | **0.30** |
| BR00112204 | 0.24 | **0.35** | 0.25 | **0.29** |
| BR00113819 | 0.24 | **0.28** | 0.17 | **0.25** |
| BR00113821 | 0.19 | **0.24** | 0.12 | **0.22** |
mAP BR00112201

Plate: BR00112201. Total mean: 0.25251311463707016

_Training samples mean AP: 0.259931_

| compound | AP |
|:---------------------|----------:|
| PF-477736 | 1 |
| AMG900 | 1 |
| APY0201 | 1 |
| AZD2014 | 1 |
| GDC-0879 | 1 |
| acriflavine | 1 |
| RG7112 | 0.930556 |
| GSK-J4 | 0.897222 |
| Compound2 | 0.830556 |
| BLU9931 | 0.677167 |
| BI-78D3 | 0.668651 |
| SCH-900776 | 0.640873 |
| CPI-0610 | 0.572222 |
| SU3327 | 0.510317 |
| ABT-737 | 0.480423 |
| Compound7 | 0.472073 |
| GNF-5 | 0.469444 |
| MK-5108 | 0.447917 |
| THZ1 | 0.422808 |
| NVS-PAK1-1 | 0.347374 |
| SU-11274 | 0.32939 |
| GW-5074 | 0.246392 |
| GSK2334470 | 0.246166 |
| BX-912 | 0.24095 |
| NVP-AEW541 | 0.23775 |
| CHIR-99021 | 0.220037 |
| dosulepin | 0.202143 |
| GSK-3-inhibitor-IX | 0.172313 |
| PD-198306 | 0.148742 |
| PFI-1 | 0.14835 |
| Compound3 | 0.145067 |
| BMS-566419 | 0.12329 |
| BMS-863233 | 0.121743 |
| apratastat | 0.118872 |
| WZ4003 | 0.114163 |
| ICG-001 | 0.11288 |
| PNU-74654 | 0.0874405 |
| ML324 | 0.0822136 |
| Compound5 | 0.0819586 |
| GW-3965 | 0.0698881 |
| SGX523 | 0.0628168 |
| AZ191 | 0.0614712 |
| A-366 | 0.0492269 |
| halopemide | 0.0481211 |
| FR-180204 | 0.0474747 |
| BIX-02188 | 0.044098 |
| Compound4 | 0.0427142 |
| AZD7545 | 0.0417633 |
| SHP 99.00 | 0.0412191 |
| RGFP966 | 0.0397035 |
| IOX2 | 0.0396046 |
| CP-724714 | 0.0378228 |
| EPZ015666 | 0.037468 |
| AMG-925 | 0.0353015 |
| VX-745 | 0.0336891 |
| SGC-707 | 0.0329782 |
| P5091 | 0.0326774 |
| Compound6 | 0.0305971 |
| delta-Tocotrienol | 0.0295755 |
| Compound1 | 0.0279454 |
| PS178990 | 0.0278597 |
| carmustine | 0.0272295 |
| T-0901317 | 0.0272058 |
| andarine | 0.0257093 |
| UNC0642 | 0.0257052 |
| dimethindene-(S)-(+) | 0.0252354 |
| ML-323 | 0.0244636 |
| ML-298 | 0.0232809 |
| Compound8 | 0.0218036 |
| SAG | 0.0198054 |
| KH-CB19 | 0.0187536 |
| filgotinib | 0.0143387 |

_Validation samples mean AP: 0.222843_

| compound | AP |
|:-------------------|----------:|
| valrubicin | 0.830159 |
| sirolimus | 0.647222 |
| romidepsin | 0.614379 |
| ponatinib | 0.489386 |
| merimepodib | 0.373039 |
| ispinesib | 0.357657 |
| neratinib | 0.250216 |
| veliparib | 0.0939503 |
| orphenadrine | 0.0710256 |
| ruxolitinib | 0.0683867 |
| hydroxyzine | 0.0374705 |
| selumetinib | 0.0353887 |
| pomalidomide | 0.0339397 |
| skepinone-l | 0.0242614 |
| homochlorcyclizine | 0.0220177 |
| rheochrysidin | 0.0216262 |
| quazinone | 0.0209096 |
| purmorphamine | 0.0201343 |
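For context, a simplified sketch of how a per-compound average precision like the one above can be computed; the cosine-similarity ranking and the use of sklearn's AP are assumptions, and the actual evaluation code may differ:

```python
# For each well profile, rank all other wells by cosine similarity and score
# how highly its same-compound replicates appear in that ranking; averaging
# the per-query APs within a compound gives that compound's AP.
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.metrics.pairwise import cosine_similarity

def compound_average_precision(X, compounds):
    """X: profiles (wells x features); compounds: one label per well."""
    compounds = np.asarray(compounds)
    sims = cosine_similarity(X)
    per_compound = {}
    for i in range(len(compounds)):
        labels = (compounds == compounds[i]).astype(int)
        mask = np.arange(len(compounds)) != i  # exclude the query itself
        if labels[mask].sum() == 0:
            continue  # compound has no replicates to retrieve
        ap = average_precision_score(labels[mask], sims[i, mask])
        per_compound.setdefault(compounds[i], []).append(ap)
    return {c: float(np.mean(v)) for c, v in per_compound.items()}
```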
EchteRobert commented 2 years ago

Below is an overview of all the PRs broken down by training/validation plates and training/validation compounds, as was done for the mAP. Generally speaking, the PR values correlate highly with the mAP values reported in https://github.com/broadinstitute/FeatureAggregation_single_cell/issues/5#issuecomment-1062006634.

Excel table

[screenshot: Screen Shot 2022-03-11 at 5 59 03 PM]
EchteRobert commented 2 years ago

Experiments

The model shown in previous comments is overfitting the training dataset. This means it does not beat the baseline in mean average precision when comparing the profiles it creates for validation (hold-out) compounds, validation (hold-out) plates, or both. There are two main ideas for reducing overfitting on 1. plates and 2. compounds:

  1. Consider replicates across plates
  2. Aggregate all same-compound cells from wells within a plate into a 'super well', if you will, and then sample new 'augmented wells' from this super well (see the sketch after this list). This should increase the variability of single-cell well compositions and reduce compound overfitting. (3. A possible extension of 1. and 2. is to also merge ALL compound wells across ALL plates, to form super super wells?)
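A hedged sketch of idea 2; names and array shapes are illustrative:

```python
# Pool the cells of all same-compound wells on a plate into a 'super well',
# then sample synthetic 'augmented wells' from that pool to diversify the
# single-cell compositions seen during training.
import numpy as np

def make_super_well(well_cell_arrays):
    """Stack (n_cells_i x n_features) arrays from same-compound wells."""
    return np.vstack(well_cell_arrays)

def sample_augmented_well(super_well, n_cells, rng=None):
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(super_well), size=n_cells,
                     replace=n_cells > len(super_well))
    return super_well[idx]
```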

Main takeaways

I will not show the results, as there are too many different experiments; instead I outline the most important findings.

Next up

A possible improvement is to reduce the data augmentation a bit: super wells will be created only 50% of the time, and for the other 50% sampling will be done from a single well. Additionally, super wells will be created by aggregating only 2 of the 4 available wells (chosen at random). Another improvement is the normalization method: I will now normalize all wells across the entire plate before training the model on them, whereas previously this normalization was done per well (a sketch contrasting the two is shown below).
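A sketch contrasting the two normalization scopes, assuming standardization as in the earlier comments; names are illustrative:

```python
# Plate-level normalization: fit mean/std on all cells of the plate, then
# apply the same statistics to every well. Previously the statistics were
# fit separately within each well.
import numpy as np

def normalize_per_plate(wells):
    """wells: list of (n_cells_i x n_features) arrays from one plate."""
    all_cells = np.vstack(wells)
    mu = all_cells.mean(axis=0)
    sigma = all_cells.std(axis=0) + 1e-18
    return [(w - mu) / sigma for w in wells]

def normalize_per_well(wells):
    """Previous approach: statistics fit within each well."""
    return [(w - w.mean(axis=0)) / (w.std(axis=0) + 1e-18) for w in wells]
```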

EchteRobert commented 2 years ago

Experiment

Results of the 'Next up' experiment described here: https://github.com/broadinstitute/FeatureAggregation_single_cell/issues/5#issuecomment-1071401689

Main takeaways

Next up

EXCITING!

_Results in bold are the highest score_

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|------------------------|----------------:|----------------:|---------------:|---------------:|--------:|--------:|
| _Training plates_ | | | | | | |
| **BR00112201** | **0.66** | 0.40 | **0.43** | 0.32 | **98.9** | 66.7 |
| **BR00112198** | **0.56** | 0.35 | **0.4** | 0.30 | **100** | 56.7 |
| **BR00112204** | **0.59** | 0.35 | **0.35** | 0.29 | **100** | 58.9 |
| _Validation plates_ | | | | | | |
| **BR00112202** | **0.44** | 0.34 | **0.31** | 0.30 | **93.3** | 54.4 |
| **BR00112197standard** | **0.47** | 0.40 | **0.34** | 0.28 | **94.4** | 56.7 |
| BR00112203 | 0.19 | **0.30** | 0.21 | **0.27** | 52.2 | **56.7** |
| BR00112199 | 0.3 | **0.32** | 0.23 | **0.28** | **76.7** | 57.8 |
| **BR00113818** | **0.32** | 0.28 | 0.24 | **0.30** | **77.8** | 52.2 |
| **BR00113819** | **0.32** | 0.28 | 0.21 | **0.25** | **70** | 48.9 |
| **BR00112197repeat** | **0.47** | 0.41 | **0.37** | 0.31 | **92.2** | 63.3 |
| BR00113820 | 0.27 | **0.30** | 0.24 | **0.30** | **58.9** | 55.6 |
| BR00113821 | 0.15 | **0.24** | 0.16 | **0.22** | 38.9 | **47.8** |
| **BR00112197binned** | **0.41** | 0.41 | **0.34** | 0.30 | **91.1** | 58.9 |
shntnu commented 2 years ago

👀 🎊

EchteRobert commented 2 years ago

Experiment

Building on the setup of the previous experiment, I now train and evaluate a model on across-plate compound replicates. The training set consists of the same 3 plates: BR00112201, BR00112198, and BR00112204. The validation set contains only BR00112202, BR00112197standard, BR00113818, BR00113819, BR00112197repeat, and BR00112197binned. Note that I select only the plates that are close to the training set here, because I am considering across-plate correlations and the other 4 outlier plates rely on different features. I group the outlier plates into a separate validation set and compute results for it for completeness' sake, but I do not think this last set is useful for analysis due to its different feature importances.

I compute the baseline mAP (and PR) for these two sets using the mean aggregation method with across-plate compound replicates, and do the same using the model aggregation method.

Main takeaways

Next up

CrissCross mAP🔀

_Across plate compound correlations_

I do not report the PR, because all of these are (close to) 100 percent. I expect this to be due to the high number of replicates now being considered (perhaps I need to increase the number of samples used for the non-replicate correlation calculation?).

| plate set | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM |
|----------------|-------------------:|----------------:|---------------------:|------------------:|
| Training set | **0.48** | 0.30 | **0.35** | 0.30 |
| Validation set | **0.31** | 0.23 | **0.28** | 0.21 |
| Outlier set | 0.11 | **0.15** | 0.09 | **0.13** |

_Within plate compound correlations_

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:-------------------|---------------------:|------------------:|-----------------------:|--------------------:|-----------:|--------:|
| _Training plates_ | | | | | | |
| BR00112201 | **0.58** | 0.4 | **0.37** | 0.32 | **98.9** | 66.7 |
| BR00112198 | **0.53** | 0.35 | **0.34** | 0.3 | **97.8** | 56.7 |
| BR00112204 | **0.53** | 0.35 | **0.35** | 0.29 | **98.9** | 58.9 |
| _Validation plates_ | | | | | | |
| **BR00112202** | **0.43** | 0.34 | **0.36** | 0.3 | **88.9** | 54.4 |
| **BR00112197standard** | **0.46** | 0.4 | **0.39** | 0.28 | **92.2** | 56.7 |
| BR00112203 | 0.18 | **0.3** | 0.16 | **0.27** | 48.9 | **56.7** |
| BR00112199 | 0.28 | **0.32** | 0.18 | **0.28** | **68.9** | 57.8 |
| BR00113818 | 0.26 | **0.28** | 0.26 | **0.3** | **70** | 52.2 |
| BR00113819 | 0.25 | **0.28** | 0.19 | **0.25** | **72.2** | 48.9 |
| **BR00112197repeat** | **0.44** | 0.41 | **0.36** | 0.31 | **86.7** | 63.3 |
| BR00113820 | 0.25 | **0.3** | 0.2 | **0.3** | **64.4** | 55.6 |
| BR00113821 | 0.17 | **0.24** | 0.18 | **0.22** | 45.6 | **47.8** |
| BR00112197binned | 0.41 | 0.41 | **0.4** | 0.3 | **88.9** | 58.9 |
EchteRobert commented 2 years ago

Experiment

To see if my hypothesis* holds, I trained a model on 2 of the outlier plates (BR00113819 and BR00113821) and calculated the same performance metrics as before. The model was trained without creating pairs across plates, only within each plate.

*Training on plates which are similar according to the PC1 loadings plot will lead to poor performance of the model on plates which are dissimilar to the training plates.

Main takeaways

Next up

Time to evaluate on Stain3.

TableTime!

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:-------------------|---------------------:|------------------:|-----------------------:|--------------------:|-----------:|--------:|
| _Training plates_ | | | | | | |
| BR00113819 | **0.58** | 0.28 | **0.28** | 0.25 | **97.8** | 48.9 |
| BR00113821 | **0.59** | 0.24 | 0.22 | 0.22 | **96.7** | 47.8 |
| _Validation plates_ | | | | | | |
| BR00112202 | 0.33 | **0.34** | **0.34** | 0.3 | **80** | 54.4 |
| BR00112197standard | 0.32 | 0.4 | **0.34** | 0.28 | **78.9** | 56.7 |
| BR00112203 | 0.16 | **0.3** | 0.18 | **0.27** | 38.9 | **56.7** |
| BR00112199 | 0.17 | **0.32** | 0.16 | **0.28** | 40 | **57.8** |
| BR00113818 | **0.35** | 0.28 | 0.24 | **0.3** | **76.7** | 52.2 |
| BR00112198 | 0.27 | **0.35** | 0.28 | **0.3** | **66.7** | 56.7 |
| BR00112197repeat | 0.33 | **0.41** | **0.34** | 0.31 | **70** | 63.3 |
| BR00112204 | 0.28 | **0.35** | **0.35** | 0.29 | **66.7** | 58.9 |
| BR00113820 | **0.36** | 0.3 | 0.25 | **0.3** | **84.4** | 55.6 |
| BR00112197binned | 0.28 | **0.41** | 0.3 | 0.3 | **65.6** | 58.9 |
| BR00112201 | 0.38 | **0.4** | **0.34** | 0.32 | **86.7** | 66.7 |
EchteRobert commented 2 years ago

Evaluation

As an additional compound-level evaluation, I compared the mAP between the model and the benchmark for the 'within-cluster plates' (see the PC1 loadings plot for the cluster) to see whether specific compounds consistently perform worse or better with the model than with the benchmark.

Colorful bubble graph training compounds!

[screenshot: Screen Shot 2022-03-31 at 4 47 28 PM]
Colorful bubble graph validation compounds!

[screenshot: Screen Shot 2022-03-31 at 4 49 44 PM]
EchteRobert commented 2 years ago

Evaluation: Stain3-optimized model

After tuning a number of hyperparameters using Stain3 plates, I trained a model on Stain2 plates with the same hyperparameters and training methods, to see if this new setup is compatible across datasets. I changed the data used to calculate the validation loss, so that selecting the model with the best validation loss actually yields the best performance on the validation compounds. See https://github.com/broadinstitute/FeatureAggregation_single_cell/issues/6#issuecomment-1095241531 for the discovery of this validation-loss issue and https://github.com/broadinstitute/FeatureAggregation_single_cell/issues/6#issuecomment-1095206104 for the hyperparameter experiment details.

Main takeaways

Results

mAP table with last epoch model here!

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:-------------------|---------------------:|------------------:|-----------------------:|--------------------:|-----------:|--------:|
| _Training plates_ | | | | | | |
| BR00112201 | **0.81** | 0.4 | **0.47** | 0.32 | 100 | 66.7 |
| BR00112198 | **0.78** | 0.35 | **0.49** | 0.3 | 100 | 56.7 |
| BR00112204 | **0.82** | 0.35 | **0.42** | 0.29 | 100 | 58.9 |
| _Validation plates_ | | | | | | |
| BR00112202 | **0.52** | 0.34 | **0.35** | 0.3 | 94.4 | 54.4 |
| BR00112197standard | **0.54** | 0.4 | **0.44** | 0.28 | 95.6 | 56.7 |
| BR00112197repeat | **0.55** | 0.41 | **0.4** | 0.31 | 95.6 | 63.3 |
| BR00112197binned | **0.48** | 0.41 | **0.41** | 0.3 | 91.1 | 58.9 |
mAP table with best validation loss model here!

Numbers in bold are **better** than the last epoch model. Numbers in italic are _worse_.

| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
|:-------------------|---------------------:|------------------:|-----------------------:|--------------------:|-----------:|--------:|
| _Training plates_ | | | | | | |
| BR00112201 | 0.65 | 0.4 | _0.45_ | 0.32 | 98.9 | 66.7 |
| BR00112198 | 0.59 | 0.35 | 0.49 | 0.3 | 98.9 | 56.7 |
| BR00112204 | 0.59 | 0.35 | **0.46** | 0.29 | 100 | 58.9 |
| _Validation plates_ | | | | | | |
| BR00112202 | 0.48 | 0.34 | **0.37** | 0.3 | 95.6 | 54.4 |
| BR00112197standard | 0.51 | 0.4 | 0.44 | 0.28 | 93.3 | 56.7 |
| BR00112197repeat | 0.49 | 0.41 | **0.47** | 0.31 | 93.3 | 63.3 |
| BR00112197binned | 0.46 | 0.41 | 0.41 | 0.3 | 85.6 | 58.9 |