angelolab / Nimbus

Added predict_data_list function to model_builder #77

Closed JLrumberger closed 1 year ago

JLrumberger commented 1 year ago

What is the purpose of this PR?

This PR adds functionality to run inference automatically on lists of datasets, adds a script to generate test set predictions, and fixes a bug in ModelBuilder.dset_marker_filter.

How did you implement your changes?

  1. Added a function ModelBuilder.predict_data_list that takes a list of tf.data.Datasets, predicts Nimbus scores, and calculates per-cell Nimbus scores together with the dataset, FOV, cell ID, cell type, marker, and silver standard labels. The result is saved as a .csv file.
  2. Added a script hyperparameter_search.py that uses the above function to compute validation predictions, calculates F1 scores for different pos/neg thresholds (individually for every marker in every dataset), and saves the optimal thresholds. It then runs inference on the test set and uses these thresholds to assign the pos/neg class.
  3. Fixed a bug in ModelBuilder.dset_marker_filter. This function filters the dataset and discards samples with a specific dataset and marker combination (to exclude the two channels in the decidua dataset that have incorrect silver standard labels). The bug was that the filter predicate compared a byte string b'CD4' inside a tensor with a regular string 'CD4', a comparison that always evaluated to False, so nothing was filtered out. This didn't show up in the tests because the test function's name was missing the test prefix, so pytest never executed it. Both mistakes are now fixed.
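The bytes-versus-string mismatch from item 3 can be reproduced without TensorFlow. The following is a minimal sketch in plain Python, not the actual Nimbus code: the function names, the exclusion pair, and the sample values are hypothetical and chosen only for illustration. It assumes that string values read out of a tensor arrive as `bytes`, as described above.

```python
# Hypothetical illustration of the bytes-vs-str comparison bug.
# None of these names come from the Nimbus code base.

# Hypothetical (dataset, marker) pair that should be filtered out.
EXCLUDE = {("decidua", "CD4")}


def keep_sample_buggy(dataset, marker, exclude):
    """Buggy predicate: compares bytes values directly against str values.

    In Python 3, b"CD4" == "CD4" is False, so no sample is ever excluded.
    """
    return (dataset, marker) not in exclude


def to_str(value):
    """Decode bytes to str; leave str values untouched."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def keep_sample_fixed(dataset, marker, exclude):
    """Fixed predicate: decode both values before comparing."""
    return (to_str(dataset), to_str(marker)) not in exclude


# Sample values as bytes, mimicking strings extracted from a tensor.
samples = [
    (b"decidua", b"CD4"),
    (b"decidua", b"CD8"),
    (b"other_dataset", b"CD4"),
]

buggy_kept = [s for s in samples if keep_sample_buggy(*s, EXCLUDE)]
fixed_kept = [s for s in samples if keep_sample_fixed(*s, EXCLUDE)]

print(len(buggy_kept))  # all samples pass the buggy filter
print(fixed_kept)       # the excluded pair is dropped
```

A test like this only guards against regressions if pytest actually collects it. Under pytest's default discovery rules, test functions must be named with the `test` prefix (for example `test_dset_marker_filter`).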

Remaining issues

None

ngreenwald commented 1 year ago

Closer and closer!