djross22 / flowgatenist


Attune NxT files incompatible? #1

Open zharmer opened 3 years ago

zharmer commented 3 years ago

Hello,

The script doesn't seem to be able to read the blank and bead measurements taken on a ThermoFisher Attune NxT Flow Cytometer.

The line batch_p.fcs_to_dataframe(data_directory=data_directory) in Notebook 1 returns all of the samples, but the "blanks" sample is returned as None, leading to an IndexError in the next line of the script.

A similar IndexError is returned for the beads sample (cat. no. RCP-30-5, 6 peaks) by the line in Notebook 2 that reads:

batch_p.fit_bead_data(data_directory=data_directory,
                      num_bead_populations=6,
                      bead_init=10,
                      singlet_init=5,
                      bead_population_init=10)

Can you suggest a way to read these files with the script?

Attune_blanks.zip Attune_beads.zip

Thank you!

djross22 commented 3 years ago

@zharmer: Could you add a comment with the complete error message? Based on the files you attached, I think the script in the example notebook should work - since the code just looks for the strings "bead" or "blank" in the file names. So, there must be something else going wrong. If you send the error message, it can tell me which line of the code is throwing the error.

Thanks-

zharmer commented 3 years ago

For the blanks error in Notebook 1:

The line batch_p.fcs_to_dataframe(data_directory=data_directory) returns:

['Experiment_Group_B1.fcs',
 'Experiment_Group_B2.fcs',
 'Experiment_Group_B3.fcs',
 None,
 'Attune_beads.fcs']

(Notice the "blanks" sample has been returned as None)

Next line:

%%time
batch_p.background_subtract_gating(data_directory=data_directory,
                                   top_directory='Documents',
                                   num_cell_clusters=3)

Output:

IndexError                                Traceback (most recent call last)
<timed eval> in <module>

c:\users\zharmer\documents\python_scripts\flowgatenist\flowgatenist\batch_process.py in background_subtract_gating(data_directory, top_directory, samples, back_file, num_back_clusters, num_cell_clusters, back_init, random_cell_inits, ssc_back_cutoff, fsc_back_cutoff, num_memory_inits, save_analysis_memory, cell_type, max_events, init_events, update_progress, show_plots, x_channel, y_channel, trim_alpha)
    600     # It is best to use a buffer blank measured before any cell samples
    601     if back_file is None:
--> 602         back_file = blank_file_list[0]
    603 
    604     back_data = pickle.load(open(back_file, 'rb'))

IndexError: list index out of range

For the beads error in Notebook 2: Input:

batch_p.fit_bead_data(data_directory=data_directory,
                      num_bead_populations=6,
                      bead_init=10,
                      singlet_init=5,
                      bead_population_init=10)

Output:

Main bead cluster mean: [115014.36609572 646489.74584853]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-d2b68287ae2f> in <module>
----> 1 batch_p.fit_bead_data(data_directory=data_directory,
      2                       num_bead_populations=6,
      3                       bead_init=10,
      4                       singlet_init=5,
      5                       bead_population_init=10)

c:\users\zharmer\documents\python_scripts\flowgatenist\flowgatenist\batch_process.py in fit_bead_data(data_directory, bead_file, num_bead_clusters, bead_init, num_singlet_clusters, singlet_init, num_bead_populations, bead_population_init, pop_init_means, pop_init_cov, auto_1D_fit, num_1D_clusters, max_cytometer_signal, outlier_quantile, show_plots, update_progress, debug, ssc_min, fsc_min, ssc_max, fsc_max, skip_low_beads, lower_threshold, upper_threshold, fit_VL1, manuscript_style, sub_fig_lower, FSC_channel, SSC_channel, SSCwidth_channel, covariance, singlet_low, singlet_high, fluoro_channel_1, fluoro_channel_2, fluoro_channel_3)
   2942     singlet_select_frame = singlet_select_frame[singlet_select_frame['mean']>singlet_low]
   2943     singlet_select_frame = singlet_select_frame[singlet_select_frame['mean']<singlet_high]
-> 2944     singlet_cluster = singlet_select_frame.index[0]
   2945 
   2946     bead_data.flow_frame['is_singlet_cluster'] = (bead_data.flow_frame['is_main_cluster']) & (singlet_fit.predict(bead_data.flow_frame.loc[:, [SSCwidth_channel, SSC_channel]]) == singlet_cluster)

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __getitem__(self, key)
   4295         if is_scalar(key):
   4296             key = com.cast_scalar_indexer(key, warn_float=True)
-> 4297             return getitem(key)
   4298 
   4299         if isinstance(key, slice):

IndexError: index 0 is out of bounds for axis 0 with size 0

djross22 commented 3 years ago

@zharmer: Ok. If you don't specify the blank file, the fcs_to_dataframe() function attempts to automatically identify the last blank dataset that was measured before the first cell sample. If the data acquisition times are not saved in the fcs files as expected, or if you ran the blank sample after some of the cell samples, that automated algorithm will fail. I think that is the problem. You can solve the problem by specifying the blank file with the optional blank_file argument:

batch_p.fcs_to_dataframe(data_directory=data_directory,
                         blank_file="Attune_blanks.fcs")

The problem with the fit_bead_data() function is different. In that case, the algorithm automatically identifies singlet bead events by looking for a cluster of events with a mean SSC width between singlet_low and singlet_high (and with a variance in the SSC width direction less than the covariance parameter). This is a recent change we made to the package to enable more flexible use with data from different instruments.

To fix this problem, you need to examine a scatter plot of SSC-H vs. SSC-W and find the cluster of events that corresponds to singlet beads. In the example notebook, this is the set of plots under the label, "Second GMM fit results to find singlet bead events." Note that in that example, the singlet cluster is the narrow cluster plotted with purple points with a mean SSC-W of about 400. Then, you need to set the optional singlet_low and singlet_high parameters to bracket the mean SSC-W value that results for your beads and your instrument.
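To illustrate picking the bracket, here is a minimal sketch (not part of the flowgatenist API) of how one might estimate the singlet SSC-W peak from the bead events, assuming they are available in a pandas DataFrame with an 'SSC-W' column. The singlet_bracket helper and the synthetic data are purely illustrative:

```python
# Sketch only: estimate a singlet_low/singlet_high bracket from the SSC-W
# distribution of bead events, assuming they are in a pandas DataFrame.
import numpy as np
import pandas as pd

def singlet_bracket(events: pd.DataFrame, width_channel: str = "SSC-W",
                    pad: float = 0.25):
    """Return (low, high) bounds around the most common SSC-W value.

    Uses the mode of a coarse histogram as a stand-in for the singlet
    cluster mean, then pads it by +/- `pad` (fractional).
    """
    counts, edges = np.histogram(events[width_channel].dropna(), bins=100)
    i = np.argmax(counts)
    peak = 0.5 * (edges[i] + edges[i + 1])  # center of the tallest bin
    return (1 - pad) * peak, (1 + pad) * peak

# Synthetic example: singlets near SSC-W ~ 400, a smaller doublet cluster ~ 800
rng = np.random.default_rng(0)
df = pd.DataFrame({"SSC-W": np.concatenate([rng.normal(400, 20, 5000),
                                            rng.normal(800, 40, 500)])})
low, high = singlet_bracket(df)
print(low, high)  # a bracket around the ~400 singlet peak
```

In practice you would still confirm the bracket against the SSC-H vs. SSC-W scatter plot, since a doublet cluster can dominate the histogram if singlets are rare.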

zharmer commented 3 years ago

Thank you, specifying the blank_file argument worked for processing the blank data!

I set the singlet parameters to bracket the mean SSC-W value, but it returned a KeyError. I only collected BL1 and YL2 for my samples. Could you suggest a way to process the beads data without YL1 or VL1, or is that impossible?

Here is the error I get from the fit_bead_data() function:

Input:

batch_p.fit_bead_data(data_directory=data_directory,
                      num_bead_populations=6,
                      bead_init=10,
                      singlet_init=5,
                      singlet_low=50,
                      singlet_high=200,
                      bead_population_init=10)

Output:

Main bead cluster mean: [115014.37642124 646489.6659989 ]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-15-e2f81fe389d5> in <module>
----> 1 batch_p.fit_bead_data(data_directory=data_directory,
      2                       num_bead_populations=6,
      3                       bead_init=10,
      4                       singlet_init=5,
      5                       singlet_low=50,

c:\users\zharmer\documents\python_scripts\flowgatenist\flowgatenist\batch_process.py in fit_bead_data(data_directory, bead_file, num_bead_clusters, bead_init, num_singlet_clusters, singlet_init, num_bead_populations, bead_population_init, pop_init_means, pop_init_cov, auto_1D_fit, num_1D_clusters, max_cytometer_signal, outlier_quantile, show_plots, update_progress, debug, ssc_min, fsc_min, ssc_max, fsc_max, skip_low_beads, lower_threshold, upper_threshold, fit_VL1, manuscript_style, sub_fig_lower, FSC_channel, SSC_channel, SSCwidth_channel, covariance, singlet_low, singlet_high, fluoro_channel_1, fluoro_channel_2, fluoro_channel_3)
   2990             gmm_data = gmm_data[gmm_data[fluoro_channel_3] < upper_threshold[2]]
   2991     else:
-> 2992         gmm_data = singlet_beads_frame.loc[:, [fluoro_channel_1, fluoro_channel_2]].copy()
   2993 
   2994     channel_max_b = gmm_data[fluoro_channel_1].max()

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    887                     # AttributeError for IntervalTree get_value
    888                     return self.obj._get_value(*key, takeable=self._takeable)
--> 889             return self._getitem_tuple(key)
    890         else:
    891             # we by definition only have the 0th axis

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
   1067             return self._multi_take(tup)
   1068 
-> 1069         return self._getitem_tuple_same_dim(tup)
   1070 
   1071     def _get_label(self, label, axis: int):

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple_same_dim(self, tup)
    773                 continue
    774 
--> 775             retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
    776             # We should never have retval.ndim < self.ndim, as that should
    777             #  be handled by the _getitem_lowerdim call above.

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1111                     raise ValueError("Cannot index with multidimensional key")
   1112 
-> 1113                 return self._getitem_iterable(key, axis=axis)
   1114 
   1115             # nested tuple slicing

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_iterable(self, key, axis)
   1051 
   1052         # A collection of keys
-> 1053         keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
   1054         return self.obj._reindex_with_indexers(
   1055             {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1264             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1265 
-> 1266         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1267         return keyarr, indexer
   1268 

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1319 
   1320             with option_context("display.max_seq_items", 10, "display.width", 80):
-> 1321                 raise KeyError(
   1322                     "Passing list-likes to .loc or [] with any missing labels "
   1323                     "is no longer supported. "

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Index(['YL1-A'], dtype='object'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"

djross22 commented 3 years ago

@zharmer In the fit_bead_data() function, try setting the optional parameter fluoro_channel_2 to "YL2-A". So:

batch_p.fit_bead_data(data_directory=data_directory,
                      num_bead_populations=6,
                      bead_init=10,
                      singlet_init=5,
                      singlet_low=50,
                      singlet_high=200,
                      bead_population_init=10,
                      fluoro_channel_2="YL2-A")

djross22 commented 3 years ago

@zharmer If my last suggestion works, you'll also have to change the line in the example notebook where the bead calibration is applied to the data (input line 10). So, instead of:

batch_p.batch_apply_bead_cal(data_directory=data_directory, fl_channel='YL1-A')

you will need:

batch_p.batch_apply_bead_cal(data_directory=data_directory, fl_channel='YL2-A')

zharmer commented 3 years ago

Changing fluoro_channel_2 to YL2-A doesn't quite seem to have been sufficient. With that input I'm now getting a ValueError:

ValueError                                Traceback (most recent call last)
<ipython-input-16-15d57a5bae76> in <module>
----> 1 batch_p.fit_bead_data(data_directory=data_directory,
      2                       num_bead_populations=6,
      3                       bead_init=10,
      4                       singlet_init=5,
      5                       singlet_low=50,

c:\users\zharmer\documents\python_scripts\flowgatenist\flowgatenist\batch_process.py in fit_bead_data(data_directory, bead_file, num_bead_clusters, bead_init, num_singlet_clusters, singlet_init, num_bead_populations, bead_population_init, pop_init_means, pop_init_cov, auto_1D_fit, num_1D_clusters, max_cytometer_signal, outlier_quantile, show_plots, update_progress, debug, ssc_min, fsc_min, ssc_max, fsc_max, skip_low_beads, lower_threshold, upper_threshold, fit_VL1, manuscript_style, sub_fig_lower, FSC_channel, SSC_channel, SSCwidth_channel, covariance, singlet_low, singlet_high, fluoro_channel_1, fluoro_channel_2, fluoro_channel_3)
   3012                                        n_init=bead_population_init)
   3013 
-> 3014         gmm.fit(gmm_data)
   3015 
   3016         # Mark outlier data that will be ignored in calibration

~\Anaconda3\lib\site-packages\sklearn\mixture\_base.py in fit(self, X, y)
    191         self
    192         """
--> 193         self.fit_predict(X, y)
    194         return self
    195 

~\Anaconda3\lib\site-packages\sklearn\mixture\_base.py in fit_predict(self, X, y)
    218             Component labels.
    219         """
--> 220         X = _check_X(X, self.n_components, ensure_min_samples=2)
    221         self._check_n_features(X, reset=True)
    222         self._check_initial_parameters(X)

~\Anaconda3\lib\site-packages\sklearn\mixture\_base.py in _check_X(X, n_components, n_features, ensure_min_samples)
     50     X : array, shape (n_samples, n_features)
     51     """
---> 52     X = check_array(X, dtype=[np.float64, np.float32],
     53                     ensure_min_samples=ensure_min_samples)
     54     if n_components is not None and X.shape[0] < n_components:

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    667         n_samples = _num_samples(array)
    668         if n_samples < ensure_min_samples:
--> 669             raise ValueError("Found array with %d sample(s) (shape=%s) while a"
    670                              " minimum of %d is required%s."
    671                              % (n_samples, array.shape, ensure_min_samples,

ValueError: Found array with 0 sample(s) (shape=(0, 2)) while a minimum of 2 is required.

djross22 commented 2 years ago

@zharmer Sorry it took me a while to get back to this. I looked at your bead data file, and it looks like you have the SSC channel gain set way too high. Almost all of the SSC-H values in the data are at the upper detection limit (~10^6), so you can't use the scattering data to gate the events. I expect you have a similar problem with your blanks and cell samples: most of the events in your blank data are also at the upper detection limit for the SSC-H channel.
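As a quick diagnostic for this kind of problem, one can check what fraction of events sit at a channel's detection limit before running the full pipeline. A minimal sketch, assuming the fcs events are already loaded into a pandas DataFrame; the saturated_fraction helper and the synthetic data are illustrative, not part of flowgatenist:

```python
# Sketch only: flag a saturated detector channel by counting events pinned
# near the channel maximum, assuming events are in a pandas DataFrame.
import numpy as np
import pandas as pd

def saturated_fraction(events: pd.DataFrame, channel: str = "SSC-H",
                       rel_tol: float = 0.01) -> float:
    """Fraction of events within rel_tol of the channel's maximum value."""
    values = events[channel].to_numpy(dtype=float)
    return float(np.mean(values >= (1 - rel_tol) * values.max()))

# Synthetic example: ~90% of events pinned at a ~1e6 detection limit
rng = np.random.default_rng(1)
n = 10000
df = pd.DataFrame({"SSC-H": np.where(rng.random(n) < 0.9, 1.0e6,
                                     rng.uniform(1e4, 9e5, n))})
frac = saturated_fraction(df)
print(f"{frac:.1%} of events at the SSC-H limit")  # roughly 90%
```

A large fraction here suggests the gain should be lowered and the sample re-acquired, as discussed above.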

zharmer commented 2 years ago

Ok, I'll collect a new data set with a lower SSC gain and try running the script again on that. Thank you so much for the advice!

djross22 commented 2 years ago

@zharmer Just remember to adjust the SSC and FSC gain settings while running the beads: you want to get the beads on scale, but near the top end of the scale - since the cells you want to measure (E. coli?) are probably smaller than the beads. You might also want to check the gain settings with a cell sample before locking it down and getting your real data. See the example SSC vs. FSC plots in the notebooks in this repo to get an idea of what it might look like when the gains are set well.