zharmer opened this issue 3 years ago
@zharmer: Could you add a comment with the complete error message? Based on the files you attached, I think the script in the example notebook should work - since the code just looks for the strings "bead" or "blank" in the file names. So, there must be something else going wrong. If you send the error message, it can tell me which line of the code is throwing the error.
Thanks-
For the blanks error in Notebook 1, the line `batch_p.fcs_to_dataframe(data_directory=data_directory)` returns:
['Experiment_Group_B1.fcs',
'Experiment_Group_B2.fcs',
'Experiment_Group_B3.fcs',
None,
'Attune_beads.fcs']
(Notice the "blanks" sample has been removed)
Next line:
%%time
batch_p.background_subtract_gating(data_directory=data_directory,
top_directory='Documents',
num_cell_clusters=3)
Output:
IndexError Traceback (most recent call last)
<timed eval> in <module>
c:\users\zharmer\documents\python_scripts\flowgatenist\flowgatenist\batch_process.py in background_subtract_gating(data_directory, top_directory, samples, back_file, num_back_clusters, num_cell_clusters, back_init, random_cell_inits, ssc_back_cutoff, fsc_back_cutoff, num_memory_inits, save_analysis_memory, cell_type, max_events, init_events, update_progress, show_plots, x_channel, y_channel, trim_alpha)
600 # It is best to use a buffer blank measured before any cell samples
601 if back_file is None:
--> 602 back_file = blank_file_list[0]
603
604 back_data = pickle.load(open(back_file, 'rb'))
IndexError: list index out of range
For the beads error in Notebook 2: Input:
batch_p.fit_bead_data(data_directory=data_directory,
num_bead_populations=6,
bead_init=10,
singlet_init=5,
bead_population_init=10)
Output:
Main bead cluster mean: [115014.36609572 646489.74584853]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-7-d2b68287ae2f> in <module>
----> 1 batch_p.fit_bead_data(data_directory=data_directory,
2 num_bead_populations=6,
3 bead_init=10,
4 singlet_init=5,
5 bead_population_init=10)
c:\users\zharmer\documents\python_scripts\flowgatenist\flowgatenist\batch_process.py in fit_bead_data(data_directory, bead_file, num_bead_clusters, bead_init, num_singlet_clusters, singlet_init, num_bead_populations, bead_population_init, pop_init_means, pop_init_cov, auto_1D_fit, num_1D_clusters, max_cytometer_signal, outlier_quantile, show_plots, update_progress, debug, ssc_min, fsc_min, ssc_max, fsc_max, skip_low_beads, lower_threshold, upper_threshold, fit_VL1, manuscript_style, sub_fig_lower, FSC_channel, SSC_channel, SSCwidth_channel, covariance, singlet_low, singlet_high, fluoro_channel_1, fluoro_channel_2, fluoro_channel_3)
2942 singlet_select_frame = singlet_select_frame[singlet_select_frame['mean']>singlet_low]
2943 singlet_select_frame = singlet_select_frame[singlet_select_frame['mean']<singlet_high]
-> 2944 singlet_cluster = singlet_select_frame.index[0]
2945
2946 bead_data.flow_frame['is_singlet_cluster'] = (bead_data.flow_frame['is_main_cluster']) & (singlet_fit.predict(bead_data.flow_frame.loc[:, [SSCwidth_channel, SSC_channel]]) == singlet_cluster)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __getitem__(self, key)
4295 if is_scalar(key):
4296 key = com.cast_scalar_indexer(key, warn_float=True)
-> 4297 return getitem(key)
4298
4299 if isinstance(key, slice):
IndexError: index 0 is out of bounds for axis 0 with size 0
@zharmer:
Ok. If you don't specify the blank file, the `fcs_to_dataframe()` function attempts to automatically identify the last blank dataset that was measured before the first cell sample. If the data acquisition times are not saved in the .fcs files as expected, or if you ran the blank sample after some of the cell samples, that automated algorithm will fail. I think that is the problem.
You can solve the problem by specifying the blank file with the optional `blank_file` argument:
batch_p.fcs_to_dataframe(data_directory=data_directory, blank_file="Attune_blanks.fcs")
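For anyone hitting the same issue, the automatic blank detection described above reduces to something like this (a minimal sketch, not the package's actual code; `pick_blank_file` and the timestamps are hypothetical):

```python
from datetime import datetime

def pick_blank_file(acquisition_times):
    """Pick the last 'blank' file measured before the first cell sample.

    acquisition_times: dict mapping file name -> acquisition datetime.
    Returns None when no blank precedes the first cell sample, which is
    what leaves blank_file_list empty and triggers the IndexError.
    """
    cell_files = [(t, f) for f, t in acquisition_times.items()
                  if 'blank' not in f.lower() and 'bead' not in f.lower()]
    first_cell_time = min(t for t, f in cell_files)
    blanks_before = [(t, f) for f, t in acquisition_times.items()
                     if 'blank' in f.lower() and t < first_cell_time]
    if not blanks_before:
        return None
    return max(blanks_before)[1]  # the latest blank before any cell sample

times = {
    'Experiment_Group_B1.fcs': datetime(2021, 5, 1, 10, 0),
    'Attune_blanks.fcs': datetime(2021, 5, 1, 11, 0),  # measured AFTER the cells
    'Attune_beads.fcs': datetime(2021, 5, 1, 12, 0),
}
print(pick_blank_file(times))  # None -> downstream blank_file_list[0] raises IndexError
```

Passing `blank_file` explicitly simply bypasses this search, which is why it works even when the timestamps don't cooperate.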
The problem with the `fit_bead_data()` function is different. In that case, the algorithm automatically identifies singlet bead events by looking for a cluster of events with a mean SSC width between `singlet_low` and `singlet_high` (and with a variance in the SSC width direction less than the `covariance` parameter).
This is a recent change we made to the package to enable more flexible use with data from different instruments.
To fix this problem, you need to examine a scatter plot of SSC-H vs. SSC-W and find the cluster of events that corresponds to singlet beads.
In the example notebook, this is the set of plots under the label, "Second GMM fit results to find singlet bead events." Note that in that example, the singlet cluster is the narrow cluster plotted with purple points with a mean SSC-W of about 400.
Then, you need to set the optional `singlet_low` and `singlet_high` parameters to bracket the mean SSC-W value that results for your beads and your instrument.
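The singlet-selection step shown in the traceback (lines 2942-2944) reduces to something like this (a hypothetical mirror of that logic, not the package's exact code; the cluster means are made up), which shows why a bracket that misses every cluster produces the IndexError:

```python
import pandas as pd

# Hypothetical GMM cluster summary: mean SSC-W of each fitted cluster
singlet_select_frame = pd.DataFrame({'mean': [350.0, 410.0, 950.0]},
                                    index=[0, 1, 2])

def pick_singlet_cluster(frame, singlet_low, singlet_high):
    # Keep only clusters whose mean SSC-W lies inside the bracket
    frame = frame[frame['mean'] > singlet_low]
    frame = frame[frame['mean'] < singlet_high]
    return frame.index[0]  # raises IndexError when no cluster qualifies

# A bracket that misses every cluster mean -> empty frame -> IndexError
try:
    pick_singlet_cluster(singlet_select_frame, singlet_low=50, singlet_high=200)
except IndexError:
    print("no cluster inside the bracket")

# A bracket chosen around the actual singlet cluster (mean ~410) works:
print(pick_singlet_cluster(singlet_select_frame, singlet_low=380, singlet_high=450))
```

So the fix is exactly what is described above: read the mean SSC-W of the singlet cluster off the plot, then set the bracket around it.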
Thank you, specifying the `blank_file` argument worked for processing the blank data!
I set the singlet parameters to bracket the mean SSC-W value, but it returned a KeyError. I only collected BL1 and YL2 for my samples. Could you suggest a way to process the beads data without YL1 or VL1, or is that impossible?
Here is the error I get from the `fit_bead_data()` function:
Input:
batch_p.fit_bead_data(data_directory=data_directory,
num_bead_populations=6,
bead_init=10,
singlet_init=5,
singlet_low=50,
singlet_high=200,
bead_population_init=10)
Output:
Main bead cluster mean: [115014.37642124 646489.6659989 ]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-15-e2f81fe389d5> in <module>
----> 1 batch_p.fit_bead_data(data_directory=data_directory,
2 num_bead_populations=6,
3 bead_init=10,
4 singlet_init=5,
5 singlet_low=50,
c:\users\zharmer\documents\python_scripts\flowgatenist\flowgatenist\batch_process.py in fit_bead_data(data_directory, bead_file, num_bead_clusters, bead_init, num_singlet_clusters, singlet_init, num_bead_populations, bead_population_init, pop_init_means, pop_init_cov, auto_1D_fit, num_1D_clusters, max_cytometer_signal, outlier_quantile, show_plots, update_progress, debug, ssc_min, fsc_min, ssc_max, fsc_max, skip_low_beads, lower_threshold, upper_threshold, fit_VL1, manuscript_style, sub_fig_lower, FSC_channel, SSC_channel, SSCwidth_channel, covariance, singlet_low, singlet_high, fluoro_channel_1, fluoro_channel_2, fluoro_channel_3)
2990 gmm_data = gmm_data[gmm_data[fluoro_channel_3] < upper_threshold[2]]
2991 else:
-> 2992 gmm_data = singlet_beads_frame.loc[:, [fluoro_channel_1, fluoro_channel_2]].copy()
2993
2994 channel_max_b = gmm_data[fluoro_channel_1].max()
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
887 # AttributeError for IntervalTree get_value
888 return self.obj._get_value(*key, takeable=self._takeable)
--> 889 return self._getitem_tuple(key)
890 else:
891 # we by definition only have the 0th axis
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
1067 return self._multi_take(tup)
1068
-> 1069 return self._getitem_tuple_same_dim(tup)
1070
1071 def _get_label(self, label, axis: int):
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple_same_dim(self, tup)
773 continue
774
--> 775 retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
776 # We should never have retval.ndim < self.ndim, as that should
777 # be handled by the _getitem_lowerdim call above.
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
1111 raise ValueError("Cannot index with multidimensional key")
1112
-> 1113 return self._getitem_iterable(key, axis=axis)
1114
1115 # nested tuple slicing
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_iterable(self, key, axis)
1051
1052 # A collection of keys
-> 1053 keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
1054 return self.obj._reindex_with_indexers(
1055 {axis: [keyarr, indexer]}, copy=True, allow_dups=True
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1264 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1265
-> 1266 self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
1267 return keyarr, indexer
1268
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1319
1320 with option_context("display.max_seq_items", 10, "display.width", 80):
-> 1321 raise KeyError(
1322 "Passing list-likes to .loc or [] with any missing labels "
1323 "is no longer supported. "
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Index(['YL1-A'], dtype='object'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"
@zharmer
In the `fit_bead_data()` function, try setting the optional parameter `fluoro_channel_2` to `"YL2-A"`. So:
batch_p.fit_bead_data(data_directory=data_directory,
num_bead_populations=6,
bead_init=10,
singlet_init=5,
singlet_low=50,
singlet_high=200,
bead_population_init=10,
fluoro_channel_2="YL2-A")
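A quick way to see what is going on with a KeyError like the one above is to compare the requested channels against the columns actually present in the event table; a minimal pandas sketch (the `flow_frame` contents are made up, matching only the BL1 and YL2 channels that were collected):

```python
import pandas as pd

# Hypothetical event table containing only the channels that were recorded
flow_frame = pd.DataFrame({
    'FSC-A': [1.0, 2.0], 'SSC-A': [1.0, 2.0],
    'BL1-A': [1.0, 2.0], 'YL2-A': [1.0, 2.0],
})

requested = ['BL1-A', 'YL1-A']  # default assumes a YL1-A column exists
missing = [c for c in requested if c not in flow_frame.columns]
print(missing)  # ['YL1-A'] -> the exact label pandas complains about in the KeyError

# Remap to a channel that was actually collected before calling fit_bead_data()
requested = ['BL1-A', 'YL2-A']
assert all(c in flow_frame.columns for c in requested)
```

Recent pandas versions raise a KeyError (rather than reindexing with NaN) when any label passed to `.loc` is missing, which is why the traceback points into `_validate_read_indexer`.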
@zharmer If my last suggestion works, you'll also have to change the line in the example notebook where the bead calibration is applied to the data (input line 10). So, instead of:
batch_p.batch_apply_bead_cal(data_directory=data_directory, fl_channel='YL1-A')
you will need:
batch_p.batch_apply_bead_cal(data_directory=data_directory, fl_channel='YL2-A')
Changing `fluoro_channel_2` to `"YL2-A"` doesn't quite seem to have been sufficient. With that input I'm now getting a ValueError:
ValueError Traceback (most recent call last)
<ipython-input-16-15d57a5bae76> in <module>
----> 1 batch_p.fit_bead_data(data_directory=data_directory,
2 num_bead_populations=6,
3 bead_init=10,
4 singlet_init=5,
5 singlet_low=50,
c:\users\zharmer\documents\python_scripts\flowgatenist\flowgatenist\batch_process.py in fit_bead_data(data_directory, bead_file, num_bead_clusters, bead_init, num_singlet_clusters, singlet_init, num_bead_populations, bead_population_init, pop_init_means, pop_init_cov, auto_1D_fit, num_1D_clusters, max_cytometer_signal, outlier_quantile, show_plots, update_progress, debug, ssc_min, fsc_min, ssc_max, fsc_max, skip_low_beads, lower_threshold, upper_threshold, fit_VL1, manuscript_style, sub_fig_lower, FSC_channel, SSC_channel, SSCwidth_channel, covariance, singlet_low, singlet_high, fluoro_channel_1, fluoro_channel_2, fluoro_channel_3)
3012 n_init=bead_population_init)
3013
-> 3014 gmm.fit(gmm_data)
3015
3016 # Mark outlier data that will be ignored in calibration
~\Anaconda3\lib\site-packages\sklearn\mixture\_base.py in fit(self, X, y)
191 self
192 """
--> 193 self.fit_predict(X, y)
194 return self
195
~\Anaconda3\lib\site-packages\sklearn\mixture\_base.py in fit_predict(self, X, y)
218 Component labels.
219 """
--> 220 X = _check_X(X, self.n_components, ensure_min_samples=2)
221 self._check_n_features(X, reset=True)
222 self._check_initial_parameters(X)
~\Anaconda3\lib\site-packages\sklearn\mixture\_base.py in _check_X(X, n_components, n_features, ensure_min_samples)
50 X : array, shape (n_samples, n_features)
51 """
---> 52 X = check_array(X, dtype=[np.float64, np.float32],
53 ensure_min_samples=ensure_min_samples)
54 if n_components is not None and X.shape[0] < n_components:
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
667 n_samples = _num_samples(array)
668 if n_samples < ensure_min_samples:
--> 669 raise ValueError("Found array with %d sample(s) (shape=%s) while a"
670 " minimum of %d is required%s."
671 % (n_samples, array.shape, ensure_min_samples,
ValueError: Found array with 0 sample(s) (shape=(0, 2)) while a minimum of 2 is required.
@zharmer Sorry it took me a while to get back to you. I looked at your bead data file, and it looks like you have the SSC channel gain set way too high. Almost all of the SSC-H values in the data are at the upper detection limit (~10^6), so you can't use the scattering data to gate the events. I expect that you have a similar problem with your blanks and cell samples as well: most of the events in your blank data are also at the upper detection limit for the SSC-H channel.
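One quick way to check for this kind of saturation is to count the events sitting at the channel ceiling; a sketch with simulated data (the ~1e6 ceiling and the lognormal signal are assumptions for illustration, not values read from the attached files):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulated SSC-H values with the gain set too high: most events pile up
# at the assumed upper detection limit (~1e6)
ssc = np.minimum(rng.lognormal(mean=14.5, sigma=1.0, size=10_000), 1.0e6)
frame = pd.DataFrame({'SSC-H': ssc})

upper_limit = 1.0e6
saturated_fraction = (frame['SSC-H'] >= upper_limit).mean()
print(f"{saturated_fraction:.0%} of events at the SSC-H ceiling")
# A large fraction here means the scattering data can't be used for gating.
```

If most events in a blank or bead file land at the ceiling like this, no choice of gating parameters will recover usable scatter information; the gain has to come down at acquisition time.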
Ok, I'll collect a new data set with a lower SSC gain and try running the script again on that. Thank you so much for the advice!
@zharmer Just remember to adjust the SSC and FSC gain settings while running the beads: you want to get the beads on scale, but near the top end of the scale - since the cells you want to measure (E. coli?) are probably smaller than the beads. You might also want to check the gain settings with a cell sample before locking it down and getting your real data. See the example SSC vs. FSC plots in the notebooks in this repo to get an idea of what it might look like when the gains are set well.
Hello,
The script doesn't seem to be able to read the blank and bead measurements taken on a ThermoFisher Attune NxT Flow Cytometer.
The line `batch_p.fcs_to_dataframe(data_directory=data_directory)` in Notebook 1 returns all of the samples, except the "blanks" sample, which comes back as None, leading to an IndexError in the next line of the script. An IndexError is similarly returned in Notebook 2 for the beads sample (cat: RCP-30-5, 6 peaks) by the `fit_bead_data()` line.
Can you suggest a way to read these files with the script?
Attune_blanks.zip Attune_beads.zip
Thank you!