DillonHammill closed this issue 3 years ago.
Here is a comparison of `cyto_sample()` using a `sampleFilter` versus using sampled row indices:
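```r
library(flowWorkspace)

# Coerce the cytoset `cs` to a single cytoframe
cf <- flowFrame_to_cytoframe(as(cs, "flowFrame"))

# SAMPLE FILTER: cyto_sample_v1() samples via a sampleFilter
system.time({
  cyto_sample_v1(cf,
                 200000,
                 seed = 56)
})
#>    user  system elapsed 
#>    0.22    0.28    0.50 

# SAMPLE INDICES: cyto_sample_v2() samples via row indices
system.time({
  cyto_sample_v2(cf,
                 200000,
                 seed = 56)
})
#>    user  system elapsed 
#>    0.25    0.24    0.48 
```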
It looks like there is marginal benefit (if any) to using sampled row indices instead of a `sampleFilter` (0.48 s vs 0.50 s elapsed for 200,000 events), so I will leave `cyto_sample()` alone for now.
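For readers unfamiliar with the index-based variant compared above, here is a minimal base-R sketch of the idea. It assumes a plain numeric matrix as a stand-in for a cytoframe's expression matrix; `sample_rows_by_index()` is a hypothetical helper, not the `cyto_sample_v2()` timed above, and it does not use flowCore's `sampleFilter` at all.

```r
# Hypothetical helper (not part of CytoExploreR): down-sample events by
# drawing row indices directly instead of building a sampleFilter.
sample_rows_by_index <- function(x, n, seed = NULL) {
  # Keep every event if no more than n are available
  if (nrow(x) <= n) {
    return(x)
  }
  # Setting the seed makes the sampled indices reproducible
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # Draw n distinct row indices without replacement; sorting them
  # preserves the original event order
  idx <- sort(sample.int(nrow(x), n))
  x[idx, , drop = FALSE]
}

# Toy stand-in for an expression matrix: 1e6 events x 3 channels
events <- matrix(rnorm(3e6), ncol = 3,
                 dimnames = list(NULL, c("FSC-A", "SSC-A", "CD4")))
sub <- sample_rows_by_index(events, 200000, seed = 56)
dim(sub)  # 200000 rows, 3 channels
```

Because sampling reduces to one `sample.int()` call plus a row subset, most of the cost in either variant is likely the subsetting itself, which would be consistent with the near-identical timings above.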
I will close this for now, as `cyto_coerce()` will now be used where possible, particularly in `cyto_merge_by()`. This should offer substantial speed improvements to `cyto_plot()` and `cyto_gate_draw()` once implemented.
Currently, CytoExploreR handles sampling and coercion of cytosets independently. This means that the data is first coerced to a cytoframe and then sampled downstream using `cyto_sample()`. The initial coercion step is by far the most computationally taxing, so we need to consider sampling prior to merging to speed up this process.

Preparation of data:
If we use the current approach of coercion and then sampling (total events = 50000):
If we sample each cytoframe prior to merging (total events ~ 50000):
If we sample each cytoframe, merge and then sample again (total events = 50000):
Sampling each cytoframe prior to merging does offer a significant speed boost, but the merged sample will not contain exactly 50000 events due to rounding (50000 / 33 ~ 1515.15, rounded down to 1515 events per cytoframe, and 1515 * 33 = 49995). Exact counts can be obtained by sampling slightly more events per cytoframe and then sampling the merged cytoframe down to the desired number of events. Interestingly, this double-sampling approach is still significantly faster than merging and sampling afterwards.
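The double-sampling approach described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the actual `cyto_coerce()` implementation: each "cytoframe" is modelled as a plain numeric matrix, and `sample_merge_exact()` is a hypothetical helper. Rounding the per-sample count up with `ceiling()` (instead of down) guarantees enough events for a second, cheap sampling pass on the small merged matrix.

```r
# Hypothetical sketch (not the actual cyto_coerce() implementation):
# down-sample each sample first, merge, then trim to an exact total.
sample_merge_exact <- function(samples, n_total, seed = NULL) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # Oversample slightly: ceiling() guarantees at least n_total events
  # overall when every sample has enough events
  n_each <- ceiling(n_total / length(samples))
  sampled <- lapply(samples, function(x) {
    if (nrow(x) <= n_each) x
    else x[sort(sample.int(nrow(x), n_each)), , drop = FALSE]
  })
  # Merge only the small per-sample subsets
  merged <- do.call(rbind, sampled)
  # Second sampling pass on the merged subset to hit n_total exactly
  if (nrow(merged) > n_total) {
    merged <- merged[sort(sample.int(nrow(merged), n_total)), , drop = FALSE]
  }
  merged
}

# 33 toy samples of 5000 events x 3 channels (stand-ins for cytoframes)
samples <- replicate(33, matrix(rnorm(5000 * 3), ncol = 3), simplify = FALSE)

floor(50000 / 33) * 33    # single pass, rounded down: 49995
ceiling(50000 / 33) * 33  # oversampled before the second pass: 50028
merged <- sample_merge_exact(samples, 50000, seed = 56)
nrow(merged)              # exactly 50000
```

If some samples contain fewer events than `n_each`, the merged total can still fall short of `n_total`; handling that case (for example by redistributing the shortfall across larger samples) is left out of this sketch.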
I have written a new `cyto_coerce()` function to implement the second approach for now.

Perhaps a better approach would be to use the indices directly and remove `sampleFilter()` completely from `cyto_sample()`. I will give this a try and report back.