RGLab / openCyto

A package that provides a data analysis pipeline for flow cytometry.
GNU Affero General Public License v3.0

Improving documentation for `tailgate()` #191

Closed: ptvan closed this issue 4 years ago

ptvan commented 5 years ago

Recording a documentation request from a user:

Miguel Rodo [8:33 AM] So, it seems to me, based on reading the function, that it:

  1. Finds all local maxima. Of these, it chooses by default the lowest one (in the sense that its value along the x-axis is lowest). Let's call this max.
  2. Finds all local minima. It keeps only the local minima greater than the local maximum selected above, then finds the local minimum closest to max. Let's call this min.
  3. Finds all points that are both greater than min and have a derivative less than tol. Let's call this set cutpoint_set. (tol is 0.01 by default, or else a user-specified value. A lower value of tol admits fewer points, so, combined with step 4 below, it essentially pushes the cutpoint closer to the reference peak.)
  4. Chooses the lowest value in cutpoint_set.
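To make the four steps concrete, here is a minimal Python sketch of the procedure exactly as Miguel reads it (a later reply in this thread disputes this reading, so treat it as an illustration of the steps, not of openCyto's code). The Gaussian KDE on a 512-point grid, the finite-difference derivative, the fixed bandwidth `bw`, the test data, and the helper names `kde_grid`, `derivative`, and `cutpoint_as_read` are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def kde_grid(x, bw, n_grid=512):
    """Gaussian kernel density estimate evaluated on an evenly spaced grid."""
    lo, hi = min(x) - 3 * bw, max(x) + 3 * bw
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    norm = 1.0 / (len(x) * bw * math.sqrt(2 * math.pi))
    dens = [norm * sum(math.exp(-0.5 * ((g - xi) / bw) ** 2) for xi in x)
            for g in grid]
    return grid, dens


def derivative(grid, y):
    """First derivative by finite differences (central in the interior)."""
    n = len(y)
    out = []
    for i in range(n):
        a, b = max(i - 1, 0), min(i + 1, n - 1)
        out.append((y[b] - y[a]) / (grid[b] - grid[a]))
    return out


def cutpoint_as_read(x, bw, tol=0.01):
    """Hypothetical helper: steps 1-4 as described in the quoted message."""
    grid, dens = kde_grid(x, bw)
    d = derivative(grid, dens)
    inner = range(1, len(dens) - 1)
    # Step 1: local maxima of the density; keep the lowest-x one ("max").
    maxima = [i for i in inner if dens[i - 1] < dens[i] >= dens[i + 1]]
    if not maxima:
        return None
    peak = maxima[0]
    # Step 2: local minima to the right of max; the closest one is "min".
    minima = [i for i in inner if dens[i - 1] > dens[i] <= dens[i + 1] and i > peak]
    if not minima:
        return None
    m = minima[0]
    # Step 3: grid points greater than min whose derivative is below tol.
    cutpoint_set = [i for i in range(m + 1, len(grid)) if d[i] < tol]
    # Step 4: the lowest value in cutpoint_set.
    return grid[cutpoint_set[0]] if cutpoint_set else None


# Deterministic bimodal sample: normal quantiles around 0 (300 points) and 5 (100 points).
data = ([NormalDist(0, 1).inv_cdf((i + 0.5) / 300) for i in range(300)]
        + [NormalDist(5, 1).inv_cdf((i + 0.5) / 100) for i in range(100)])
print(cutpoint_as_read(data, bw=0.5))  # just past the density dip between the modes
```

On this reading the cutpoint sits immediately to the right of the dip between the two modes, which matches the summary in the next message.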

Miguel Rodo [8:58 AM] So, it essentially places the cutpoint right after the local minimum closest to the reference peak. I think the function description would benefit from adding this, as it could currently be more specific ('Gates the tail of a density using the derivative of a kernel density estimate').

jacobpwagner commented 5 years ago

Agreed. This documentation could use some elaboration. Additionally, some potentially useful arguments (including adjust, for altering the KDE smoothing bandwidth) are left out of the docs. Some of this is documented in the source for .cytokine_cutpoint() and could be added to the tailgate() docs, but we should make this less opaque for users. @gfinak, @mikejiang, I'm happy to take this unless you'd rather document it yourselves.
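The `adjust` argument is only named here. As a hedged sketch, assume the density estimate behind tailgate() uses R's default `density()` bandwidth (the `bw.nrd0` rule of thumb) and that `adjust` multiplies that bandwidth, as it does in `density()`; neither assumption has been checked against the openCyto source. The Python function name `nrd0_bandwidth` is hypothetical.

```python
import statistics


def nrd0_bandwidth(x, adjust=1.0):
    """Rule-of-thumb bandwidth following R's bw.nrd0(), times `adjust`."""
    if len(x) < 2:
        raise ValueError("need at least 2 data points")
    sd = statistics.stdev(x)  # sample standard deviation, like R's sd()
    # Type-7 quartiles (R's default quantile method) for the IQR.
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    lo = min(sd, (q3 - q1) / 1.34)
    if lo == 0:
        # Same fallbacks as bw.nrd0 for degenerate data.
        lo = sd or abs(x[0]) or 1.0
    return adjust * 0.9 * lo * len(x) ** -0.2


data = [float(v) for v in range(1, 11)]
for adj in (0.5, 1.0, 2.0):
    # adjust > 1 smooths the KDE more; adjust < 1 follows the data more closely.
    print(adj, nrd0_bandwidth(data, adjust=adj))
```

Because the bandwidth scales linearly with `adjust`, doubling it roughly doubles the kernel width, which flattens small bumps and shoulders in the density and therefore in its derivative.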

gfinak commented 5 years ago

Thanks, Jacob, go ahead.

jacobpwagner commented 5 years ago

To respond to the original message, I believe Miguel Rodo's reading of the function is a little incorrect, but that's totally understandable given the comments in the source. For the first-derivative method (the default, which he seems to be discussing), it doesn't find a local minimum to the right of the peak in the original density (which may not even exist). Instead, it finds a local minimum in the first derivative of the KDE, which marks the steep decrease on the right shoulder of the reference peak. A lower value for tol then forces the derivative to be smaller in magnitude, pushing the cutpoint away from the peak and onto the flatter tail.
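The first-derivative method described here can be sketched in the same spirit. This is a minimal Python illustration under stated assumptions, not openCyto's `.cytokine_cutpoint()`: it assumes a Gaussian KDE on a grid with a fixed bandwidth `bw`, takes the lowest-x local maximum of the density as the reference peak (step 1 of the reading above, which this reply does not dispute), and interprets "derivative smaller in magnitude than tol" as `abs(d) < tol`. The helper names and the test data are hypothetical.

```python
import math
from statistics import NormalDist


def kde_grid(x, bw, n_grid=512):
    """Gaussian kernel density estimate evaluated on an evenly spaced grid."""
    lo, hi = min(x) - 3 * bw, max(x) + 3 * bw
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    norm = 1.0 / (len(x) * bw * math.sqrt(2 * math.pi))
    dens = [norm * sum(math.exp(-0.5 * ((g - xi) / bw) ** 2) for xi in x)
            for g in grid]
    return grid, dens


def derivative(grid, y):
    """First derivative by finite differences (central in the interior)."""
    n = len(y)
    out = []
    for i in range(n):
        a, b = max(i - 1, 0), min(i + 1, n - 1)
        out.append((y[b] - y[a]) / (grid[b] - grid[a]))
    return out


def first_deriv_cutpoint(x, bw, tol=0.01):
    """Hypothetical sketch of a first-derivative tail gate."""
    grid, dens = kde_grid(x, bw)
    d = derivative(grid, dens)
    inner = range(1, len(grid) - 1)
    # Reference peak: lowest-x local maximum of the density itself.
    maxima = [i for i in inner if dens[i - 1] < dens[i] >= dens[i + 1]]
    if not maxima:
        return None
    peak = maxima[0]
    # Steepest descent of the right shoulder: first local minimum of the
    # derivative (not of the density) to the right of the peak.
    shoulders = [i for i in inner if i > peak and d[i - 1] > d[i] <= d[i + 1]]
    if not shoulders:
        return None
    # Walk right until the slope is flatter than tol; a smaller tol demands
    # a flatter slope and so pushes the cutpoint further onto the tail.
    for i in range(shoulders[0], len(grid)):
        if abs(d[i]) < tol:
            return grid[i]
    return None


# Deterministic unimodal sample: 500 quantiles of a standard normal.
data = [NormalDist(0, 1).inv_cdf((i + 0.5) / 500) for i in range(500)]
high_tol = first_deriv_cutpoint(data, bw=0.26, tol=0.01)
low_tol = first_deriv_cutpoint(data, bw=0.26, tol=0.001)
print(high_tol, low_tol)  # low_tol lies further from the peak than high_tol
```

Because the search only moves rightward from the shoulder and a smaller tol is a strictly harder condition to meet, the low-tol cutpoint can never land closer to the peak than the high-tol one.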

jacobpwagner commented 5 years ago

As @ptvan suggested, I'll probably add a quick plotted example to the vignette to help clarify things.

[Attached plots: high_tol, low_tol]

jacobpwagner commented 5 years ago

Expanded doc in 4a6f91522471f2b4f67f46d6f01003d26fa2b331. I'll add a bit of explanation to the vignette soon as well.