Drop invalid cube values in memento builder

As of 8f3ed5fa0fc05dad381adda79e2cf502fe9e43bc, the diff expr method drops cube values where `(estimators_df["sem"] <= 0) | (estimators_df["sem"] >= estimators_df["mean"])`. This pruning prevents mathematical errors when the estimators are transformed into log space: it avoids taking `log(mean)` when `mean <= 0` and `log(mean - sem)` when `mean - sem <= 0`.
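A minimal sketch of this mask, assuming `estimators_df` is a pandas DataFrame with `mean` and `sem` columns (the column names come from the filter expression; the helper name `invalid_estimator_mask` and the toy values are hypothetical). For every row that survives the mask, `sem > 0` and `sem < mean`, so `mean > sem > 0` and both logarithms are finite.

```python
import numpy as np
import pandas as pd


def invalid_estimator_mask(estimators_df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: True for rows the diff expr method would drop."""
    return (estimators_df["sem"] <= 0) | (estimators_df["sem"] >= estimators_df["mean"])


# Toy estimators; real cube rows also carry key columns (e.g. feature and group ids).
estimators_df = pd.DataFrame(
    {
        "mean": [2.0, 0.5, 1.0, 3.0],
        "sem": [0.5, 0.0, 1.5, 3.0],
    }
)

mask = invalid_estimator_mask(estimators_df)
kept = estimators_df[~mask]

# Kept rows satisfy 0 < sem < mean, so mean > 0 and mean - sem > 0.
log_mean = np.log(kept["mean"])
log_lower = np.log(kept["mean"] - kept["sem"])

print(f"dropped {mask.mean():.0%} of rows")  # prints: dropped 75% of rows
```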

This drops ~1-3% of cube data, depending upon the query filter specified by the user.

Cube elements with a zero-valued `sem` make up 1.7% of the cube and are computed from `n_obs` counts ranging from 1 to 14:

| `n_obs` statistic | value |
|---|---|
| count | 1.875726e+07 |
| mean | 1.107691e+00 |
| std | 3.761035e-01 |
| min | 1.000000e+00 |
| 25% | 1.000000e+00 |
| 50% | 1.000000e+00 |
| 75% | 1.000000e+00 |
| max | 1.400000e+01 |

Cube elements with `sem > mean` make up 0.1% of the cube and are computed from `n_obs` counts ranging primarily from 2 to 5:

| `n_obs` statistic | value |
|---|---|
| count | 1.061575e+06 |
| mean | 6.328915e+00 |
| std | 1.474041e+01 |
| min | 2.000000e+00 |
| 25% | 2.000000e+00 |
| 50% | 3.000000e+00 |
| 75% | 5.000000e+00 |
| max | 2.726000e+03 |
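The summaries above have the shape of pandas `Series.describe()` output on the `n_obs` column. A sketch of how such per-category summaries could be produced, assuming the cube is loaded into a DataFrame with `n_obs`, `mean`, and `sem` columns (`n_obs` appears in the text; the `cube_df` name, the loading step, and the toy values are hypothetical). The categories use the same `<=` / `>=` comparisons as the diff expr filter.

```python
import pandas as pd

# Hypothetical stand-in for the loaded cube.
cube_df = pd.DataFrame(
    {
        "n_obs": [1, 1, 2, 3, 5, 40, 120],
        "mean": [0.8, 1.2, 0.4, 0.9, 0.3, 2.0, 1.5],
        "sem": [0.0, 0.0, 0.6, 1.1, 0.3, 0.2, 0.1],
    }
)

zero_sem = cube_df["sem"] <= 0
sem_ge_mean = cube_df["sem"] >= cube_df["mean"]

for label, category in [("sem <= 0", zero_sem), ("sem >= mean", sem_ge_mean)]:
    share = category.mean()  # fraction of all cube elements in this category
    print(f"{label}: {share:.1%} of cube")
    # count/mean/std/min/quartiles/max of n_obs, as in the tables above
    print(cube_df.loc[category, "n_obs"].describe())
```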

Since these estimator values are computed from low counts of raw expression values, it has been deemed acceptable to drop these values entirely:

- [ ] Update the builder to filter out these values. (Note: they will be replaced with minimal values in the diff expr method.)
- [ ] Add a post-build validation check here.
- [ ] Delete the filtering logic in the diff expr method.
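A minimal sketch of the first two checklist items, assuming the builder materializes each batch of estimators as a pandas DataFrame with `mean` and `sem` columns before writing it to the cube. The function names `drop_invalid_estimators` and `validate_estimators` are hypothetical and not part of the cellxgene-census codebase.

```python
import pandas as pd


def _invalid(estimators_df: pd.DataFrame) -> pd.Series:
    # Same predicate the diff expr method currently applies.
    return (estimators_df["sem"] <= 0) | (estimators_df["sem"] >= estimators_df["mean"])


def drop_invalid_estimators(estimators_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical builder step: drop rows that cannot be log-transformed safely."""
    return estimators_df[~_invalid(estimators_df)].reset_index(drop=True)


def validate_estimators(estimators_df: pd.DataFrame) -> None:
    """Hypothetical post-build check: raise if any invalid row reached the cube."""
    n_zero_sem = int((estimators_df["sem"] <= 0).sum())
    n_sem_ge_mean = int((estimators_df["sem"] >= estimators_df["mean"]).sum())
    if n_zero_sem or n_sem_ge_mean:
        raise ValueError(
            f"cube contains {n_zero_sem} rows with sem <= 0 and "
            f"{n_sem_ge_mean} rows with sem >= mean"
        )


raw = pd.DataFrame({"mean": [2.0, 0.5, 1.0], "sem": [0.5, 0.0, 1.5]})
built = drop_invalid_estimators(raw)
validate_estimators(built)  # passes: only the (mean=2.0, sem=0.5) row remains
```

Sharing one predicate between the builder step and the validation check keeps them in agreement; once the builder applies it, the filter in the diff expr method no longer removes anything.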

chanzuckerberg/cellxgene-census #940