posterior_predictive and log_likelihood could have sample names

mortonjt commented 3 years ago

Here is an example of an InferenceData object

print(repr(combined['posterior_predictive']))
<xarray.Dataset>
Dimensions:          (chain: 1, draw: 1000, features: 2055, y_predict_dim_0: 381)
Coordinates:
  * chain            (chain) int64 0
  * draw             (draw) int64 0 1 2 3 4 5 6 ... 993 994 995 996 997 998 999
  * y_predict_dim_0  (y_predict_dim_0) int64 0 1 2 3 4 5 ... 376 377 378 379 380
  * features         (features) object 'EC:1.1.1.1' ... 'EC:6.6.1.2'
Data variables:
    y_predict        (features, chain, draw, y_predict_dim_0) float64 ...
Attributes:
    created_at:                 2021-05-01T19:35:39.705402
    arviz_version:              0.11.2
    inference_library:          cmdstanpy
    inference_library_version:  0.9.68

print(repr(combined['log_likelihood']))
<xarray.Dataset>
Dimensions:          (chain: 1, draw: 1000, features: 2055, log_lhood_dim_0: 381)
Coordinates:
  * chain            (chain) int64 0
  * draw             (draw) int64 0 1 2 3 4 5 6 ... 993 994 995 996 997 998 999
  * log_lhood_dim_0  (log_lhood_dim_0) int64 0 1 2 3 4 5 ... 376 377 378 379 380
  * features         (features) object 'EC:1.1.1.1' ... 'EC:6.6.1.2'
Data variables:
    log_lhood        (features, chain, draw, log_lhood_dim_0) float64 ...
Attributes:
    created_at:                 2021-05-01T19:35:39.710798
    arviz_version:              0.11.2
    inference_library:          cmdstanpy
    inference_library_version:  0.9.68

Here, y_predict_dim_0 and log_lhood_dim_0 correspond to biological sample names, but that information is lost when the InferenceData object is created. It would be super useful if the sample information were retained. I think this should boil down to modifying this block of code along these lines:


if log_likelihood is not None:
    ll_ds = xr.concat(group_list[2], concatenation_name)
    ll_ds = ll_ds.rename_dims({'log_lhood_dim_0': sample_name})
    group_dict["log_likelihood"] = ll_ds
if posterior_predictive is not None:
    pp_ds = xr.concat(group_list[3], concatenation_name)
    pp_ds = pp_ds.rename_dims({'y_predict_dim_0': sample_name})
    group_dict["posterior_predictive"] = pp_ds
...
group_ds = group_dict[group].assign_coords(
    {concatenation_name: coords[concatenation_name],
     sample_name: coords[sample_name]}
)
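
For anyone who wants to try the rename-then-label idea in isolation, here is a minimal sketch on a toy xarray Dataset. It is not BIRDMAn's code: the dataset shape, the dimension name "tbl_sample", and the sample IDs are all made up for illustration. It only shows that rename_dims followed by assign_coords attaches names to an anonymous dimension.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the log_likelihood group:
# 2 features, 1 chain, 4 draws, 3 samples. All values are random.
rng = np.random.default_rng(0)
ll_ds = xr.Dataset(
    {
        "log_lhood": (
            ("features", "chain", "draw", "log_lhood_dim_0"),
            rng.normal(size=(2, 1, 4, 3)),
        )
    },
    coords={"features": ["EC:1.1.1.1", "EC:6.6.1.2"]},
)

sample_name = "tbl_sample"       # hypothetical dimension name
sample_ids = ["S1", "S2", "S3"]  # hypothetical sample IDs

# Rename the anonymous dimension, then label it with the sample IDs.
ll_ds = ll_ds.rename_dims({"log_lhood_dim_0": sample_name})
ll_ds = ll_ds.assign_coords({sample_name: sample_ids})

print(ll_ds["log_lhood"].dims)
print(ll_ds[sample_name].values)
```

Note that the toy dataset has no index coordinate on log_lhood_dim_0, unlike the printout above; with an existing coordinate, rename (rather than rename_dims) may be the safer call.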

gibsramen commented 3 years ago

If I'm understanding your request correctly, this should already be possible by specifying the coords/dims of the log-likelihood and posterior predictive variables in to_inference_object.

See the following example from the custom LME page.

inference = nb_lme.to_inference_object(
    params=["beta", "phi", "subj_int"],
    dims={
        "beta": ["covariate", "feature"],
        "phi": ["feature"],
        "subj_int": ["subject"],
        "log_lik": ["tbl_sample", "feature"],
        "y_predict": ["tbl_sample", "feature"]
    },
    coords={
        "feature": nb_lme.feature_names,
        "covariate": nb_lme.colnames,
        "subject": groups,
        "tbl_sample": nb_lme.sample_names
    },
    alr_params=["beta"],
    posterior_predictive="y_predict",
    log_likelihood="log_lik",
    include_observed_data=True
)
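
As a rough illustration of what the labeled dims buy you afterwards, here is a small sketch using plain xarray rather than BIRDMAn itself. The array, the sample IDs ("S1", "S2") and the feature IDs ("F1", "F2") are hypothetical; the dimension names just mirror the dims/coords in the call above.

```python
import numpy as np
import xarray as xr

# Toy posterior predictive variable: 2 chains, 3 draws, 2 samples, 2 features.
# Values are just a running counter so the result is easy to check by hand.
y_predict = xr.DataArray(
    np.arange(2 * 3 * 2 * 2, dtype=float).reshape(2, 3, 2, 2),
    dims=["chain", "draw", "tbl_sample", "feature"],
    coords={"tbl_sample": ["S1", "S2"], "feature": ["F1", "F2"]},
)

# Select one sample by name instead of by integer position.
s2 = y_predict.sel(tbl_sample="S2")

# Posterior predictive mean per feature for that sample.
s2_mean = s2.mean(dim=["chain", "draw"])
print(s2_mean.to_series())
```

With integer-only dimensions like y_predict_dim_0, the same lookup would need a separate mapping from position to sample ID.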

Let me know if I've misunderstood.

mortonjt commented 3 years ago

Got it. OK, I don't think this is a problem; it looks like I wasn't specifying it correctly.

biocore / BIRDMAn

posterior_predictive and log_likelihood could have sample names #39