Closed gwaybio closed 3 years ago
We find only 1,620 consensus profiles in batch 2 (we have 8,340 in batch 1).
This must be a metadata issue or a missing grouping column.
Batch 2 has 3 cell lines x 3 dose points x 3 time points x 360 compounds = ~9720 (not exact because some compounds might be missing all doses)
Here is the exact number of consensus profiles for batch 2:

| n_consensus |
|---|
| 9396 |
missing time as a grouping column, thanks!
this turned out to be an even larger problem. the aggregate function will drop samples if one of their aggregating columns (`strata`) has missing values. eek! I opened cytomining/pycytominer#133 to resolve this globally, but for this PR, my solution is to recode missing values as "unknown". This only impacts the MOA and target columns.
This impacted both batches of data, but batch 2 substantially more. Batch 2 now has 10,368 consensus profiles. Note that your example above does not include platemaps from multiple time points.
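For our notes, a minimal sketch of the dropping behavior and the "unknown" recoding, assuming the aggregation comes down to a pandas `groupby` over the strata columns (the actual pycytominer code is not reproduced here). The column names `Metadata_broad_sample` and `Metadata_moa`, the `BRD-A`/`BRD-B` ids, and the feature values are all made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy profiles; column names, ids, and values are hypothetical.
profiles = pd.DataFrame(
    {
        "Metadata_broad_sample": ["BRD-A", "BRD-A", "BRD-B", "BRD-B"],
        "Metadata_moa": ["inhibitor", "inhibitor", np.nan, np.nan],
        "feature_1": [1.0, 3.0, 5.0, 7.0],
    }
)
strata = ["Metadata_broad_sample", "Metadata_moa"]

# With the default dropna=True, groups whose key contains NaN are silently dropped.
dropped = profiles.groupby(strata)["feature_1"].median().reset_index()

# Recoding missing values as "unknown" before grouping keeps those samples.
recoded = profiles.assign(Metadata_moa=profiles["Metadata_moa"].fillna("unknown"))
kept = recoded.groupby(strata)["feature_1"].median().reset_index()

print(len(dropped), len(kept))
```

In this toy case the compound with a missing MOA disappears from `dropped` but survives in `kept` under the `"unknown"` label.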
Also note that I do update MOAs in the profiling step for both batches:
But I wonder if I need to update the external MOA file first with the new batch broad ids...
In other words, if I have to do this, then I'll need to rerun the profiling pipeline again for at least the batch 2 data.
@shntnu - this PR is ready for review. Let's discuss a potential full reprocessing in #62. We need not decide to reprocess in full before merging this PR.
> the aggregate function will drop samples if one of their aggregating columns (strata) has missing values.
Wow, glad you found it! Bad 🐼 !
> This impacted both batches of data, but batch 2 substantially more. Batch 2 now has 10,368 consensus profiles.

Ah that's because we are using `pert_well` in grouping (as per plan 👍 ). 3 x 3 x 3 x 384 = 10,368

> Note that your example above does not include platemaps from multiple time points.
For our notes: It does actually – there are only 3 unique platemaps (containing 3 doses x ~360 compounds), so I read 3 of them then multiplied that by 3x3. But that example is useless given that we are computing consensus by including the `pert_well` column :D
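A quick sketch of the two counts discussed in this thread, for reference. The variable names are made up, and reading the factors of 3 x 3 x 3 x 384 as cell lines, time points, unique platemaps, and wells per platemap is an assumption:

```python
cell_lines = 3
time_points = 3
doses = 3
compounds = 360           # approximate; some compounds may be missing doses
unique_platemaps = 3      # assumed to be the third factor of 3
wells_per_platemap = 384

# Grouping by compound and dose (the original estimate).
by_compound = cell_lines * doses * time_points * compounds

# Grouping by well (pert_well included in the grouping columns).
by_well = cell_lines * time_points * unique_platemaps * wells_per_platemap

print(by_compound, by_well)
```

The gap between the two numbers comes from counting all 384 wells per platemap rather than ~360 compounds.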
> this PR is ready for review.
lgtm
Here, I add consensus profiles for batch 2. I also add `Metadata_cell_id` to the aggregation columns for both batches (batch 2 has three cell lines). I make some minor changes throughout the notebook.