gwaybio closed this issue 6 months ago
The number of images we report in the manuscript and in LOIO is consistent (270)
Oooh, is it that the cell_UUIDs are different in IC vs. no-IC?
If so, is there a way to align cell_UUIDs?
Oooh, is it that the cell_UUIDs are different in IC vs. no-IC?
Yep, this is definitely the case.
If so, is there a way to align cell_UUIDs?
We could try to match cells from the IC dataset to the corresponding closest cells in the no-IC dataset by location (plate, well, frame, center_x, center_y). The only issue is that we have a different number of cells in these datasets so some cells are not represented in both datasets and thus would have no match.
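A rough sketch of what this location-based matching could look like. It assumes both datasets are loaded as pandas DataFrames with columns named plate, well, frame, center_x and center_y (names taken from the description above; the real column names may differ). The function name match_cells and the max_distance cutoff are hypothetical, and the cutoff would need tuning. Cells with no counterpart within the cutoff are simply dropped, which handles the unequal cell counts.

```python
import numpy as np
import pandas as pd

# Assumed column names; adjust to whatever the labeled datasets actually use.
IMAGE_COLS = ["plate", "well", "frame"]
COORD_COLS = ["center_x", "center_y"]


def match_cells(ic_df, no_ic_df, max_distance=5.0):
    """Pair each IC cell with its nearest no-IC cell in the same image.

    Returns one row per matched IC cell with the row labels of both cells
    and the distance between their centers. IC cells whose nearest no-IC
    cell is farther than max_distance (hypothetical cutoff) are left out.
    """
    no_ic_groups = {key: grp for key, grp in no_ic_df.groupby(IMAGE_COLS)}
    matches = []
    for key, ic_grp in ic_df.groupby(IMAGE_COLS):
        no_ic_grp = no_ic_groups.get(key)
        if no_ic_grp is None:
            continue  # image is missing from the no-IC dataset
        ic_xy = ic_grp[COORD_COLS].to_numpy(dtype=float)
        no_ic_xy = no_ic_grp[COORD_COLS].to_numpy(dtype=float)
        # Brute-force distances between every IC and no-IC cell in this image
        dists = np.linalg.norm(ic_xy[:, None, :] - no_ic_xy[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        nearest_dist = dists[np.arange(len(ic_xy)), nearest]
        keep = nearest_dist <= max_distance
        matches.append(
            pd.DataFrame(
                {
                    "ic_index": ic_grp.index.to_numpy()[keep],
                    "no_ic_index": no_ic_grp.index.to_numpy()[nearest[keep]],
                    "distance": nearest_dist[keep],
                }
            )
        )
    if not matches:
        return pd.DataFrame(columns=["ic_index", "no_ic_index", "distance"])
    return pd.concat(matches, ignore_index=True)
```

Note that this is not a strict one-to-one assignment: two IC cells can end up matched to the same no-IC cell, so duplicates in no_ic_index would need to be resolved separately.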
Unfortunately, I completely missed https://github.com/WayScience/phenotypic_profiling_model/blob/main/1.split_data/explore_data.ipynb in #49. @gwaybio, would it be easy for you to change this notebook to use the new datasets? If not, I could try to do this in the near future as well.
It seems you are saving the correlation files from these notebooks in this repo but not pushing to GitHub:
# Output to file
output_file = f"{output_basename}_{feature_space}.tsv.gz"
cp_tidy_corr_df.to_csv(output_file, sep="\t", index=False)
We could try to match cells from the IC dataset to the corresponding closest cells in the no-IC dataset by location (plate, well, frame, center_x, center_y). The only issue is that we have a different number of cells in these datasets so some cells are not represented in both datasets and thus would have no match.
Sounds good - we can revisit this decision in the future if needed.
would it be easy for you to change this notebook to use the new datasets?
I'm not sure what you mean - do you mean we would need to apply the non-ic dataset in this analysis as well?
Sounds good - we can revisit this decision in the future if needed.
Awesome, I'll use this methodology to associate cells across the IC and no-IC datasets.
I'm not sure what you mean - do you mean we would need to apply the non-ic dataset in this analysis as well?
This depends on if we want to perform this analysis (notebook is for pairwise correlations between single-cells) on the no-ic dataset as well (I think this is your final call). If we just want to perform this analysis for ic data, we can simply change the labeled_data_path to pathlib.Path("../0.download_data/data/labeled_data__ic.csv.gz").
I am having trouble understanding how we use output from this notebook (pairwise correlations between single-cells). It seems that you saved the output tsvs to 1.split_data/data but these files did not get uploaded to GitHub. Maybe this is a deprecated analysis that can be deleted?
If we want this analysis on ic and no-ic datasets I can modify the notebook to iterate over both datasets, but I am unsure of how to save the output tsv files. I assume they would not belong on GitHub as you did not push them to the repo before.
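For reference, iterating over both datasets could look roughly like the sketch below. This is not the notebook's actual code: the function names, the output file naming, and the "CP__" feature-column prefix are all assumptions made for illustration, and the caller supplies the dataset paths (only the ic path is known from this thread).

```python
import pathlib

import pandas as pd


def tidy_pairwise_correlations(feature_df):
    """Pearson correlation between every pair of cells (rows), long format."""
    # DataFrame.corr correlates columns, so transpose to put cells in columns
    corr_df = feature_df.T.corr(method="pearson")
    corr_df = corr_df.rename_axis(index="cell_a", columns="cell_b")
    return corr_df.stack().rename("correlation").reset_index()


def write_correlations(datasets, output_dir, feature_prefix="CP__"):
    """Write one tidy correlation file per dataset.

    datasets maps a short label (e.g. "ic") to a labeled-data CSV path.
    feature_prefix is a hypothetical way to select the feature columns.
    """
    output_dir = pathlib.Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, path in datasets.items():
        labeled_df = pd.read_csv(path)
        is_feature = labeled_df.columns.str.startswith(feature_prefix)
        tidy_corr_df = tidy_pairwise_correlations(labeled_df.loc[:, is_feature])
        # Output to file, one per dataset label
        output_file = output_dir / f"pairwise_correlations__{name}.tsv.gz"
        tidy_corr_df.to_csv(output_file, sep="\t", index=False)
        written.append(output_file)
    return written
```

The output grows with the square of the number of cells, which is one reason such files may be better kept out of version control.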
Let me know the objective of this analysis/notebook and I can modify the notebook to accomplish this.
This depends on if we want to perform this analysis (notebook is for pairwise correlations between single-cells) on the no-ic dataset as well (I think this is your final call).
Gotcha! We do not need to do this. My view is that we use the IC model for everything except to confirm that IC is not impacting LOIO performance.
I am having trouble understanding how we use output from this notebook (pairwise correlations between single-cells). It seems that you saved the output tsvs to 1.split_data/data but these files did not get uploaded to GitHub. Maybe this is a deprecated analysis that can be deleted?
Ah, good questions! This analysis is important, and documentation can be improved. See cell 5 in https://github.com/WayScience/phenotypic_profiling_model/blob/main/7.figures/Figure2_UMAP_and_Correlation.ipynb
If we want this analysis on ic and no-ic datasets I can modify the notebook to iterate over both datasets, but I am unsure of how to save the output tsv files. I assume they would not belong on GitHub as you did not push them to the repo before.
My instincts are that we don't need to align them. Supplementary Figure 6 shows only minimal impact between ic and no-ic. Do you expect that aligning single cells will show a different result?
Let me know the objective of this analysis/notebook and I can modify the notebook to accomplish this.
The objective is to determine how IC impacts LOIO results. Based on the previous analysis, we are able to make the following statement:
Poor LOIO performance was not a result of illumination correction, which we hypothesized could have introduced technical effects given our batched IDR_stream image processing, nor by our decision to balance models by uneven class distributions (Supplementary Figure 6B).
If matching single cells will give us a better answer and the analysis won't be too difficult, then I'd say go for it.
Gotcha! We do not need to do this. My view is that we use the IC model for everything except to confirm that IC is not impacting LOIO performance.
Sounds good! In this case I will file a small PR to simply change the labeled_data_path to pathlib.Path("../0.download_data/data/labeled_data__ic.csv.gz").
My instincts are that we don't need to align them. Supplementary Figure 6 shows only minimal impact between ic and no-ic. Do you expect that aligning single cells will show a different result?
Nope, I wouldn't expect aligning single cells to show a different result, and I would expect it to be difficult to refactor the repository for this adjustment. Let's leave the cells unaligned 👍
@roshankern - is it safe to close this issue?
Thank you for the ping on this! @gwaybio are you able to review and/or merge #65? Then we can close this issue.
I see three different labeled cell counts, and I would like to confirm the correct total.
Maybe there is something wonky going on with LOIO?
@roshankern do you know why we're seeing these cell count discrepancies?