Improve Wilms Tumor Dataset Annotation (SCPCP000006) - explore `predicted.score` and `has_cnv.score` thresholds

maud-p commented 14 hours ago

If you are filing this issue based on a specific GitHub Discussion, please link to the relevant Discussion.

This issue follows the PR https://github.com/AlexsLemonade/OpenScPCA-analysis/pull/844 and the 2 comments:

Describe the goals of the changes to the analysis module.

I would like to explore different thresholds for filtering and annotating cells based on the predicted.score and cnv.score. I would like to:

[ ] improve the UMAP visualization with a two-color plot that highlights a single annotation and shows all other cells in grey.
[ ] look at the distribution of predicted.score for each predicted.compartment and predicted.cell_type. So far, we have only used predicted.score to select normal cells (i.e. endothelial and immune cells); we do not yet use it to filter out cells with very low-confidence annotations (which could be labeled as unknown).
[ ] render a few notebooks with a cnv_threshold of 0, 1 or 2 and evaluate the identification of normal cells. I'd like to check the distribution of the predicted.score of endothelial, immune, normal kidney and normal stroma cells under each threshold. It may be that, due to false-positive CNV calls, some normal cells show inferred CNVs. If this is the case, we should expect to recover more normal cells with a high predicted.score when using a higher cnv_threshold.
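
To make the comparison in the last two items concrete, here is a minimal Python sketch of the counting logic; it is not the module's actual code. It assumes that cnv_threshold means the maximum number of inferred CNVs a cell may carry and still be called normal, and that cells below the predicted.score cutoff are labeled unknown. The record fields (`compartment`, `predicted_score`, `cnv_count`), the toy values, and both helper functions are hypothetical; the 0.85 default is only borrowed from the threshold mentioned later in this thread.

```python
from collections import Counter

# Hypothetical toy records standing in for per-cell metadata.
# Field names are illustrative, not the module's real column names.
cells = [
    {"compartment": "endothelium", "predicted_score": 0.95, "cnv_count": 0},
    {"compartment": "immune", "predicted_score": 0.90, "cnv_count": 1},
    {"compartment": "immune", "predicted_score": 0.88, "cnv_count": 2},
    {"compartment": "immune", "predicted_score": 0.40, "cnv_count": 0},
    {"compartment": "stroma", "predicted_score": 0.92, "cnv_count": 0},
    {"compartment": "endothelium", "predicted_score": 0.97, "cnv_count": 3},
]

# Compartments currently used to select normal cells, per the issue text.
NORMAL_COMPARTMENTS = {"endothelium", "immune"}


def label_cells(cells, score_threshold, cnv_threshold):
    """Label each cell as 'unknown', 'normal', or 'other'.

    Assumption: a cell is 'normal' when its annotation is confident,
    it belongs to a normal compartment, and it carries at most
    `cnv_threshold` inferred CNVs.
    """
    labels = []
    for cell in cells:
        if cell["predicted_score"] < score_threshold:
            labels.append("unknown")  # low-confidence annotation
        elif (
            cell["compartment"] in NORMAL_COMPARTMENTS
            and cell["cnv_count"] <= cnv_threshold
        ):
            labels.append("normal")
        else:
            labels.append("other")
    return labels


def normal_counts_by_cnv_threshold(cells, score_threshold=0.85, cnv_thresholds=(0, 1, 2)):
    """Count 'normal' cells recovered at each candidate cnv_threshold."""
    return {
        t: Counter(label_cells(cells, score_threshold, t))["normal"]
        for t in cnv_thresholds
    }


if __name__ == "__main__":
    print(normal_counts_by_cnv_threshold(cells))
```

If false-positive CNV calls are affecting normal cells, the count of high-score normal cells should rise as cnv_threshold increases, which is the pattern this sketch is meant to surface.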

What will your pull request contain?

A few changes in the 07 notebook

Will you require additional software beyond what is already in the analysis module?

No response

Will you require different computational resources beyond what the analysis module already uses?

No response

If known, when do you expect to file the pull request?

~ November

sjspielman commented 14 hours ago

Hi @maud-p, glad to see you back here in issues! I wanted to give you a heads up about continuing this module - I am still working behind the scenes on your module to get it all running in CI. I have updated the label transfer code but it's not yet merged into main (but will be within the next 2 weeks I think 🤞), since I am still working in a separate branch to fix some bugs we are now able to find with all code running in CI. You can see code as we work on it in this branch: https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/feature/wilms-tumor-06-azimuth. While I am still working in my fork, rather than sending PRs to main, I am sending them here. Once this is entirely finished, we'll merge that branch into main.

FYI - one silly (!!!) bug I found is that somehow we never actually applied the score threshold in inferCNV - woops!! So as part of this, I am making sure we use the threshold in that script too!

I think that working on the module while I am still doing this will result in _a lot_ of conflicts which will be very challenging to resolve. Also, the results will slightly change because of the new label transfer code, and the actual use of the 0.85 threshold in inferCNV, which will also complicate interpretation and validation. Are you able to wait a few weeks before doing these additional analyses? I will certainly keep you updated as I continue this process!

maud-p commented 14 hours ago

@sjspielman thank you for all your efforts in making the analysis run in CI! I understand and I can wait, no problem at all! No rush from my side. I just opened the issue to let you know about the plans and to coordinate the next steps with you. Just let me know if/how I can help and when I can start working on the analysis again 😃 Thank you!
