cytomining / profiling-handbook

Image-based Profiling Handbook
https://cytomining.github.io/profiling-handbook/
Creative Commons Zero v1.0 Universal

Update cytominer-database package #49

Closed: shntnu closed this 2 years ago

shntnu commented 4 years ago

The VM mentioned in this manual currently has an older version of cytominer-database. Install the latest version, primarily so that NaNs are handled correctly.

pip install --upgrade cytominer-database
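
To confirm which version ends up installed on the VM, something like the following Python sketch could be used. It relies only on the standard-library importlib.metadata module (Python 3.8+); the helper name installed_version is made up for illustration, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of cytominer-database.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if the package is not installed.
print(installed_version("cytominer-database"))
```

Running this before and after the upgrade shows whether pip actually replaced the older package.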

See this PR and comment https://github.com/cytomining/cytominer-database/pull/104#issuecomment-511440383

cc @NasimJ @DavidStirling @bethac07

shntnu commented 4 years ago

The older version saves a NaN as the string nan, whereas the latest version saves it as NULL, which is the desired behavior.
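
To make the difference concrete, here is a minimal Python sketch using the standard-library sqlite3 module. It is not cytominer-database code; the in-memory table and the column name feature are made up for illustration.

```python
import sqlite3

# In-memory database with a single untyped column (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cells (feature)")

# One real measurement, one NaN stored as the string 'nan', one proper NULL.
conn.executemany(
    "INSERT INTO Cells VALUES (?)",
    [(1.5,), ("nan",), (None,)],
)

# typeof() reports the storage class SQLite used for each value.
rows = conn.execute(
    "SELECT feature, typeof(feature), feature IS NULL FROM Cells"
).fetchall()
conn.close()

for row in rows:
    print(row)
# (1.5, 'real', 0)
# ('nan', 'text', 0)
# (None, 'null', 1)
```

The string nan is stored as text and is not matched by IS NULL, so tools that look for missing values via NULL will not see it.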

shntnu commented 4 years ago

Fixture created using cytominer-database 0.3.3: https://imaging-platform.s3.us-east-1.amazonaws.com/projects/2018_06_05_cmQTL/workspace/backend/2020_03_05_Batch6/cmQTLplate7-2-27-20/cmQTLplate7-2-27-20.sqlite

We know that Cells_Neighbors_AngleBetweenNeighbors_10 has some NAs

Export that column

sqlite3 ~/ebs_tmp/2018_06_05_cmQTL/workspace/backend/2020_03_05_Batch6/cmQTLplate7-2-27-20/cmQTLplate7-2-27-20.sqlite
.headers on
.mode csv
.output cmQTLplate7-2-27-20_Cells_Neighbors_AngleBetweenNeighbors_10.csv
select Cells_Neighbors_AngleBetweenNeighbors_10 from Cells;
.exit

Count the empty lines (the sqlite3 shell writes NULL as an empty field in csv mode by default)

grep -v "\\." cmQTLplate7-2-27-20_Cells_Neighbors_AngleBetweenNeighbors_10.csv |grep -v Cells_Neighbors_AngleBetweenNeighbors_10|wc -l
> 198

Check for any nan strings

grep nan cmQTLplate7-2-27-20_Cells_Neighbors_AngleBetweenNeighbors_10.csv | wc -l
> 0

Count the NAs in R
library(magrittr)
sqlite_file <- "~/ebs_tmp/2018_06_05_cmQTL/workspace/backend/2020_03_05_Batch6/cmQTLplate7-2-27-20/cmQTLplate7-2-27-20.sqlite"
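# dplyr::src_sqlite() is deprecated; connect through DBI instead (needs the DBI, RSQLite and dbplyr packages)
db <- DBI::dbConnect(RSQLite::SQLite(), dbname = path.expand(sqlite_file))
cells <- dplyr::tbl(db, "cells")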
feature <- cells %>% dplyr::select(Cells_Neighbors_AngleBetweenNeighbors_10) %>% dplyr::collect()
sum(is.na(feature))

[1] 198

So everything lines up here: the 198 empty lines in the CSV match the 198 NAs that R reports, and there are no nan strings.
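
The same two counts can also be obtained in one query. Below is a minimal Python sketch using the standard-library sqlite3 module; count_missing and quote_ident are hypothetical helpers written for this example, and the demo runs against a small in-memory table rather than the real backend file.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def count_missing(conn, table, column):
    """Return (number of NULLs, number of 'nan' strings) in one column.

    Hypothetical helper for illustration; not part of cytominer-database.
    """
    col = quote_ident(column)
    query = (
        # COUNT(col) skips NULLs, so the difference is the NULL count.
        f"SELECT COUNT(*) - COUNT({col}), "
        # Count text values equal to 'nan' (case-insensitive).
        f"COALESCE(SUM(typeof({col}) = 'text' AND lower({col}) = 'nan'), 0) "
        f"FROM {quote_ident(table)}"
    )
    n_null, n_nan = conn.execute(query).fetchone()
    return n_null, n_nan


# Demo on made-up data: two NULLs and one 'nan' string.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Cells (Cells_Neighbors_AngleBetweenNeighbors_10)")
demo.executemany(
    "INSERT INTO Cells VALUES (?)",
    [(0.5,), (None,), ("nan",), (None,)],
)
demo_counts = count_missing(demo, "Cells", "Cells_Neighbors_AngleBetweenNeighbors_10")
demo.close()
print(demo_counts)  # (2, 1)
```

Pointing sqlite3.connect at the .sqlite backend file instead of ":memory:" would run the same check on a real plate; a correctly written file should report zero nan strings.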