LieberInstitute / spatialLIBD

Code for the spatialLIBD R/Bioconductor package and shiny app
http://LieberInstitute.github.io/spatialLIBD/

[Feature Request] registration_wrapper(): create and check validity of eventual pseudobulk factor names before starting #48

Open berniejmulvey opened 1 year ago

berniejmulvey commented 1 year ago

I recently ran registration_wrapper() on a very high-dimensional dataset (~80 or ~200 clusters across 47 samples, depending on the granularity), and twice had runs go >8 hours before seeing warnings about invalid factors in the limma steps. For example, the first time around, the cluster names were character strings with leading digits (e.g., "038_Nonneuron_OPC"). There was also a single cluster of cells that had been annotated with a "/" in its name, which I hadn't even considered might make the factor name invalid.

Since the duplicateCorrelation step can take several hours on highly complex datasets, it would be more resource-efficient to have a pre-run check that stops the wrapper if invalid factor names are going to be generated down the line, rather than waiting for limma to encounter them.
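To illustrate the kind of pre-run check I have in mind, here is a minimal sketch in base R. It assumes the limma steps need syntactically valid R names, i.e. labels that make.names() leaves unchanged. check_factor_names() is a hypothetical helper, not an existing spatialLIBD function, and the cluster labels are made up.

```r
# Hypothetical helper (not part of spatialLIBD): stop early if any label
# would not survive as a syntactically valid R name.
check_factor_names <- function(labels) {
    labels <- unique(as.character(labels))
    labels <- labels[!is.na(labels)]
    # make.names() rewrites anything that is not a valid name, so a label
    # that changes is one that would cause trouble later
    invalid <- labels[make.names(labels) != labels]
    if (length(invalid) > 0) {
        stop(
            "Invalid factor level names: ",
            paste(sprintf("'%s'", invalid), collapse = ", "),
            call. = FALSE
        )
    }
    invisible(TRUE)
}

# Made-up labels: leading digits, a "/", whitespace, and one valid name
clusters <- c("038_Nonneuron_OPC", "Astro/Oligo", "Excit L2 3", "Micro")

# Capture the error message instead of halting the script
result <- tryCatch(
    check_factor_names(clusters),
    error = function(e) conditionMessage(e)
)
print(result)
```

A check like this runs in well under a second, so it could sit at the top of the wrapper before any pseudobulking or duplicateCorrelation work starts. Whether the names should be checked before or after they are combined with other variables is something I'm not sure about.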

lcolladotor commented 11 months ago

Can you provide more details about the error you got and maybe make a small reproducible example with a fake dataset?

See https://youtu.be/8bBo3B7N8YQ and https://reprex.tidyverse.org/ for more details. Though the basic idea is to write code that you can then run with reprex::reprex() that shows the problem you encountered. Just make a tiny example with random data but with the right values in the colData() that led to the error you are describing.

Thanks!

berniejmulvey commented 11 months ago

Ah, so more specifically, it's whitespace characters that are not picked up by the initial check run by spatialLIBD (see the second spatial_registration call in the attached code). Here's the HTML output from reprex: spatiallibd-cell-type-label-error-reprex.html.zip

lcolladotor commented 5 months ago

I'm delegating this one to @lahuuki now that she's back analyzing spatial data