Open berniejmulvey opened 1 year ago
Can you provide more details about the error you got and maybe make a small reproducible example with a fake dataset?
See https://youtu.be/8bBo3B7N8YQ and https://reprex.tidyverse.org/ for more details. The basic idea is to write code that you can run with reprex::reprex() and that shows the problem you encountered. Just make a tiny example with random data, but with the same kind of values in the colData() that led to the error you are describing.
Thanks!
Ah, so more specifically, it's whitespace characters that are not picked up by the initial check run by spatialLIBD (see the second spatial_registration call in the attached code). Here's the HTML output from reprex: spatiallibd-cell-type-label-error-reprex.html.zip
I'm delegating this one to @lahuuki now that she's back analyzing spatial data
I recently ran registration_wrapper() on a very high-dimensional dataset (~80 or ~200 clusters across 47 samples, depending on the granularity) and twice had runs go >8 hours before seeing warnings about invalid factors in the limma steps. For example, the first time around, the cluster names were character strings with leading digits (e.g., "038_Nonneuron_OPC"). There was also a single cluster of cells that had been annotated with a "/" in its name, which it hadn't occurred to me would make the factor level invalid.
Since the duplicateCorrelation step can take several hours for datasets this complex, it would be more resource-efficient to have a pre-run check that stops the wrapper if invalid factor names are going to be generated down the line, rather than waiting for limma to encounter them.
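As a rough sketch of the kind of pre-run check I have in mind: the helpers below are hypothetical (find_invalid_labels() and assert_valid_labels() are not part of spatialLIBD) and use only base R. They flag any label that make.names() would alter, which covers the leading-digit, "/" and whitespace cases from this thread. I'm assuming that matches what limma ends up objecting to, but I haven't confirmed it against limma's own checks.

```r
# Hypothetical helper (not part of spatialLIBD): return the unique labels
# that make.names() would change, i.e. labels that are not syntactically
# valid R names (leading digits, "/", whitespace, and so on).
find_invalid_labels <- function(labels) {
    labels <- unique(as.character(labels))
    labels[is.na(labels) | labels != make.names(labels)]
}

# Hypothetical fail-fast wrapper: stop before any expensive modelling starts.
assert_valid_labels <- function(labels) {
    bad <- find_invalid_labels(labels)
    if (length(bad) > 0) {
        stop(
            "Labels that are not syntactically valid R names: ",
            paste0("'", bad, "'", collapse = ", "),
            call. = FALSE
        )
    }
    invisible(TRUE)
}

# Made-up cluster labels covering the cases reported in this thread
cluster_labels <- c(
    "Astro", "038_Nonneuron_OPC", "Oligo/OPC", "L2 3 IT", "Micro", "Astro"
)
bad <- find_invalid_labels(cluster_labels)
print(bad)

# Passes silently when every label is already a valid name
assert_valid_labels(c("Astro", "Micro"))
```

In practice the input would be the column of colData() holding the cluster or cell type labels, checked before registration_wrapper() is called. Whether to stop with an error or to sanitize the labels automatically (e.g. with make.names()) is a design choice; stopping keeps the user's original labels intact.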