Open kieranrcampbell opened 5 years ago
Chipping in here, as I'm seeing some similarities with some anti-patterns I've observed in scater.
The only legitimate subsetting approach is 1. Though 2 and 3 might seem convenient, they make it much harder for people to guarantee that the right genes are being used. (Is the matching done based on the row names? Or did it end up using a field in rowData? In which case, which field?)
If people want to use IDs in one of the rowData fields, all they have to do is:
match(my_ids, rowData(sce)$SYMBOL)
... and supply that to a subset argument in cellassign() (for examples, see some of the refactored scater functionality for subset_rows). This is much more explicit and makes the intent of the code clearer.
You will probably want to protect against NA elements in the subsetting vector, though.
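For what it's worth, here is a rough sketch of what that NA protection could look like. It uses a plain data.frame as a stand-in for rowData(sce) so it runs in base R; the gene names, the IDs, and the variable names (row_data, keep, missing_ids) are all made up for illustration and are not part of cellassign or scater.

```r
# Stand-in for rowData(sce): a plain data.frame with a SYMBOL column.
# All gene names and row names here are hypothetical.
row_data <- data.frame(
  SYMBOL = c("CD3E", "CD19", "MS4A1", "NKG7"),
  row.names = c("gene1", "gene2", "gene3", "gene4"),
  stringsAsFactors = FALSE
)

my_ids <- c("MS4A1", "CD3E", "NOT_A_GENE")

# match() returns NA for any ID that is absent from the field.
idx <- match(my_ids, row_data$SYMBOL)

# Report the IDs that failed to match instead of silently subsetting with NA.
missing_ids <- my_ids[is.na(idx)]
if (length(missing_ids) > 0) {
  message("Dropping unmatched IDs: ", paste(missing_ids, collapse = ", "))
}

# NA-free row indices, safe to use for subsetting.
keep <- idx[!is.na(idx)]
subset_df <- row_data[keep, , drop = FALSE]
```

With a real SingleCellExperiment, the same keep vector would presumably be used as sce[keep, ] or passed to whatever subset argument ends up being added. Swapping message() for stop() is the stricter choice if unmatched IDs should be treated as an error.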
Thanks for the input @LTLA, we'll go for this option then
No probs. Plenty more ~opinions~ objective rules where that came from.
A casual user may want to pass in an entire SCE but only the rho matrix corresponding to their marker genes. In theory we should be able to detect this and appropriately subset the SCE by matching the colnames of rho with