Closed sjspielman closed 10 months ago
I'll also note - the other option here, rather than adding this argument, is to change how we handle retain_coldata_cols
. Rather than creating & filling in missing columns with NA
, we could just ignore them. If we update with* that approach, we can then make this parallel change in scpca-nf
as described here: https://github.com/AlexsLemonade/scpcaTools/issues/258#issuecomment-1895922396
edit: Or yet another approach! We could be less prescriptive with the name for celltype_coldata_cols
but keep the argument as implemented. And then, pass in cell type column names in scpca-nf
, again as linked ^
In our current implementation, we pass in a vector retain_coldata_cols to keep any colData columns we want kept. But, we also create these columns if they are missing and populate them with NAs. Therefore, one does not simply add the cell type columns into this vector. If there are no cell type columns, we do not want to end up creating them!
I still don't think we need to change anything in the function. I would change the vector that actually gets passed in to the function in scpca-nf
to only include the cell type columns that are present. Otherwise you don't include them and they won't get created. That makes it so you don't have anything to do here.
I would change the vector that actually gets passed in to the function in scpca-nf to only include the cell type columns that are present. Otherwise you don't include them and they won't get created. That makes it so you don't have anything to do here.
I still don't think this is true. For example, we might have the situation where some but not all of the libraries in a given project have cell types. If that were to happen, we'd end up with NA
cell types. This has most potential to happen I think for submitters.
In our current implementation, we pass in a vector retain_coldata_cols to keep any colData columns we want kept. But, we also create these columns if they are missing and populate them with NAs. Therefore, one does not simply add the cell type columns into this vector. If there are no cell type columns, we do not want to end up creating them!
I still don't think we need to change anything in the function. I would change the vector that actually gets passed in to the function in
scpca-nf
to only include the cell type columns that are present. Otherwise you don't include them and they won't get created. That makes it so you don't have anything to do here.
I agree with this. In general we should make changes to that add arguments (and complexity) with great caution.
I still don't think this is true. For example, we might have the situation where some but not all of the libraries in a given project have cell types. If that were to happen, we'd end up with NA cell types. This has most potential to happen I think for submitters.
I think this is okay though. If they exist for any libraries, we want that column to be present and then NA for other libraries.
I would change the vector that actually gets passed in to the function in scpca-nf to only include the cell type columns that are present. Otherwise you don't include them and they won't get created. That makes it so you don't have anything to do here.
I still don't think this is true. For example, we might have the situation where some but not all of the libraries in a given project have cell types. If that were to happen, we'd end up with
NA
cell types. This has most potential to happen I think for submitters.
If some libraries have cell types and others do not, don't we want NA
?
This would only affect the merged object, right?
If they exist for any libraries, we want that column to be present and then NA for other libraries.
I don't actually know... might we want them to be "Submitter excluded"
?
This would only affect the merged object, right?
But yes, it would only affect the merged object!
Let's regroup over meet shortly!
I don't actually know... might we want them to be "Submitter excluded"?
If that's the case, then we should still make that change in scpca-nf
.
Well, it was fun (for me!) while it lasted! Closing this as not happening in this repo.
Closes #258
This PR updates SCE merging to ensure that any cell type annotation columns are retained in the
colData
.In our current implementation, we pass in a vector
retain_coldata_cols
to keep any colData columns we want kept. But, we also create these columns if they are missing and populate them withNA
s. Therefore, one does not simply add the cell type columns into this vector. If there are no cell type columns, we do not want to end up creating them!So, I created a new argument in
merge_sce_list
celltype_coldata_cols
with a default value of all the possible cell type columns that one might encounter in thecolData
slot. This argument gets used in the the subfunctionprepare_merge_sce
, which has a default ofNULL
for this argument. Then:NULL
, just ignore itretain_coldata_cols
withNA
_, tack on the celltype columns which actually exist intoretain_coldata_cols
. Use this full vector to subsetcolData
.I updated some docs along the way, and I added some tests ✅
A handful of recreational revisions in the tests as a 🍬 too - stop using
sort()
to compare vectors and start usingtest_setequal
.