Closed allyhawkins closed 8 months ago
The structure here seems good, but I think we want to be a bit stricter about which file names we change, so that we don't accidentally run this in the future and update things we don't want to update. Specifically, we should check that the file name is in the "old" format before updating, so that a later run can't change version numbers we want to keep.
This is a great point! I went ahead and added a check for a version string. If that's present in either of the file names in `scpca-meta.json`, then no updates are made. I also updated the description of the script at the top of the file to mention that.
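As a rough illustration of the guard discussed above, here is a minimal Python sketch. It is not the script from this PR: the version pattern (`_1-10-1` style) and the JSON key names `singler_model_file` and `cellassign_reference_file` are assumptions made for the example, and the real check may look different.

```python
import re

# Hypothetical pattern: assumes new-style file names embed a version such as "_1-10-1".
VERSION_PATTERN = re.compile(r"_\d+-\d+-\d+")


def has_version_string(file_name):
    """Return True if the file name already contains a version string."""
    return bool(file_name) and VERSION_PATTERN.search(file_name) is not None


def should_update(meta, keys=("singler_model_file", "cellassign_reference_file")):
    """Allow an update only when none of the stored file names has a version string.

    `meta` is the parsed content of one scpca-meta.json file; the key names
    are assumed for this sketch.
    """
    return not any(has_version_string(meta.get(key)) for key in keys)
```

With a guard like this, running the script a second time leaves already-updated checkpoint files alone.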
Related to https://github.com/AlexsLemonade/ScPCA-admin/issues/691
We recently updated the reference file names for both `SingleR` and `CellAssign`. Before doing this, we had already run a few samples through both `SingleR` and `CellAssign`. To skip cell typing, we check that the reference file names stored in `library_id_cellassign/scpca-meta.json` and `library_id_singler/scpca-meta.json` match the reference file names that have been passed through the workflow via the project metadata. If we want to run the projects through again and skip running `CellAssign` for samples that already have `CellAssign` results, then these reference file names need to be updated in the `scpca-meta.json`
files.

Here I'm adding a script that specifically updates the cell type `scpca-meta.json` files to make sure that the reference file names match what's in `scpca-project-celltype-metadata.tsv`. It's mostly modeled after the script we use for updating the mapping-related `scpca-meta.json`
files, but here we need to update two checkpoint files per library, one for each method.

We also want to account for values that may already be there, but with a different file name. So we directly compare what's in the checkpoint file with what's in the metadata file and update accordingly. Additionally, if `NA` is in the project metadata, then we don't set the path and fill with `NA` instead. This shouldn't really affect any of the files that will get updated here, though. I removed all old `SingleR` files, so in reality only the `CellAssign` results are getting updated, and they only exist if there was a reference file in the first place.

I've tested this with runs in
`scpca/processed`. Once this gets approved, I'll run it for `scpca-prod`.
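
For reference, the compare-and-update logic described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the script added in this PR: the function names, the `ref_dir` argument, and the idea that the checkpoint stores a directory-prefixed path are all hypothetical.

```python
import json
from pathlib import Path


def expected_reference(project_value, ref_dir):
    """Return the value one checkpoint file should hold for one method.

    If the project metadata has "NA" (or nothing), no path is built and "NA" is used.
    """
    if project_value is None or project_value == "NA":
        return "NA"
    # Assumption: the checkpoint stores the reference as a directory-prefixed path.
    return f"{ref_dir}/{project_value}"


def update_checkpoint(meta_path, key, project_value, ref_dir):
    """Rewrite one scpca-meta.json only if its stored reference differs.

    Returns True if the file was rewritten, False if it already matched.
    """
    meta_path = Path(meta_path)
    meta = json.loads(meta_path.read_text())
    new_value = expected_reference(project_value, ref_dir)
    if meta.get(key) == new_value:
        return False
    meta[key] = new_value
    meta_path.write_text(json.dumps(meta, indent=2))
    return True
```

In this sketch, `update_checkpoint` would be called twice per library, once for the file under `library_id_singler` and once for the file under `library_id_cellassign`, each with its own (assumed) key name.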