Closed jashapiro closed 1 year ago
One thing I don't do at the moment is really check which version of the workflow/nextflow was used at each stage, so everything at the moment gets the . We don't really have that in a file that can be extracted within the checkpoint directory itself, but I could look at the `publish` directory and pull it from the output json file. This is probably worth doing, and will be the next thing I work on.
Update: this is now present for the scRNAseq data. It turns out the only thing we seem to have in the output is the workflow version (not nextflow), so that is all I am grabbing.
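For illustration only, here is a minimal sketch of pulling a workflow version out of the text of an output json file. The key name `workflow_version` and the function name are assumptions made for this example, not confirmed names from the script or the workflow output.

```python
import json


def workflow_version_from_json(json_text, key="workflow_version", default=None):
    """Return the workflow version stored in a JSON document, if present.

    `key` is a hypothetical field name; adjust it to whatever the
    output json file actually uses.
    """
    try:
        data = json.loads(json_text)
    except json.JSONDecodeError:
        # Unreadable file contents: fall back to the default value
        return default
    if not isinstance(data, dict):
        return default
    return data.get(key, default)
```

Returning a default rather than raising keeps older outputs, which may lack the field, from stopping the whole run.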
This script is now done, tested and run!
This PR, currently in draft form, is designed to complete https://github.com/AlexsLemonade/ScPCA-admin/issues/408. To do so, the script here performs two tasks: moving checkpoint files from an old to a new location (as we changed the default from `internal` to `checkpoints` directories) and adding `scpca-meta.json` files to the checkpoint directories as needed.

In the current form, it processes scRNAseq samples (rad files) and vireo results (which already have `scpca-meta.json` files in all versions).
The basic idea is to generate the checkpoint directories in the same way scpca-nf does (as of the future version 0.4, as I am calling it), then copy file contents from the previous location to the current one. I used the `aws` command line for this part, as `boto3` doesn't have a built-in recursive copy or sync (and I didn't trust myself to write one), though I use `boto3` for more atomic operations like checking if files exist and writing the json file.

For files that require the `scpca-meta.json` file, we generate all of the fields that would be created by the workflow, then write that file to the appropriate checkpoint directory.

Before doing either, the script checks if there are files in the new location, and will not overwrite unless explicitly told to do so by an option.
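As a rough sketch of the flow described above (not the actual code in this PR): check the destination with a `boto3`-style client, hand the recursive copy to the `aws` command line, then write `scpca-meta.json` when one is needed. The function names, the choice of `aws s3 sync`, and the `run` parameter (added so the copy step can be stubbed out) are all assumptions for illustration. The client is passed in, e.g. `boto3.client("s3")`, so the block itself does not import `boto3`.

```python
import json
import subprocess


def as_dir(prefix):
    """Normalize a prefix so it ends in exactly one slash."""
    return prefix.rstrip("/") + "/"


def prefix_has_objects(s3_client, bucket, prefix):
    """Return True if at least one object exists under the prefix."""
    response = s3_client.list_objects_v2(
        Bucket=bucket, Prefix=as_dir(prefix), MaxKeys=1
    )
    return response.get("KeyCount", 0) > 0


def build_sync_command(bucket, old_prefix, new_prefix):
    """Build the `aws s3 sync` argument list for a recursive copy."""
    return [
        "aws", "s3", "sync",
        f"s3://{bucket}/{as_dir(old_prefix)}",
        f"s3://{bucket}/{as_dir(new_prefix)}",
    ]


def migrate_checkpoint(s3_client, bucket, old_prefix, new_prefix,
                       meta=None, overwrite=False, run=subprocess.run):
    """Copy one checkpoint directory, optionally adding scpca-meta.json.

    Returns False without copying if the destination already holds
    files and `overwrite` is not set; returns True otherwise.
    """
    if prefix_has_objects(s3_client, bucket, new_prefix) and not overwrite:
        return False
    # Recursive copy is delegated to the aws command line
    run(build_sync_command(bucket, old_prefix, new_prefix), check=True)
    if meta is not None:
        # Single-object write is done through the client
        s3_client.put_object(
            Bucket=bucket,
            Key=as_dir(new_prefix) + "scpca-meta.json",
            Body=json.dumps(meta, indent=2).encode("utf-8"),
        )
    return True
```

Forcing a trailing slash on the prefix keeps a check on `new/x` from being satisfied by objects under a sibling such as `new/xy`.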
Speaking of options, there are many, because there are a lot of things that are relatively constant, but could potentially change. I ended up putting a lot of these into options partly just so I could pass them around as part of the `args` dictionary.

One thing I don't do at the moment is really check which version of the workflow/nextflow was used at each stage, so everything at the moment gets the . We don't really have that in a file that can be extracted within the checkpoint directory itself, but I could look at the `publish` directory and pull it from the output json file. This is probably worth doing, and will be the next thing I work on.

After that, I will return to the bulk and spatial functions, which exist now mostly as stubs.
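To illustrate the options-as-dictionary pattern mentioned above, a minimal sketch with `argparse` might look like the following. Only the existence of a `prefix` argument and an overwrite option comes from this description; the flag spellings, the `--bucket` option, and all default values are placeholders invented for the example.

```python
import argparse


def parse_args(argv=None):
    """Parse command line options and return them as a plain dictionary."""
    parser = argparse.ArgumentParser(
        description="Move checkpoint files and add scpca-meta.json files."
    )
    # Placeholder defaults: values that are mostly constant but could change
    parser.add_argument("--bucket", default="example-bucket")
    parser.add_argument("--prefix", default="example-working-prefix")
    parser.add_argument(
        "--overwrite",
        action="store_true",
        help="Replace files that already exist in the new location.",
    )
    # vars() turns the Namespace into a dict that is easy to pass around
    return vars(parser.parse_args(argv))
```

Each helper can then take the `args` dictionary and read only the keys it needs, for example `args["prefix"]`.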
Note also that the current `prefix` argument points to what is effectively our working directory. When run for real, this will be changed to `scpca-prod`.

Please do let me know how you think this looks as a general direction, and if you see any other issues or places where the function of the code is unclear.