Closed allyhawkins closed 9 months ago
I updated this to use the dictionary that you proposed. I also took out the submitter cell types file check and have that separate.
Namely: The assay_ontology_term_id and technology columns are presumably relatable by a simple dict lookup. I think it makes sense to keep both in the metadata file, but we ought to be able to handle those lookups in the case that the ontology value in the table is NA. I'm not sure if this is the right place for it, or if it should be in the workflow itself? Or (and this is probably the correct answer) it should be in the metadata check scripts: Since we can check that the two fields agree, we probably should.
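A minimal sketch of what that lookup could look like, assuming the ontology value may arrive as None, an empty string, "NA", or NaN. The dict name, the helper names, and the specific technology-to-ontology pairs are illustrative assumptions, not the mapping actually used in the repo, and the term IDs should be verified against the ontology before use.

```python
import math

# Hypothetical mapping from the technology column to an assay ontology term.
# The entries are illustrative only and should be verified before use.
TECH_TO_ASSAY_ONTOLOGY = {
    "10Xv2": "EFO:0009899",
    "10Xv3": "EFO:0009922",
}


def is_missing(value):
    """Treat None, NaN, empty strings, and the literal 'NA' as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() in {"", "NA"}


def fill_assay_ontology(technology, assay_ontology_term_id):
    """Keep an existing ontology term; otherwise look it up from technology.

    Falls back to "NA" when the technology is not in the mapping.
    """
    if not is_missing(assay_ontology_term_id):
        return assay_ontology_term_id
    return TECH_TO_ASSAY_ONTOLOGY.get(technology, "NA")
```

With this shape, a value already present in the table is never overwritten, so the lookup only acts as a fallback for NA entries.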
I definitely like the idea of having a check in the workflow itself for NA terms for assay ontology and tech version! We could do that with the other library fields too.
I'm not sure what you mean about having it in the metadata checks? We don't have checks between scpca-library-metadata.json and scpca-meta.json for each library?
I meant as a step in check_metadata.py
Right, but what exactly are you checking? That script checks matches between the sample and the library metadata, but the assay_ontology_term_id is only in the library metadata?
I was thinking to make sure that we had a correct match between the technology and assay_ontology_term_id as a check for data entry errors.
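As a rough sketch of that kind of check (not the logic currently in check_metadata.py), something like the following could flag rows where the two columns disagree. The mapping entries, the function name, and the library_id key are hypothetical and only there for illustration; rows are assumed to be plain dicts, such as those produced by csv.DictReader.

```python
# Hypothetical expected pairs; illustrative only, verify before relying on them.
EXPECTED_ASSAY_ONTOLOGY = {
    "10Xv2": "EFO:0009899",
    "10Xv3": "EFO:0009922",
}


def find_ontology_mismatches(rows):
    """Return (library_id, technology, found, expected) for disagreeing rows.

    Rows with an unknown technology or a missing ontology term are skipped,
    since there is nothing to compare against.
    """
    mismatches = []
    for row in rows:
        technology = row.get("technology")
        found = row.get("assay_ontology_term_id")
        expected = EXPECTED_ASSAY_ONTOLOGY.get(technology)
        if expected is None or found in (None, "", "NA"):
            continue
        if found != expected:
            mismatches.append((row.get("library_id"), technology, found, expected))
    return mismatches
```

Returning the mismatches rather than raising on the first one would let the script report every data entry error in a single run.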
I ran this on all the libraries so all checkpoints files should now be up to date for future processing.
Closes #175
Since we have now added two new terms to the initial scpca-meta.json in scpca-nf, we need to update the existing checkpoints files to include those additions before we re-process for AnnData or cell type additions.
add-refs-scpca-meta.py has been renamed to add-fields-scpca-meta.py. This script can now be used to add any new fields we may need as we continue development of scpca-nf. The readme and documentation for the script usage have been updated to reflect that.
I tested this with a library in scpca/processed and things worked as expected. Once this is approved, I'll run this for the entire scpca-prod/checkpoints folder.
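For readers unfamiliar with the approach, here is a minimal sketch of adding fields to an existing JSON checkpoint file. It is not the code in add-fields-scpca-meta.py; the function name and its arguments are assumptions, and it works on a local file path rather than on whatever storage the real script reads from.

```python
import json
from pathlib import Path


def add_missing_fields(meta_path, new_fields):
    """Add any keys from new_fields that are absent from the JSON file.

    Existing values are left untouched. Returns the list of keys added,
    and only rewrites the file when something changed.
    """
    meta_path = Path(meta_path)
    meta = json.loads(meta_path.read_text())
    added = [key for key in new_fields if key not in meta]
    for key in added:
        meta[key] = new_fields[key]
    if added:
        meta_path.write_text(json.dumps(meta, indent=2) + "\n")
    return added
```

Because existing keys are never overwritten, running it a second time on the same file is a no-op, which makes it safe to apply across a whole folder of checkpoints.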