AlexsLemonade / alsf-scpca

Management and analysis tools for ALSF Single-cell Pediatric Cancer Atlas data.
BSD 3-Clause "New" or "Revised" License

Update script for adding new fields to `scpca-meta.json` to include assay ontology and submitter cell types #176

Closed allyhawkins closed 9 months ago

allyhawkins commented 10 months ago

Closes #175

Since we have now added two new terms to the initial scpca-meta.json in scpca-nf, we need to update the existing checkpoints files to include those additions before we re-process for AnnData or cell type additions.

I tested this with a library in scpca/processed and things worked as expected. Once this is approved, I'll run this for the entire scpca-prod/checkpoints folder.
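For readers following along, the kind of update described above could be sketched roughly as follows. This is not the script from this PR: the function names, the local-file handling, and the example field values are all assumptions made for illustration, and the real checkpoint files may need to be read from and written to remote storage instead.

```python
import json
from pathlib import Path


def add_missing_fields(meta, new_fields):
    """Return a copy of `meta` with any keys from `new_fields` that are absent.

    Existing values are never overwritten, so re-running is harmless.
    """
    updated = dict(meta)
    for key, value in new_fields.items():
        updated.setdefault(key, value)
    return updated


def update_meta_file(path, new_fields):
    """Add missing fields to one scpca-meta.json file (hypothetical helper).

    Returns True if the file was rewritten, False if it was already current.
    """
    path = Path(path)
    meta = json.loads(path.read_text())
    updated = add_missing_fields(meta, new_fields)
    if updated == meta:
        return False
    path.write_text(json.dumps(updated, indent=2) + "\n")
    return True
```

Because existing keys are left alone, a sketch like this can be run over a whole folder of checkpoint files without disturbing libraries that already carry the new fields.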

allyhawkins commented 10 months ago

I updated this to use the dictionary that you proposed. I also took out the submitter cell types file check and have that separate.

> Namely: The assay_ontology_term_id and technology columns are presumably relatable by a simple dict lookup. I think it makes sense to keep both in the metadata file, but we ought to be able to handle those lookups in the case that the ontology value in the table is NA. I'm not sure if this is the right place for it, or if it should be in the workflow itself? Or (and this is probably the correct answer) it should be in the metadata check scripts: Since we can check that the two fields agree, we probably should.

I definitely like the idea of having a check in the workflow itself for NA terms for assay ontology and tech version! We could do that with the other library fields too. I'm not sure what you mean about having it in the metadata checks? We don't have checks between scpca-library-metadata.json and scpca-meta.json for each library?
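The dict lookup with an NA fallback discussed here might look something like the sketch below. The technology names, the ontology IDs in the mapping, and the set of values treated as NA are placeholders chosen for illustration; the actual dictionary used in the PR is not shown in this thread and should be taken from the script itself.

```python
# Placeholder mapping for illustration only; confirm the real technology
# names and ontology term IDs against the project's own table.
TECH_TO_ONTOLOGY = {
    "10Xv2": "EFO:0009899",
    "10Xv3": "EFO:0009922",
}

# Values treated as "missing" in the metadata table (an assumption).
NA_VALUES = {None, "", "NA"}


def resolve_assay_ontology(technology, ontology_id, mapping=TECH_TO_ONTOLOGY):
    """Return the recorded ontology ID, or look it up from the technology.

    If the recorded value is missing and the technology is not in the
    mapping, None is returned so the caller can decide how to handle it.
    """
    if ontology_id not in NA_VALUES:
        return ontology_id
    return mapping.get(technology)
```

A value already present in the table always wins here; the lookup only fills gaps, which keeps the fallback separate from any agreement check between the two columns.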

jashapiro commented 10 months ago

> I'm not sure what you mean about having it in the metadata checks? We don't have checks between scpca-library-metadata.json and scpca-meta.json for each library?

I meant as a step in check_metadata.py

allyhawkins commented 10 months ago

> I meant as a step in check_metadata.py

Right, but what exactly are you checking? That script checks matches between the sample and the library metadata, but the assay_ontology_term_id is only in the library metadata?

jashapiro commented 10 months ago

> > I meant as a step in check_metadata.py
>
> Right, but what exactly are you checking? That script checks matches between the sample and the library metadata, but the assay_ontology_term_id is only in the library metadata?

I was thinking to make sure that we had a correct match between the technology and assay_ontology_term_id as a check for data entry errors.
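As a rough illustration of such a data-entry check, the sketch below scans library metadata rows and reports any row whose recorded ontology term disagrees with the one expected for its technology. It is not taken from check_metadata.py: the `scpca_library_id` column name, the mapping entries, and the NA handling are assumptions.

```python
# Placeholder mapping for illustration only; not the project's actual table.
EXPECTED_ONTOLOGY = {
    "10Xv2": "EFO:0009899",
    "10Xv3": "EFO:0009922",
}


def find_ontology_mismatches(rows, expected_by_tech=EXPECTED_ONTOLOGY):
    """Return (library_id, technology, observed, expected) for each mismatch.

    Rows with a missing ontology value or an unmapped technology are
    skipped, since there is nothing to compare in those cases.
    """
    problems = []
    for row in rows:
        technology = row.get("technology")
        observed = row.get("assay_ontology_term_id")
        expected = expected_by_tech.get(technology)
        if observed in ("", "NA", None) or expected is None:
            continue
        if observed != expected:
            problems.append(
                (row.get("scpca_library_id"), technology, observed, expected)
            )
    return problems
```

The rows can be any iterable of dicts, for example the output of `csv.DictReader` over a tab-separated metadata file, and an empty result means no disagreements were found.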

allyhawkins commented 10 months ago

I ran this on all the libraries so all checkpoints files should now be up to date for future processing.