AlexsLemonade / scpca-nf

scpca-nf is the Nextflow workflow for processing Single-cell Pediatric Cancer Atlas Portal data
BSD 3-Clause "New" or "Revised" License

Update description of tissue type and organs for CellAssign references in supplemental report #589

Closed: allyhawkins closed this issue 10 months ago

allyhawkins commented 11 months ago

In #587, we are making some modifications to the references that mean a single CellAssign reference can contain multiple tissues/organs. I think it would be useful to include the list of organs found in the reference in the supplemental cell type report itself. This would mean pulling in the organs from the reference metadata file in some way, or maybe we could just link to that file in the repo so users can see which organs are found in that reference. I don't think this needs to be done before the release; I just wanted to note that we may want to address this in the future. Right now, we just print out the reference name, e.g., blood-compartment.

sjspielman commented 10 months ago

As discussed in DSTM, we'll also need to track organs, so we'll need a separate step here that reads in the reference file containing the organs. We'll want to add the organs from that file to the metadata so that we can incorporate them into the report here.

Note that this file is already in the config, since it's used for the reference-building workflow: https://github.com/AlexsLemonade/scpca-nf/blob/30c7822e205c43aedb5484ab1e40589b68b9bc6e/config/reference_paths.config#L26

sjspielman commented 10 months ago

Reminding myself how our cell typing code works and thinking through the workflow changes needed to update the report:

We don't actually pass around the reference names directly in Nextflow, only the reference filenames. In add_celltypes_to_sce.R, we extract the reference name from the reference filename to include in the SCE metadata. So, if we wanted to read the celltype reference file (with organs) into the workflow as a map, we'd have to add some Nextflow string manipulation to join the organs with our existing channels. I think we might want to avoid that because it introduces redundancy: whatever string parsing we would have to add to Nextflow is already being done in R.

So, are there any particular objections to just reading the celltype reference file (with organs) directly into add_celltypes_to_sce.R and bypassing all Nextflow manipulation? We'd have to add an argument to this script for the file path, but the Nextflow code won't really change except for a file-exists check. @allyhawkins @jashapiro
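
For illustration only, here is a rough sketch of the lookup being proposed, written in Python for brevity even though the real script is R. Everything specific in it is an assumption rather than what scpca-nf actually does: the filename convention (`<reference-name>_<suffix>.tsv`), the metadata column names (`reference_name`, `organs`), and the function names are all hypothetical.

```python
import csv
import io
from pathlib import Path


def reference_name_from_filename(ref_filename):
    """Extract the reference name from a reference filename.

    ASSUMPTION: files are named "<reference-name>_<suffix>.tsv";
    the real naming convention may differ.
    """
    return Path(ref_filename).name.split("_", 1)[0]


def read_organs(metadata_handle):
    """Read a tab-separated reference metadata table into a dict.

    ASSUMPTION: the table has "reference_name" and "organs" columns;
    these column names are hypothetical.
    """
    reader = csv.DictReader(metadata_handle, delimiter="\t")
    return {row["reference_name"]: row["organs"] for row in reader}


def organs_for_reference_file(ref_filename, metadata_handle):
    """Return (reference name, organs), with organs None if not listed."""
    ref_name = reference_name_from_filename(ref_filename)
    organs = read_organs(metadata_handle).get(ref_name)
    return ref_name, organs


if __name__ == "__main__":
    # Made-up metadata row, purely for demonstration.
    example_metadata = io.StringIO(
        "reference_name\torgans\n"
        "blood-compartment\tBlood, Bone\n"
    )
    print(organs_for_reference_file("blood-compartment_example.tsv", example_metadata))
```

The point of the sketch is that the name extraction and the organ join live in one place, inside the script, so the workflow only needs to hand over the metadata file path.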

jashapiro commented 10 months ago

I don't think I see a problem with that method. You will need to pass the file itself into the workflow process as well to get it copied to the execution environment, but that should be about it.

allyhawkins commented 10 months ago

That plan sounds fine to me.

sjspielman commented 10 months ago

Closed by #608 & #609