Closed jharenza closed 10 months ago
@jharenza This week I'm wrapping up the clinical report CWL. I'll probably start working on this next week.
Thanks!
@jharenza A few points I'd like to confirm with you:
- You only want to run `add_v11_updates.py` with the files in the `input` folder to generate the histology file, right? The other scripts (`01-samples_to_add.R`, `02-path_dx_mapping.R`, ...) will not be run. Briefly: `histologies-base-adapt.tsv` ---`python add_v11_updates.py`---> `histologies-base.tsv`
- Which table does `histologies-base-adapt.tsv` refer to? There are `prod_reporting.openpedcan_histologies` and `prod_reporting.pbta_histologies` in the DW. Or does it refer to another one?
- How often should this QC run? Once? Daily? Weekly? Monthly?
- Where do you want to put the `histologies-base.tsv` file? The D3b warehouse or an S3 bucket? The GitHub repo is not a good place.
@jharenza A few points I'd like to confirm with you:
- You only want to run `add_v11_updates.py` with the files in the `input` folder to generate the histology file, right? The other scripts (`01-samples_to_add.R`, `02-path_dx_mapping.R`, ...) will not be run. Briefly: `histologies-base-adapt.tsv` ---`python add_v11_updates.py`---> `histologies-base.tsv`

Actually, no. We only want to run the scripts in the shell script. The Python code was used to generate histologies files for TCGA, TARGET, GTEx, and GMKF NBL, which were uploaded to the Data Tracker and then pulled into the warehouse, so they do not need to be regenerated with this code.
- Which table does `histologies-base-adapt.tsv` refer to? There are `prod_reporting.openpedcan_histologies` and `prod_reporting.pbta_histologies` in the DW. Or does it refer to another one?

`prod_reporting.openpedcan_histologies`
- How often should this QC run? Once? Daily? Weekly? Monthly?

There are two steps to this QC, which may not be immediately clear. I am pasting the file below:
histologies-file-generation.pdf
We should create a weekly QC which runs `histologies-base-adapt.tsv` against itself each week.
We will want to create a release-based QC (theoretically we can probably do this monthly for now) generating `histologies-base.tsv` and comparing this to `histologies.tsv` of the previous release, in this case OpenPedCan v11. Once v12 is released, the comparison target would change to v12.
- Where do you want to put the `histologies-base.tsv` file? The D3b warehouse or an S3 bucket? The GitHub repo is not a good place.

I think until we have a process, maybe S3 for now. Can we create a specific histologies folder in 538745987955 (since this one is not a public bucket)?
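
For reference, here is a rough sketch of what the comparison step could look like for both QCs (this week's file vs. last week's, or a newly generated `histologies-base.tsv` vs. the previous release's `histologies.tsv`). It is not the code from the repo. It assumes the rows are keyed by a `Kids_First_Biospecimen_ID` column; the function names, sample IDs, and values are made up for illustration, and it uses in-memory strings instead of real files so it runs on its own.

```python
import csv
import io


def load_tsv(handle, key="Kids_First_Biospecimen_ID"):
    """Read a TSV into {key value: row dict}; fail loudly on duplicate keys.

    The key column name is an assumption; change it to match the real file.
    """
    rows = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        k = row[key]
        if k in rows:
            raise ValueError(f"duplicate {key}: {k}")
        rows[k] = row
    return rows


def compare_histologies(old, new):
    """Summarize added, removed, and changed rows between two keyed tables."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for k in sorted(old.keys() & new.keys()):
        # Columns whose value differs, including columns present in only one file.
        cols = sorted(
            (c for c in old[k].keys() | new[k].keys()
             if old[k].get(c) != new[k].get(c)),
            key=str,
        )
        if cols:
            changed[k] = cols
    return {"added": added, "removed": removed, "changed": changed}


# Placeholder data standing in for the previous and current files.
previous = "Kids_First_Biospecimen_ID\tcancer_group\nBS_1\tA\nBS_2\tB\n"
current = "Kids_First_Biospecimen_ID\tcancer_group\nBS_1\tA\nBS_2\tC\nBS_3\tD\n"

report = compare_histologies(
    load_tsv(io.StringIO(previous)),
    load_tsv(io.StringIO(current)),
)
print(report)
# {'added': ['BS_3'], 'removed': [], 'changed': {'BS_2': ['cancer_group']}}
```

With real files, `load_tsv(open(path, newline=""))` would replace the `io.StringIO` calls. For the release-based QC only the "old" input changes (v11 `histologies.tsv` now, v12 once it is released).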
@jharenza I opened a PR: https://github.com/d3b-center/histologies-qc/pull/1. Please review it when you get a chance.
This is now further automated in the `histologies-qc` repo.
What data file(s) does this issue pertain to?
What release are you using?
v11
Put your question or report your issue here.
@HuangXiaoyan0106 - can you create a workflow based on this QC code to automate generation of `histologies-base.tsv` from `histologies-base-adapt.tsv` (the histologies file from the D3b warehouse)? If possible, can we use this repo and not use the folder structure as in `d3b-codes`? Please let me know if you have any questions.
cc @aadamk