d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

`histologies-base.tsv` missing DGD `sample_id`, possibly all normals, other info #323

Closed jharenza closed 2 years ago

jharenza commented 2 years ago

What data file(s) does this issue pertain to?

2022-05-20 download of histologies-base.tsv is missing sample_id for all DGD samples

What release are you using?

NA

Put your question or report your issue here.

With the latest pull,

  1. All DGD samples are missing sample_id
  2. All 699 samples are Tumor, but only 697 have experimental_strategy == Targeted Sequencing (BS_DKZ8JFR8 and BS_MADCWWMX have NA experimental_strategy).
  3. All samples are Tumor - are we missing Normals from the view?
  4. When re-mapping pathology_diagnosis, please move old pathology_diagnosis to pathology_free_text_diagnosis for record-keeping.
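
Checks 1-3 above can be reproduced with a short script. The following is a minimal sketch in Python, not the pipeline's actual code: the column names `cohort`, `sample_type`, and `Kids_First_Biospecimen_ID` are assumptions (only `sample_id` and `experimental_strategy` are named in this issue), and the `BS_TOY...` rows are made-up toy data.

```python
import csv
import io
from collections import Counter

MISSING = {"", "NA"}  # values treated as missing in the TSV

def summarize_dgd(tsv_text):
    """Summarize DGD rows: missing sample_id, sample types, missing strategy."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Assumption: a `cohort` column identifies DGD samples.
    dgd = [row for row in reader if row["cohort"] == "DGD"]
    return {
        "n_dgd": len(dgd),
        "missing_sample_id": sum(row["sample_id"] in MISSING for row in dgd),
        # Assumption: Tumor/Normal status lives in a `sample_type` column.
        "sample_type": Counter(row["sample_type"] for row in dgd),
        "missing_strategy_ids": [
            row["Kids_First_Biospecimen_ID"]
            for row in dgd
            if row["experimental_strategy"] in MISSING
        ],
    }

# Toy rows for illustration only; these are not real histologies data.
toy_rows = [
    ["Kids_First_Biospecimen_ID", "sample_id", "cohort", "sample_type", "experimental_strategy"],
    ["BS_TOY00001", "S1", "DGD", "Tumor", "Targeted Sequencing"],
    ["BS_TOY00002", "NA", "DGD", "Tumor", "Targeted Sequencing"],
    ["BS_TOY00003", "S3", "DGD", "Tumor", "NA"],
    ["BS_TOY00004", "S4", "OTHER", "Normal", "WGS"],
]
toy_tsv = "\n".join("\t".join(row) for row in toy_rows)

summary = summarize_dgd(toy_tsv)
print(summary)
```

Because every field is read as a string here, a missing `sample_id` in this summary reflects the file itself rather than a type-inference side effect of the reader.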

nicholasvk commented 2 years ago

@jharenza - I just checked this in the warehouse and am not seeing issues 1-3 reflected in the production tables for the report currently in the warehouse. Can you please confirm the version and re-pull if needed?

jharenza commented 2 years ago

Let me do a re-pull @nicholasvk. Is there a way to get a version, or would it just be a timestamp?

jharenza commented 2 years ago

  1. All DGD samples are missing sample_id

OK, this was an issue with my import into R - I needed to increase `guess_max`!
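
For readers unfamiliar with this failure mode: assuming the file was read with readr, column types are guessed from the first `guess_max` rows, so a column that is entirely `NA` in that window can be typed as logical and later string values are then lost. The toy model below is a Python sketch of that mechanism only, not readr's actual implementation; `guess_type`, `read_column`, and the sample values are hypothetical.

```python
import csv
import io

MISSING = {"", "NA"}

def guess_type(sampled_values):
    """Toy type guess: 'logical' if every sampled value is missing, else 'character'."""
    non_missing = [v for v in sampled_values if v not in MISSING]
    return "character" if non_missing else "logical"

def read_column(tsv_text, column, guess_max):
    """Guess the column type from the first `guess_max` rows, then parse all rows."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    col_type = guess_type([row[column] for row in rows[:guess_max]])
    values = []
    for row in rows:
        raw = row[column]
        if raw in MISSING or col_type == "logical":
            # Simplification: a string cannot be stored in a logical column,
            # so it becomes missing.
            values.append(None)
        else:
            values.append(raw)
    return col_type, values

# Toy file: sample_id is NA for the first three rows and filled afterwards.
toy_rows = [["cohort", "sample_id"]] + [["OTHER", "NA"]] * 3 + [
    ["DGD", "DGD-TOY-1"],
    ["DGD", "DGD-TOY-2"],
]
toy_tsv = "\n".join("\t".join(row) for row in toy_rows)

small = read_column(toy_tsv, "sample_id", guess_max=3)
large = read_column(toy_tsv, "sample_id", guess_max=5)
print(small)  # ('logical', [None, None, None, None, None])
print(large)  # ('character', [None, None, None, 'DGD-TOY-1', 'DGD-TOY-2'])
```

In R, the usual remedies are to raise `guess_max` or to declare the column type explicitly via `col_types`.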

  2. All 699 samples are Tumor, but only 697 have experimental_strategy == Targeted Sequencing (BS_DKZ8JFR8 and BS_MADCWWMX have NA experimental_strategy).

This issue still exists. Will also loop in @yuankunzhu and @zhangb1 here about what type of library prep was performed so we can annotate an RNA_library to distinguish RNA and DNA samples.

  3. All samples are Tumor - are we missing Normals from the view?

We still do not have DGD normals in view. @yuankunzhu were normals used for the analyses?

yuankunzhu commented 2 years ago

I don't think DGD does sequencing for normals?

jharenza commented 2 years ago

@nicholasvk it looks like BS_DKZ8JFR8 and BS_MADCWWMX, which have NA experimental_strategy, are not in the genomics file, so I suppose we can remove those on our end.

So the only remaining to-do for this ticket is item 4.
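
The membership check described above can be sketched as follows. This is an illustration under assumptions, not the code actually used: `split_by_genomics` is a hypothetical helper, and the `BS_TOY...` IDs are made-up stand-ins for biospecimens that do appear in the genomics file.

```python
def split_by_genomics(histology_ids, genomics_ids):
    """Return (kept, dropped): IDs present in / absent from the genomics file, in order."""
    genomics = set(genomics_ids)
    kept = [bs_id for bs_id in histology_ids if bs_id in genomics]
    dropped = [bs_id for bs_id in histology_ids if bs_id not in genomics]
    return kept, dropped

# Toy inputs: the two BS_TOY IDs are invented; the other two are the IDs from this issue.
histology_ids = ["BS_TOY00001", "BS_DKZ8JFR8", "BS_TOY00002", "BS_MADCWWMX"]
genomics_ids = ["BS_TOY00001", "BS_TOY00002"]

kept, dropped = split_by_genomics(histology_ids, genomics_ids)
print(dropped)  # ['BS_DKZ8JFR8', 'BS_MADCWWMX']
```

Listing the dropped IDs before filtering makes it easy to confirm that only the expected samples are removed.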

nicholasvk commented 2 years ago

Confirming that we discussed these two samples: they are "Real time" workflow samples sequenced with Tempus that we were asked to register under the DGD study. Komal requested that they then be added to the histologies file for analysis.