Clinical-Genomics / cg

Glue between Clinical Genomics apps

Subject_ID not enough to distinguish samples #1600

Open peterpru opened 1 year ago

peterpru commented 1 year ago

Description

Currently, subject_id represents a patient. However, we are now starting to receive cases where this is not enough to decide which DNA sample an RNA sample should be uploaded to. Below is an example of a patient with the non-tumour DNA sample at the bottom, plus two tumour DNA samples and two tumour RNA samples.

It is common for, e.g., two RNA samples to be uploaded to two DNA samples that share the same subject_id. These are usually distinguished by being a tumour/non-tumour pair: the upload command uploads the tumour RNA to the tumour DNA, and the non-tumour RNA to the non-tumour DNA.

This fails for the current case, where there are two tumour RNA samples and two tumour DNA samples (different tissues), so the upload command cannot know which RNA sample belongs to which DNA sample. The customer attempted to distinguish the tissues by appending -1 and -2 to the subject_id. However, this means that in this case subject_id no longer represents a patient.

image
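The ambiguity described above can be reproduced with a minimal sketch. This is not the actual cg upload code: the `Sample` class, its fields, and `find_dna_matches` are hypothetical names used only to illustrate matching on subject_id plus tumour status.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical, simplified sample record (not the StatusDB model)."""

    name: str
    subject_id: str
    is_tumour: bool
    kind: str  # "DNA" or "RNA"


def find_dna_matches(rna: Sample, samples: list[Sample]) -> list[Sample]:
    """Return DNA samples sharing the RNA sample's subject_id and tumour status."""
    return [
        sample
        for sample in samples
        if sample.kind == "DNA"
        and sample.subject_id == rna.subject_id
        and sample.is_tumour == rna.is_tumour
    ]


# One patient: one non-tumour DNA sample, two tumour DNA and two tumour RNA samples.
samples = [
    Sample("dna_normal", "patient1", False, "DNA"),
    Sample("dna_tumour_a", "patient1", True, "DNA"),
    Sample("dna_tumour_b", "patient1", True, "DNA"),
    Sample("rna_tumour_a", "patient1", True, "RNA"),
    Sample("rna_tumour_b", "patient1", True, "RNA"),
]

# Both tumour DNA samples match, so the RNA sample has no unique upload target.
matches = find_dna_matches(samples[3], samples)
match_names = [sample.name for sample in matches]
```

With only subject_id and tumour status as keys, `match_names` contains both tumour DNA samples, which is the situation the upload command cannot resolve.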

Suggested solution

One suggestion would be to let subject_id simply represent a link between RNA and DNA samples, as the customer attempted here by appending -1 and -2 to the original subject ID for different tissues. A second option would be to add another field to RNA samples that names the DNA sample, called 'linked_DNA_sample' or similar, which could then be used to identify the correct DNA sample to upload the RNA scout data to. However, if we want to upload RNA to collaborators (e.g. shared samples between customers), we would want/need the subject_id, and not the DNA sample name.
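As a rough illustration of the second option, the sketch below resolves the upload target from an explicit link when one is set and falls back to subject_id otherwise. The `RnaSample` class, the `linked_dna_sample` field name, and `resolve_dna_target` are assumptions made for this example and do not exist in cg; tumour status is left out to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RnaSample:
    """Hypothetical RNA sample record with an optional explicit DNA link."""

    name: str
    subject_id: str
    linked_dna_sample: Optional[str] = None


def resolve_dna_target(rna: RnaSample, dna_by_subject: dict[str, list[str]]) -> str:
    """Pick the DNA sample name that the RNA data should be uploaded to."""
    candidates = dna_by_subject.get(rna.subject_id, [])
    if rna.linked_dna_sample is not None:
        # An explicit link wins, but it must point at a DNA sample of the same subject.
        if rna.linked_dna_sample not in candidates:
            raise ValueError(f"{rna.linked_dna_sample} is not a DNA sample of {rna.subject_id}")
        return rna.linked_dna_sample
    # Without a link, subject_id alone must identify exactly one DNA sample.
    if len(candidates) != 1:
        raise ValueError(f"{len(candidates)} DNA candidates for {rna.name}, expected 1")
    return candidates[0]


dna_by_subject = {"patient1": ["dna_tumour_a", "dna_tumour_b"]}
linked_rna = RnaSample("rna_tumour_a", "patient1", linked_dna_sample="dna_tumour_a")
unlinked_rna = RnaSample("rna_tumour_b", "patient1")

target = resolve_dna_target(linked_rna, dna_by_subject)
```

Keeping subject_id on the RNA sample alongside the link means the patient-level identifier is still available for the collaborator case, while the link only disambiguates within a patient.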

This can be closed when

This can be closed when upload succeeds for all samples for a given patient.

Blocked by

If there are any blocking issues/PRs/things in this or other repos, please link to them.

henrikstranneheim commented 1 year ago

I think we need to extend the subject_id data model. I made a suggestion in a WIP PR here: https://github.com/Clinical-Genomics/cg/pull/1472/files#diff-fd54df7c4d95028078c21bacdfc285c680135465f92ca21464a593f7a0928c90

Basically, I think we should use a biopsy_id to link biopsies together with the subject_id.
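One way to picture this is a sketch in which RNA and DNA samples are paired within each (subject_id, biopsy_id) group. It is not taken from the linked PR: the `Sample` class, the `biopsy_id` values, and `pair_rna_with_dna` are hypothetical and only show how a second key could keep subject_id meaning "patient".

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record carrying both a patient and a biopsy identifier."""

    name: str
    subject_id: str  # still identifies the patient
    biopsy_id: str  # identifies the biopsy/tissue within that patient
    kind: str  # "DNA" or "RNA"


def pair_rna_with_dna(samples: list[Sample]) -> dict[str, str]:
    """Map each RNA sample name to the DNA sample name from the same biopsy."""
    groups: dict[tuple[str, str], dict[str, list[str]]] = defaultdict(
        lambda: {"DNA": [], "RNA": []}
    )
    for sample in samples:
        groups[(sample.subject_id, sample.biopsy_id)][sample.kind].append(sample.name)

    pairs: dict[str, str] = {}
    for key, by_kind in groups.items():
        if not by_kind["RNA"]:
            continue  # nothing to upload for this biopsy
        if len(by_kind["DNA"]) != 1:
            raise ValueError(f"Expected exactly one DNA sample for {key}")
        for rna_name in by_kind["RNA"]:
            pairs[rna_name] = by_kind["DNA"][0]
    return pairs


samples = [
    Sample("dna_normal", "patient1", "normal", "DNA"),
    Sample("dna_tumour_a", "patient1", "tissue_a", "DNA"),
    Sample("dna_tumour_b", "patient1", "tissue_b", "DNA"),
    Sample("rna_tumour_a", "patient1", "tissue_a", "RNA"),
    Sample("rna_tumour_b", "patient1", "tissue_b", "RNA"),
]

pairs = pair_rna_with_dna(samples)
```

All five samples keep the same subject_id, so the patient-level meaning is preserved, and the two tumour tissues are told apart by biopsy_id alone.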

henrikstranneheim commented 1 year ago

Should be part of a future refactoring of StatusDB in a project.