Closed idazucchi closed 1 year ago
LGTM - Just a bio question... do we have any guidelines on what adjacency is? like, let's say we find a paper that states the tissue was taken 5 cm away from the affected tissue... is that adjacent? have you found/heard from the bionetworks anything about that?
Yes, how would you represent the disease the specimen was adjacent to? If the donor had multiple diseases how would you encode what the adjacent disease was?
It looks to me like we may want to:
Otherwise, even if you use donorOrganism.KnownDiseases to record the "adjacent to" disease, you can't tell which of the donor's possibly multiple diseases were adjacent to the sample.
Thanks for the suggestion @NoopDog !
I've changed disease_adjacent to associated_diseases, which imports the disease module to describe the adjacent disease.
We already record the adjacent disease at the donor level as a part of our best practices, but it's useful to be able to record it at the specimen level more clearly.
@ESapenaVentura we don't have any precise guidelines from the bionetworks for what qualifies as disease-adjacent. I would rely on what is stated in the paper to determine whether a tissue is disease-adjacent, but we can also reach out to the bionetworks and see if there's a consensus on distance.
Having both diseases and adjacent_diseases introduces the possibility of both being populated. Is that intended and does it make sense?
If we go this route and don't change anything in Azul, adjacent_diseases would not be visible in the Data Browser or usable for filtering or sorting. Is it more likely that people who use the Data Browser to filter by a particular disease would expect the search result to include specimens from tissue adjacent to tissue affected by that disease, even if the tissue the specimen was collected from doesn't have that disease?
Having both diseases and adjacent_diseases introduces the possibility of both being populated. Is that intended and does it make sense?
It makes sense to me that diseases would always be populated with adjacent_diseases.
These are two different facts.
Current practice is:
We already record the adjacent disease at the donor level as a part of our best practices, but it's useful to be able to record it at the specimen level more clearly.
Of course, 1 can be inferred from 2 above, but making it explicit in the database relieves all clients of having to make this inference, and to me, this seems desirable.
Hi @hannes-ucsc
Having both diseases and adjacent_diseases introduces the possibility of both being populated. Is that intended and does it make sense?
Let's say a donor has a kidney tumor and they undergo nephrectomy to get rid of the tumor. This donor donates two tissue samples:
A. one from the tumor site
B. one 3cm away from the tumor.
We want to describe the specimens in this way:
A. disease: tumor
B. disease: normal; adjacent_diseases: tumor
This is to highlight that although the tissue is believed to be healthy it was taken from a site in proximity of a disease and could be affected by it.
You can see how in this scenario we expect that disease and adjacent_diseases might both be filled at the same time, and it makes sense to do so.
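To make the kidney example concrete, here is a minimal sketch of how the two specimens might look as records. The dictionary layout, the `specimen_id` key, and the `is_disease_adjacent` helper are assumptions for illustration only; the real schema describes diseases through the disease module rather than plain strings.

```python
# Hypothetical, simplified specimen records. Field names follow the
# discussion above; the real schema uses disease module objects, not strings.
specimen_a = {
    "specimen_id": "A",
    "diseases": ["kidney tumor"],   # taken from the tumor site
    "adjacent_diseases": [],
}
specimen_b = {
    "specimen_id": "B",
    "diseases": ["normal"],         # tissue itself is believed healthy
    "adjacent_diseases": ["kidney tumor"],  # taken 3 cm from the tumor
}


def is_disease_adjacent(specimen):
    """Return True if the specimen is recorded as normal but was
    collected near diseased tissue (illustrative helper, not part of
    any existing tooling)."""
    return specimen["diseases"] == ["normal"] and bool(specimen["adjacent_diseases"])


print(is_disease_adjacent(specimen_a))
print(is_disease_adjacent(specimen_b))
```

Specimen B is the case where both fields are populated at once: one states what the tissue is, the other states what it was next to.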
We don't need adjacent_diseases to be indexed; anyone who is interested in a specific disease can filter for it at the donor level, and the adjacent_diseases specimens, although healthy, will be selected.
You can see how in this scenario we expect that disease and adjacent_diseases might both be filled at the same time, and it makes sense to do so.
Got it. Thanks.
We don't need adjacent_diseases to be indexed, anyone who is interested in a specific disease can filter for it at the donor level and the adjacent_diseases specimens, although healthy, will be selected
I see. Given that, Azul will ignore the adjacent_diseases field. If a donor has multiple different tumors, and specimens were collected from tissue adjacent to only one of these tumors, all of the donor's specimens (and the files derived from them) will match a filter that specifies only one of the tumor diseases. I think that's an acceptable conflation.
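The conflation described here can be sketched with a toy filter. This is not Azul's indexing code; the record layout, the disease names, and both filter functions are hypothetical and only illustrate the difference between matching at the donor level and matching on the specimen's own fields.

```python
# Hypothetical donor with two different tumors, recorded at the donor level.
donors = [
    {"donor_id": "donor_1", "diseases": ["kidney tumor", "skin tumor"]},
]

# Hypothetical specimens from that donor.
specimens = [
    {"specimen_id": "S1", "donor_id": "donor_1",
     "diseases": ["kidney tumor"], "adjacent_diseases": []},
    {"specimen_id": "S2", "donor_id": "donor_1",
     "diseases": ["normal"], "adjacent_diseases": ["kidney tumor"]},
    {"specimen_id": "S3", "donor_id": "donor_1",
     "diseases": ["skin tumor"], "adjacent_diseases": []},
]


def filter_by_donor_disease(specimens, donors, disease):
    """Select every specimen whose donor has the disease (donor-level filter)."""
    matching = {d["donor_id"] for d in donors if disease in d["diseases"]}
    return [s["specimen_id"] for s in specimens if s["donor_id"] in matching]


def filter_by_specimen_fields(specimens, disease):
    """Select specimens that have, or are adjacent to, the disease."""
    return [
        s["specimen_id"]
        for s in specimens
        if disease in s["diseases"] or disease in s["adjacent_diseases"]
    ]


print(filter_by_donor_disease(specimens, donors, "skin tumor"))
print(filter_by_specimen_fields(specimens, "skin tumor"))
```

Under these assumptions, the donor-level filter for the skin tumor returns all three specimens, including S2, which is adjacent only to the kidney tumor. A filter that consulted adjacent_diseases would return S3 alone.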
Release notes
For specimen_from_organism.json schema: disease_adjacent
Why are these changes needed?
This field is useful when determining the difference between 'normal' healthy tissue and 'disease-adjacent' tissue which often is affected by its proximity to the pathology area. It has been requested by the Skin Bionetworks, as this is of value to their metadata and analysis.
Reviews requested
This is a minor schema change; all DCP2 reviewers need to review.