Closed hannes-ucsc closed 5 years ago
No organ is displayed for 1 FOV BaristaSeq mouse SpaceTx dataset even though it is filled out in the spreadsheet - this is likely because the way organ is assigned is linked to cell suspension that is not relevant to imaging datasets. For the imaging datasets the relevant biomaterial is "imaged_specimen". If that's added into the algorithm of how Organ (organ part) is determined this should solve the problem.
The library construction method is irrelevant for imaging datasets; the imaging method should be displayed instead (imaging_protocol.target.assay_type.text or imaging_protocol.target.assay_type.ontology_label).
I've asked @danielsotirhos to reindex the samples deployment against DSS staging because I think it will address the organ issue. Once reindexing is done, this link should 1) have a hit and 2) list the organ "brain" under specimens.
We also need to add support for imaging_protocol.target[*].assay_type. However, @zperova, in the imaging bundle I am looking at, there are dozens of targets under target. Each one has its own assay_type field. In that bundle these fields all have the same text value ("in situ sequencing"), but I fear that there might also be many different values. The data browser can't display an indeterminate number of values in one column. How should we handle that?
@hannes-ucsc thanks, I can see brain now.
For the second part: at the moment the assay_type is the same for all targets, but when we get to more complex datasets, it might change. Am I correct to think that putting an assay_type field in the Imaging Protocol would solve this issue?
@zperova, if the assay type could potentially be different between targets, pulling that property up into the parent imaging_protocol entity wouldn't work. In that case we'll just have to accumulate all imaging_protocol.target[*].assay_type values into a weighted and bounded set. We would index and display the, say, 10 most frequently used assay types for each protocol. Would that work?
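A minimal sketch of the "weighted and bounded set" idea, assuming the protocol is available as a plain dict shaped like imaging_protocol.target[*].assay_type.text. The function name `top_assay_types`, the sample document, and the second assay value are illustrative and not taken from Azul:

```python
from collections import Counter

def top_assay_types(imaging_protocol: dict, limit: int = 10) -> dict:
    """Count the assay type of every target and keep only the
    `limit` most frequent ones (hypothetical helper, not Azul code)."""
    counts = Counter(
        target["assay_type"]["text"]
        for target in imaging_protocol.get("target", [])
    )
    return dict(counts.most_common(limit))

# Illustrative protocol document; the field layout is an assumption.
protocol = {
    "target": [
        {"assay_type": {"text": "in situ sequencing"}},
        {"assay_type": {"text": "in situ sequencing"}},
        {"assay_type": {"text": "MERFISH"}},
    ]
}

print(top_assay_types(protocol, limit=1))
```

The counts are the weights and `limit` is the bound, so the number of values in a table cell never exceeds `limit` regardless of how many targets the protocol has.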
@hannes-ucsc why wouldn't it work to have the assay_type at the imaging_(preparation)_protocol level? The imaging_(preparation)_protocol describes each of the protocols used, so if there are two different assay_types used in the experiment, these can be pulled from there and displayed without any need to accumulate values (which I think is a more complicated task). I am thinking along the lines of what is done with the library_construction_method. These are pulled from library_preparation_protocol.library_construction_method.ontology_label, so I propose to do the same for assay_type. Or am I wrong?
I guess I don't understand. There must be a reason why assay_type is a property of imaging_protocol.target rather than just imaging_protocol. If we pull it up into imaging_protocol then we'd have to make imaging_protocol.assay_type an array and we'd lose the association with target. I have no idea what the right way is. You tell me.
I'm just saying that if 1) there can be many targets and 2) each target could potentially use a different assay type that would imply that there could be many distinct assay types and we'd have to apply some sort of upper bound on the number of assay types because the data browser can't display an arbitrarily large number of values in a single table cell.
Likewise, if we pulled assay type up into imaging protocol, we could still have many assay types (yes?) and we would also need to apply an upper bound.
@zperova we are moving forward with indexing the N most frequent assay types across the targets in imaging_protocol.target. When we aggregate multiple imaging_protocol instances, for example to summarize them per project in the Projects tab, we'll take the top M most frequent assay types. Interestingly, this is an approximate process. I can elaborate why if needed. To jog my memory:
N=1, M=2
{x,x,a} => {x:2}
{y,y,a} => {y:2}
{y,y,a} => {y:2}
[{x:2}, {y:2}, {y:2}] becomes {y:4, x:2} but {y:4, a:3} would be more accurate.
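The N=1, M=2 example above can be reproduced with a short sketch. It assumes each protocol is simply a list of assay type labels; the helper names `top` and `merge` are hypothetical and not part of Azul:

```python
from collections import Counter

def top(counts: Counter, n: int) -> Counter:
    """Keep only the n most frequent entries of a counter."""
    return Counter(dict(counts.most_common(n)))

def merge(per_protocol, m: int) -> Counter:
    """Sum the per-protocol counters, then keep the top m entries."""
    total = Counter()
    for counts in per_protocol:
        total.update(counts)  # Counter.update adds counts
    return top(total, m)

N, M = 1, 2
protocols = [["x", "x", "a"], ["y", "y", "a"], ["y", "y", "a"]]

# Bound each protocol to its top N first, then merge and keep the top M.
bounded = [top(Counter(p), N) for p in protocols]
approximate = merge(bounded, M)

# Merging the unbounded counts shows what was lost.
exact = merge([Counter(p) for p in protocols], M)

print(dict(approximate))  # y and x survive
print(dict(exact))        # y and a would be the true top 2
```

The value `a` is never the most frequent within a single protocol, so it is discarded before the merge even though it is the second most frequent overall. That is the source of the approximation.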
We've also decided to not index other properties from imaged_specimen and image_file until we're being explicitly asked to expose particular properties.
The imaging data set is still displayed with the wrong organ but that is due to https://github.com/HumanCellAtlas/data-browser/issues/640. Azul already indexes the organ correctly, going up in the graph through imaged_specimen (instead of cell_suspension) to specimen_from_organism.
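As a rough illustration of that traversal (not Azul's actual code), the sketch below walks upstream from a file until it reaches specimen_from_organism entities and collects their organs. The `entities` and `parents` dictionaries, the entity IDs, and the function name `find_organs` are all assumptions made for this example:

```python
def find_organs(entity_id: str, entities: dict, parents: dict) -> set:
    """Walk upstream from entity_id and collect the organ of every
    specimen_from_organism reached (hypothetical data layout)."""
    organs = set()
    seen = set()
    stack = [entity_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        entity = entities[current]
        if entity["type"] == "specimen_from_organism":
            organs.add(entity["organ"])
        # Follow every upstream link, whatever the intermediate type is.
        stack.extend(parents.get(current, []))
    return organs

# Illustrative imaging bundle: image_file <- imaged_specimen <- specimen.
entities = {
    "file1": {"type": "image_file"},
    "img1": {"type": "imaged_specimen"},
    "spec1": {"type": "specimen_from_organism", "organ": "brain"},
}
parents = {"file1": ["img1"], "img1": ["spec1"]}

print(find_organs("file1", entities, parents))
```

Because the walk does not require a cell_suspension on the path, the same function would serve sequencing and imaging bundles alike under these assumptions.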
I guess I don't understand. There must be a reason why assay_type is a property of imaging_protocol.target rather than just imaging_protocol. If we pull it up into imaging_protocol then we'd have to make imaging_protocol.assay_type an array and we'd lose the association with target. I have no idea what the right way is. You tell me.
You are right - that's the reason we had the target module in the first place. The important part is for a dataset to be identified in the Browser when someone searches a particular assay type. It is my understanding that your approach will accomplish that.
The important part is for a dataset to be identified in the Browser when someone searches a particular assay type. It is my understanding that your approach will accomplish that.
Not exactly. If there are 101 distinct assay types spread over, say, 1000 target objects, we will discard one assay type (the least frequently used one) and the user will not be able to find the dataset by that assay type. However, we can change the thresholds. If you think 100 is too low, let me know.
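To make the consequence for search concrete, here is a small sketch of that scenario with made-up labels: 1000 targets, 101 distinct assay types, and a bound of 100. The names `targets`, `kept`, and `is_findable` are hypothetical:

```python
from collections import Counter

# 100 assay types with 10 targets each gives 1000 targets in total.
targets = [f"assay_{i}" for i in range(100) for _ in range(10)]
# Replace one target so that a 101st, rarely used assay type appears.
targets[-1] = "rare_assay"

LIMIT = 100  # the threshold discussed in this thread
kept = dict(Counter(targets).most_common(LIMIT))

def is_findable(assay_type: str) -> bool:
    """A dataset can only be found by an assay type that was indexed."""
    return assay_type in kept

print(len(set(targets)), len(kept))
print(is_findable("assay_0"), is_findable("rare_assay"))
```

Only the single least frequent value falls outside the bound here, which matches the case described above. Raising `LIMIT` to 101 would keep it.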
@hannes-ucsc since it is very unlikely that there will be a large number of assay-types per dataset, they all should be represented in the Browser with the threshold of 100.
Entities of interest are imaged_specimen and image_file.
┆Issue is synchronized with this Jira Story ┆Project Name: azul ┆Issue Number: AZUL-549 ┆Epic: Imaging support