Closed eweitz closed 8 years ago
latest fixes to /annotation core:
It seems the "annotation" core has regressed. annotDisease is missing in Solr documents from that core:
From http://localhost:8983/solr/annotation/select?q=_%3A_&wt=json&indent=true
{
"queueId":4,
"taxId":9606,
"note":"0",
"id":"2712518",
"sourceCellLine":"Sample from Homo sapiens",
"sampleName":"0",
"sampleTitle":"0",
"sourceCellType":"0",
"sourceSpecies":"Homo Sapiens",
"sourceAnatomy":"0",
"sourceDisease":"0",
"sourceCellTreatment":"0",
"sourceSex":"0",
"annotSex":"0",
"annotCellLine":"0",
"annotCellType":"0",
"annotSpecies":"0",
"annotAnatomy":"0",
"annotCellTreatment":"0",
"sourceDevStage":"0",
"annotDevStage":"0",
"_version_":1526267270228082688},
Same for AnnotationsDev; see http://localhost:8983/solr/AnnotationsDev/select?q=_%3A_&wt=json&indent=true.
The samples in AnnotationsDev for human are not in great shape - there are lots of layers of stuff that I never refreshed after I ran tests into it, so probably some of them have it in there right. I'm going to take down annotation and repopulate - it's just bad versioning on my part: when I populated it today, I introduced this other problem. annotation should be back up in an hour.
FWIW, if you wanted to see a test case of the clustering approach: in AnnotationsDev, queue 81 has all the HEK293 samples. They're not disambiguated, they are just the free-text matches, but it seemed like a better place to start than nothing.
Annotation is repopulated. I'm going to apply the sorting approach to HEK293 for annotation as well, and then it should be the same: in queue 81, you should see the clustered HEK293 samples all appear in that queue.
The HEK293 cluster is in queue 4. You can view it at http://localhost:8983/solr/annotation/select?q=queueId%3A4+AND+HEK293&wt=json&indent=true
"response": { "numFound": 652, "start": 0, "docs": [
{ "queueId": 4, "id": "2147292", "taxId": 9606, "sourceCellLine": "HEK293 GMUCT B", "sampleName": "0", "sampleTitle": "0", "sourceCellType": "HEK293 cells", "sourceSpecies": "Homo Sapiens", "sourceAnatomy": "0", "sourceDisease": "0", "annotCellLine": "0", "annotCellType": "0", "annotSpecies": "0", "annotAnatomy": "0", "annotDisease": "0", "annotCellTreatment": "0", "note": "0", "version": 1526295291084406800 },
{ "queueId": 4, "id": "2147291", "taxId": 9606, "sourceCellLine": "HEK293 GMUCT A", "sampleName": "0", "sampleTitle": "0", "sourceCellType": "HEK293 cells", "sourceSpecies": "Homo Sapiens", "sourceAnatomy": "0", "sourceDisease": "0", "annotCellLine": "0", "annotCellType": "0", "annotSpecies": "0", "annotAnatomy": "0", "annotDisease": "0", "annotCellTreatment": "0", "note": "0", "version": 1526295291092795400 },
{ "queueId": 4, "id": "3301901", "taxId": 9606, "sourceCellLine": "HEK293", "sampleName": "0", "sampleTitle": "0", "sourceCellType": "0", "sourceSpecies": "Homo Sapiens", "sourceAnatomy": "missing", "sourceDisease": "0", "annotCellLine": "0", "annotCellType": "0", "annotSpecies": "0", "annotAnatomy": "0", "annotDisease": "0", "annotCellTreatment": "0", "note": "0", "version": 1526295291101184000 },
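For anyone who wants to reproduce this kind of query from a script, here is a minimal Python sketch that builds the same select URL. The helper name build_select_url and the default base URL are assumptions for illustration, not part of the project's code; it only constructs the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Assumed local Solr instance, matching the URLs quoted in this thread.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(core, query, base=SOLR_BASE):
    """Return a Solr /select URL for `query` against `core`, asking for indented JSON."""
    # urlencode percent-encodes ':' as %3A and turns spaces into '+'.
    params = urlencode({"q": query, "wt": "json", "indent": "true"})
    return f"{base}/{core}/select?{params}"


if __name__ == "__main__":
    print(build_select_url("annotation", "queueId:4 AND HEK293"))
```

Passing "AnnotationsDev" as the core gives the equivalent URL for the dev core.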
Awesome, annotation is back up with the fixed annotDisease field.
And at a glance, the sourceCellLine clustering looks much better than before in annotation. The clustering seems to be similar in AnnotationsDev.
It's just one cluster but I was wondering if I populated annotationsDev the last time from the wrong (read unsorted) file.
This was fixed again a few days ago.
Each BioSample record in our application is represented as a Solr document. Each underlying field in those documents typically has a "source" value and a corresponding "annotated" value, e.g. sourceCellLine and annotCellLine.
However, the sourceDisease field is missing a corresponding annotDisease field.
http://localhost:8983/solr/annotation/select?q=id%3A3274314%0A&wt=json&indent=true
The doc above should contain a field annotDisease.
(Also, sourceDisease should follow our convention of indicating empty fields via "0", instead of " ". Finding a better way to represent empty fields would be nice, but is a separate issue.)
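The two conventions described here (every source* field has an annot* counterpart, and empty values are written as "0") could be checked with something like the sketch below. The function names and the sample document are hypothetical, made up for illustration; the sample is not the actual Solr document from the URL above.

```python
def missing_annot_fields(doc):
    """Return annot* field names that are absent although the matching source* field exists."""
    missing = []
    for name in doc:
        if name.startswith("source"):
            annot_name = "annot" + name[len("source"):]
            if annot_name not in doc:
                missing.append(annot_name)
    return sorted(missing)


def blank_fields(doc):
    """Return names of string fields that are empty or whitespace-only instead of "0"."""
    return sorted(
        name for name, value in doc.items()
        if isinstance(value, str) and value.strip() == ""
    )


if __name__ == "__main__":
    # Hypothetical document illustrating both problems described in this issue.
    example_doc = {
        "id": "example",
        "sourceCellLine": "HEK293",
        "annotCellLine": "0",
        "sourceDisease": " ",
    }
    print(missing_annot_fields(example_doc))
    print(blank_fields(example_doc))
```

Running both checks over every document returned by a select query would show whether a repopulated core still has the regression.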