NCBI-Hackathons / Metadata_categorization

A crowdsourcing/expert curation platform for metadata categorization.
Creative Commons Zero v1.0 Universal

Global Alphabetization #9

Open DCGenomics opened 8 years ago

DCGenomics commented 8 years ago

For example, all HEK293 variants need to be on the same line.

eweitz commented 8 years ago

@lepons seems to have this mostly working as expected. Some background begins at https://github.com/NCBI-Hackathons/Metadata_categorization/issues/5#issuecomment-184471628.

Lena, one slight anomaly I see is that, in the annotation Solr core, HEK293 is all in queue 4, whereas HEK293T is in queue 190 and queue 236. Any idea why HEK293 and HEK293T are so far apart, and why HEK293T is in non-adjacent queues?

http://localhost:8983/solr/annotation/select?q=sourceCellLine:HEK293&wt=json&indent=true

http://localhost:8983/solr/annotation/select?q=sourceCellLine:HEK293T&wt=json&indent=true
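To compare queues without reading the raw JSON by eye, something like the following Python sketch could help. It rebuilds the two select URLs above and summarizes which queues each cell line lands in. The helper names `build_select_url` and `queues_by_cell_line` are hypothetical, and the sketch assumes each doc carries single-valued `sourceCellLine` and `queueId` fields. The sample docs are made up to mirror the queue numbers mentioned in this comment; they are not real query output.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Host, port, and core name are taken from the URLs in this comment.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(core, field, value):
    """Build a Solr select URL for a field:value query that returns JSON."""
    # safe=":" keeps the colon unescaped so the URL matches the ones above.
    params = urlencode(
        {"q": f"{field}:{value}", "wt": "json", "indent": "true"}, safe=":"
    )
    return f"{SOLR_BASE}/{core}/select?{params}"


def queues_by_cell_line(docs):
    """Map each sourceCellLine to the sorted list of queueId values it appears in.

    `docs` is the list found under response -> docs in Solr's JSON output.
    Assumes both fields are single-valued (an assumption, not confirmed here).
    """
    queues = defaultdict(set)
    for doc in docs:
        queues[doc["sourceCellLine"]].add(doc["queueId"])
    return {line: sorted(ids) for line, ids in queues.items()}


# Made-up docs that mirror the anomaly described in this comment.
sample_docs = [
    {"sourceCellLine": "HEK293", "queueId": 4},
    {"sourceCellLine": "HEK293", "queueId": 4},
    {"sourceCellLine": "HEK293T", "queueId": 190},
    {"sourceCellLine": "HEK293T", "queueId": 236},
]

print(build_select_url("annotation", "sourceCellLine", "HEK293"))
print(queues_by_cell_line(sample_docs))
```

Any cell line that maps to more than one queue ID in the result is a candidate for the kind of split described here.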

Regardless, clustering is much better than before. For example, docs with sourceCellLine: HEK293 are now all in the same queue. Most of the docs (individual records) with the same queueId and sourceCellLine are also on the same line in the UI, i.e. in the same summary record. I'm looking into why some of these records still end up in different summary records. I suspect the cause is in the Django backend.
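One way to narrow this down is to list the (queueId, sourceCellLine) groups whose records are spread over more than one summary record. The Python sketch below is only an illustration. `summaryRecordId` is a hypothetical field standing in for however the backend identifies a summary record, `find_split_groups` is a made-up helper name, and the sample records are invented.

```python
from collections import defaultdict


def find_split_groups(records):
    """Return groups whose records span more than one summary record.

    Each record is a dict with `queueId`, `sourceCellLine`, and a
    hypothetical `summaryRecordId` key (the real backend may name or
    structure this differently).
    """
    summaries = defaultdict(set)
    for rec in records:
        key = (rec["queueId"], rec["sourceCellLine"])
        summaries[key].add(rec["summaryRecordId"])
    # Keep only the groups that were split across several summary records.
    return {key: sorted(ids) for key, ids in summaries.items() if len(ids) > 1}


# Invented records: the HEK293T group in queue 190 is split across two rows.
sample_records = [
    {"queueId": 4, "sourceCellLine": "HEK293", "summaryRecordId": 1},
    {"queueId": 4, "sourceCellLine": "HEK293", "summaryRecordId": 1},
    {"queueId": 190, "sourceCellLine": "HEK293T", "summaryRecordId": 2},
    {"queueId": 190, "sourceCellLine": "HEK293T", "summaryRecordId": 3},
]

print(find_split_groups(sample_records))
```

An empty result would mean every group already maps to a single summary record, which would point away from the grouping step as the cause.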