biocaddie / prototype_issues

Used to report and track bioCADDIE prototype issues

Add New Fields To Sorting #72

Open saeid-p opened 8 years ago

saeid-p commented 8 years ago

This feedback was collected from the survey feedback questionnaire. To protect the user's privacy, personal information has been removed.

Sorting should be expanded to author, date published, repository, and title, and we should be able to select the number of results to see per page. Additionally, exporting to citation managers would be a good feature to have.

saeid-p commented 8 years ago

To expand sorting to Author, Date Published, Repository, and Title, these fields must be present in the mapping of every repository in ElasticSearch. Currently, some of these fields are missing in some repositories, and in some cases (Date Published) the field is not named consistently. For instance, the publication date is stored under different names in different repositories. The following are a few instances:

In general, to use ElasticSearch's sorting functionality, each of these fields needs a consistent name and type in the index mapping. If a repository doesn't provide a field, it can be stored without a value, but the mapping for that field should exist in all repositories.
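
As a rough illustration of what a consistent mapping would allow, here is a minimal sketch of a search body with a sort clause. The canonical field names (`title`, `author`, `repository`, `datePublished`) and their types are hypothetical, since the thread does not fix them, and `build_sorted_search` is an illustrative helper rather than existing bioCADDIE code. The `missing` and `unmapped_type` sort options are standard ElasticSearch features, but the valid type names depend on the ElasticSearch version in use.

```python
# Hypothetical canonical field names and types; adjust to the real mapping
# and to the type names of the ElasticSearch version in use.
SORTABLE_FIELDS = {
    "title": "string",
    "author": "string",
    "repository": "string",
    "datePublished": "date",
}


def build_sorted_search(query_text, sort_field, descending=False, page_size=20):
    """Build a search request body sorted on one canonical field."""
    if sort_field not in SORTABLE_FIELDS:
        raise ValueError("unsupported sort field: %s" % sort_field)
    sort_options = {
        "order": "desc" if descending else "asc",
        # Documents without a value are placed at the end of the results.
        "missing": "_last",
        # Fallback type for indices whose mapping lacks the field.
        "unmapped_type": SORTABLE_FIELDS[sort_field],
    }
    return {
        "query": {"query_string": {"query": query_text}},
        "size": page_size,  # number of results per page
        "sort": [{sort_field: sort_options}],
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_sorted_search("cancer", "datePublished", True, 50), indent=2))
```

Note that `unmapped_type` only keeps the query from failing on indices that lack the field. It does not address the naming inconsistency, so a shared field name in every repository is still required.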

One workaround is to re-sort the results after running the search. This approach has been explained here. However, this solution will hurt search performance and increase the application's response time.
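
A minimal sketch of that workaround, assuming the hits are already loaded as plain dictionaries: the two date field names come from the field list reported in this thread, while treating `datasetdistribution` as a nested object and the dates as `YYYY-MM-DD` strings are assumptions. The helper names are illustrative only.

```python
from datetime import datetime

# Date field names as reported in this thread, tried in order.
DATE_FIELD_PATHS = ["dateReleased", "datasetdistribution.dateReleased"]


def get_path(doc, dotted_path):
    """Follow a dotted path through nested dicts; return None if absent."""
    node = doc
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def published_date(doc):
    """Return the first parseable publication date, or None."""
    for path in DATE_FIELD_PATHS:
        raw = get_path(doc, path)
        if raw:
            try:
                # Assumes an ISO-like YYYY-MM-DD prefix.
                return datetime.strptime(str(raw)[:10], "%Y-%m-%d")
            except ValueError:
                continue
    return None


def resort_by_date(hits, descending=True):
    """Sort hits by publication date; hits without a date go last."""
    dated = [(published_date(hit), hit) for hit in hits]
    with_date = [pair for pair in dated if pair[0] is not None]
    without_date = [hit for date, hit in dated if date is None]
    with_date.sort(key=lambda pair: pair[0], reverse=descending)
    return [hit for _, hit in with_date] + without_date
```

For the order to be correct across pages, every matching hit has to be fetched before sorting, which is where the extra response time comes from. Re-sorting a single page only reorders that page.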

@jgrethe what's your opinion about this issue?

jgrethe commented 8 years ago

Need to make sure that all of these are actually in the mapping.

saeid-p commented 8 years ago

@aegururaj I attached a list of missing or invalid fields in the current release.

bioCaddie Missing Fields.docx

saeid-p commented 8 years ago

Source: http://129.106.31.121:9200/_plugin/head/

| Repository | Title | Date Published | Author |
| --- | --- | --- | --- |
| ArrayExpress | title (string) | dateReleased (date) | NOT FOUND |
| BioProject | title (string) | dateReleased (date) | NOT FOUND |
| CIA | title (string) | [datasetdistribution].dateReleased (string) | NOT FOUND |
| CIL | title (string) | NOT FOUND | NOT FOUND |
| ClinicalTrials | title (string) | [datasetdistribution].dateReleased (string) | creator (string) |
| CTN | title (string) | [datasetdistribution].dateReleased (date) | creator (string) |
| CVRG | title (string) | [datasetdistribution].dateReleased (string) | NOT FOUND |
| DataVerse | title (string) | dateReleased (string) | NOT FOUND |
| Dryad | title (string) | [datasetdistribution].dateReleased (string) | creator (string) |
| Gemma | title (string) | NOT FOUND | NOT FOUND |
| Geo | title (string) | [datasetdistribution].dateReleased (string) | NOT FOUND |
| Lincs | title (string) | dateReleased (string) | NOT FOUND |
| MPD | title (string) | NOT FOUND | NOT FOUND |
| Neuromorpho | title (string) | NOT FOUND | NOT FOUND |
| Niddkcr | title (string) | NOT FOUND | NOT FOUND |
| NursaDatasets | title (string) | NOT FOUND | NOT FOUND |
| OpenFMRI | title (string) | NOT FOUND | NOT FOUND |
| PDB | title (string) | dateReleased (date) | citation.author (string) |
| Peptideatlas | title (string) | NOT FOUND | NOT FOUND |
| dbGaP | title (string) | NOT FOUND | NOT FOUND |
| Physiobank | title (string) | NOT FOUND | NOT FOUND |
| ProteomExchange | title (string) | dateReleased (string) | NOT FOUND |
| yped | title (string) | NOT FOUND | NOT FOUND |

jgrethe commented 8 years ago

@yul129: can you verify that these have all been corrected for the upcoming data run?

yul129 commented 8 years ago

Here is the updated status of the field mapping: https://docs.google.com/spreadsheets/d/1I8Cr0IH5rmzVO9NzRGv7vLQbRTgQwXsb5iaO107-hms/edit#gid=0

RuilingLiu commented 8 years ago

Hi @yul129 ,

This document is not public. We need your permission to open it.

yul129 commented 8 years ago

I just updated the sharing settings; it should be public now.

