zelima closed this issue 6 years ago
Could we consider search on the dashboard page? There are cases where a dataset cannot be found through the search box on the user's dashboard, but it can be found on the main search page. Is this related to this issue, or do I need to create a separate issue?
@Mikanebu no, this is not related. Please open a separate issue in the frontend repo.
FIXED.
"Finance" and "climate" appear neither in the title nor in the README, so neither search returns the expected results. That could probably be solved with keywords, but it is not related to this issue.
As a Consumer looking for data, I want to get a dataset on topic X so that I can use it for my work.
As a Consumer looking for data, I want to be able to search with relevant terms and see whether any related datasets are available.
What's the situation now:
Acceptance Criteria
Search for finance brings up vix?
Search for climate brings up co2-ppm
Tasks
Modifying the frontend (?): nothing to do
tableschema_elasticsearch
https://github.com/datahq/specstore/issues/28
Analysis
Search system has several parts:
Problem: we don't index the readme at the moment, so we can't search it
=> changing the indexing requires changing the mapping (or moving the readme into datahub.description)
=> either of these needs a reload of the datapackages
=> editing dump to s3 assembler + a rerun of all flows [painful and complex]
=> a deeper analysis of the issue
=> should we re-architect a bit?
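To make the mapping point concrete, here is a minimal sketch of what adding a readme field might look like. The field names and types below are assumptions for illustration and are not taken from the actual datahub mapping; `build_mapping` and `needs_reload` are hypothetical helpers.

```python
# Hypothetical sketch: field names/types are assumptions, not the real datahub mapping.

def build_mapping(include_readme: bool) -> dict:
    """Return an Elasticsearch-style mapping body for dataset documents."""
    properties = {
        "id": {"type": "keyword"},
        "name": {"type": "keyword"},
        "title": {"type": "text"},
        "description": {"type": "text"},
    }
    if include_readme:
        # Full-text field so that README content becomes searchable.
        properties["readme"] = {"type": "text"}
    return {"properties": properties}


def needs_reload(old: dict, new: dict) -> bool:
    """True if the new mapping has fields the old one lacks.

    Documents indexed before the change do not carry the new field,
    so they have to be loaded again to become searchable on it.
    """
    added = set(new["properties"]) - set(old["properties"])
    return bool(added)


if __name__ == "__main__":
    current = build_mapping(include_readme=False)
    proposed = build_mapping(include_readme=True)
    print(needs_reload(current, proposed))
```

This only illustrates why either option leads to a reload of the datapackages: documents already in the index were written without the readme content.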
Solutions:
Questions
How the load to metastore works today
2 parts
Where and when does ES index get set up?
Adding documents
Convert this data package into a new "data package" DP2 - https://github.com/datahq/assembler/blob/master/datapackage_pipelines_assembler/processors/dump_to_s3.py
DP2: has a single resource with one row that is
'id',
'name',
'title',
'description',  # readme was not pulled out into it: dp.readme or dp.description
'datahub',
'datapackage':  # descriptor minus resources
and a schema that is the ES schema
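The single-row shape described above can be sketched roughly as follows. This is not the code in dump_to_s3.py; `to_index_row` is a hypothetical helper, and where each value comes from in the descriptor (for example `id` and `datahub` being top-level keys) is an assumption.

```python
import copy

# Hypothetical sketch of the DP2 row; not the actual dump_to_s3.py logic.

def to_index_row(descriptor: dict, with_readme: bool = False) -> dict:
    """Build the single row that would be sent to the search index."""
    stripped = copy.deepcopy(descriptor)  # leave the caller's descriptor untouched
    stripped.pop("resources", None)       # "descriptor minus resources"

    row = {
        "id": descriptor.get("id"),                    # assumed location of the id
        "name": descriptor.get("name"),
        "title": descriptor.get("title"),
        "description": descriptor.get("description"),  # readme is NOT folded in here
        "datahub": descriptor.get("datahub", {}),      # assumed top-level key
        "datapackage": stripped,
    }
    if with_readme:
        # Possible fix discussed in this issue: expose the readme as its own field.
        row["readme"] = descriptor.get("readme", "")
    return row


if __name__ == "__main__":
    dp = {
        "id": "core/vix",
        "name": "vix",
        "title": "VIX",
        "readme": "Finance data",
        "resources": [{"name": "vix-daily"}],
    }
    print(sorted(to_index_row(dp)))
```

With `with_readme=False` the row has no top-level readme field, which matches the problem stated in the analysis: the readme text never reaches a searchable field.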
what's not working about this
How to re-architect
Do we push notify the ES index system or does the index system listen?
Requirements:
Going forward
https://github.com/datahq/assembler/blob/master/datapackage_pipelines_assembler/processors/dump_to_s3.py#L17
Repos to work with: