ISG-ICS / cloudberry

Big Data Visualization
http://cloudberry.ics.uci.edu

Add geo_tag search feature on Search Bar #683

Closed: ScarlettZ98 closed this issue 4 years ago

ScarlettZ98 commented 5 years ago

Overview

Improve the current search bar by adding geo_tag search. When the user enters a keyword together with "incity:cityName", the frontend will search for and show only tweets that contain this keyword and are located in the specified city.
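The "incity:" syntax can be illustrated with a small shell sketch. It assumes the tag is a single whitespace-delimited token (so a multi-word city name would need to be written without spaces), that every other token is a keyword, and that the last "incity:" tag wins if several are given. `parse_search` is a hypothetical helper for illustration, not code from the Cloudberry frontend.

```shell
# Hypothetical helper (not from Cloudberry): split a search-bar string into
# plain keywords and an optional "incity:" geo tag.
# Assumes single-token city names; if several tags appear, the last one wins.
parse_search() (
  set -f                                # no globbing, so "*" in a query stays literal
  keywords="" city=""
  for word in $1; do                    # unquoted on purpose: split on whitespace
    case $word in
      incity:*) city=${word#incity:} ;; # strip the "incity:" prefix
      *) keywords="${keywords:+$keywords }$word" ;;
    esac
  done
  printf 'keywords=%s\ncity=%s\n' "$keywords" "$city"
)

parse_search "hurricane flood incity:Houston"
# keywords=hurricane flood
# city=Houston
```

The function body runs in a subshell `( ... )` so that `set -f` and the variables do not leak into the caller.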

Plan

Future plan

Implement more tags and syntax for the search bar, e.g., boolean operations on keywords.

Reference

Scala tutorial

lindashuuu commented 5 years ago

We tried to support searching by hashtag (#) and user mention (@) in TwitterMap. In AsterixDB this would require much more work, so we decided to implement it in the Elasticsearch version, which has a more powerful full-text search engine. Elasticsearch lets us customize the analyzers. A custom analyzer consists of three components:

Functionalities:

Character filters: A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters.

Tokenizer: A tokenizer receives a stream of characters and breaks it up into individual tokens.

Token filters: A token filter receives the token stream and can modify, add, or remove tokens, e.g., the lowercase filter converts every token to lowercase.

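Initially we tried the whitespace tokenizer, which simply splits the text on whitespace. However, it is too simple: punctuation stays attached to words (e.g., "uci!" is indexed as a different token from "uci"), so it loses much of the standard tokenizer's behavior. The final solution is to keep the standard tokenizer but add a mapping character filter that replaces "#" and "@" with the placeholder words "hashtagsign" and "usermentionsign" before tokenization. Without it, the standard tokenizer drops these symbols, so "#uci" and "uci" would produce the same token. With the mapping, the standard tokenizer's behavior is preserved and users can search for accounts and hashtags. It also does not affect the original text shown on the screen, because the analyzer only changes the indexed tokens, not the stored tweet text.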
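To see why the character filter matters, the shell sketch below roughly mimics the custom `default` analyzer locally. It is only an approximation: the real standard tokenizer follows Unicode text segmentation rather than the crude "split on non-alphanumerics" used here, and the stop list is a small subset of Elasticsearch's English list. The function names `char_filter` and `analyze` are hypothetical.

```shell
# Rough local approximation of the custom "default" analyzer (not Elasticsearch itself).

# Character filter: same mappings as "space_hashtags" ("#" => hashtagsign, "@" => usermentionsign).
char_filter() {
  sed -e 's/#/hashtagsign/g' -e 's/@/usermentionsign/g'
}

# analyze TEXT [mapped]: prints one token per line.
# Pass "mapped" as the second argument to apply the character filter first.
analyze() {
  if [ "${2:-}" = mapped ]; then
    printf '%s\n' "$1" | char_filter
  else
    printf '%s\n' "$1"
  fi |
    tr -cs '[:alnum:]' '\n' |                     # crude tokenizer: split on non-alphanumerics, dropping "#" and "@"
    tr '[:upper:]' '[:lower:]' |                  # "lowercase" token filter
    grep -v -x -E 'a|an|and|at|in|is|of|the|to' | # "stop" filter (small subset of the English list)
    sed '/^$/d'                                   # drop the empty token left by leading punctuation
}

# Without the mapping, "#UCI" and "UCI" both become "uci".
analyze "Go #UCI @cloudberry at UCI"
# prints: go, uci, cloudberry, uci (one token per line)

# With the mapping, the hashtag and the mention survive as distinct tokens.
analyze "Go #UCI @cloudberry at UCI" mapped
# prints: go, hashtagsignuci, usermentionsigncloudberry, uci (one token per line)
```

Because the analyzer is named `default`, Elasticsearch should also apply it to query text when no `default_search` analyzer is defined, so a query for "#uci" is mapped to "hashtagsignuci" and matches hashtag occurrences rather than every plain "uci".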

Code:

curl -X PUT "localhost:9200/twitter.ds_tweet" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "max_result_window": 2147483647
    },
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "char_filter": ["space_hashtags"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      },
      "char_filter": {
        "space_hashtags": {
          "type": "mapping",
          "mappings": ["#=>hashtagsign", "@=>usermentionsign"]
        }
      }
    }
  }
}
'

Reference: https://www.elastic.co/blog/found-text-analysis-part-1

baiqiushi commented 4 years ago

Not needed anymore.