etherlabsio / ai-engine

Core AI services and functions powering the ETHER Platform
MIT License

Unfiltered keywords showing up #33

Closed shashankpr closed 5 years ago

shashankpr commented 5 years ago

Current behaviour

When PIM keyphrases are displayed, some of the phrases are still unfiltered, e.g. choose you.and, interact.in, etc.

Also, with the new profanity filter, some words may come up with * in them.

Problem

When segments are pre-processed before populating the graph, each segment is tokenized into sentences. The sentence tokenizer relies on well-formed sentences, where each sentence ends with a . followed by whitespace before the next sentence begins. E.g.

Let's have an interactive session on keyphrases and gitflow. Will discuss next steps as well.

In some Deepgram transcripts, this whitespace is missing. The tokenizer then treats the last word of one sentence and the first word of the next as a single word with a . in between.
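
One possible workaround is to repair the missing whitespace before the segment reaches the sentence tokenizer. The snippet below is only a minimal sketch, not the fix that was applied in this repo: the function names `repair_sentence_boundaries` and `split_sentences` are hypothetical, the regex-based splitter merely stands in for whatever sentence tokenizer the pipeline actually uses, and the heuristic (a period that follows at least two lowercase letters and is immediately followed by a letter) is an assumption about what the malformed transcripts look like.

```python
import re
from typing import List

# Assumption: a "." that follows two or more lowercase letters and is directly
# followed by a letter is a sentence boundary with its whitespace missing.
# Requiring two lowercase letters leaves "e.g", "i.e" and decimals like 3.14 alone.
_MISSING_SPACE = re.compile(r"(?<=[a-z]{2})\.(?=[A-Za-z])")


def repair_sentence_boundaries(text: str) -> str:
    """Insert the missing space after a period glued between two words."""
    return _MISSING_SPACE.sub(". ", text)


def split_sentences(text: str) -> List[str]:
    """Stand-in sentence splitter: repair boundaries, then split on
    whitespace that follows '.', '!' or '?'."""
    repaired = repair_sentence_boundaries(text.strip())
    return [s for s in re.split(r"(?<=[.!?])\s+", repaired) if s]


if __name__ == "__main__":
    segment = "Will choose you.and then interact.in the call."
    print(split_sentences(segment))
    # ['Will choose you.', 'and then interact.', 'in the call.']
```

This heuristic would also split things like domain names (example.com), so it may need an exception list if transcripts contain them.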

vdpappu commented 5 years ago

@shashankpr can you close this?