Unfiltered keywords showing up

Current behaviour

When displaying PIM keyphrases, some of the phrases are still unfiltered, for example "choose you.and" and "interact.in".

Also, with the new profanity filter, some words may show up with * characters in them.

Problem

When pre-processing segments before populating the graph, each segment is tokenized into sentences. The sentence tokenizer relies on well-formed sentences, where each sentence ends with a . followed by whitespace before the text of the next sentence begins. For example:

Let's have an interactive session on keyphrases and gitflow. Will discuss next steps as well.

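In some Deepgram transcripts this does not happen: the whitespace after the . is missing. The tokenizer then treats the last word of one sentence and the first word of the next as a single word with a . in between, which is how phrases such as "you.and" end up in the keyphrases.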
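One possible workaround is to restore the missing whitespace before the segment reaches the sentence tokenizer. The sketch below is only an illustration under stated assumptions: `restore_sentence_spacing` and `naive_sentence_split` are hypothetical names, the splitter is a simplified stand-in for the real tokenizer, and the regex would also split tokens that legitimately contain a period between letters (for example domain names), so it may need an exception list.

```python
import re

# Assumption: a period with a lowercase letter before it and a letter
# directly after it marks a sentence boundary whose space was dropped.
_MISSING_SPACE = re.compile(r"(?<=[a-z])\.(?=[A-Za-z])")


def restore_sentence_spacing(text: str) -> str:
    """Insert a space after a period that is wedged between two letters."""
    return _MISSING_SPACE.sub(". ", text)


def naive_sentence_split(text: str) -> list:
    """Simplified stand-in for the sentence tokenizer.

    Splits only where a period is followed by whitespace, which mirrors
    the assumption described above.
    """
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]


# The example sentence from above, with the space after the period dropped.
raw = (
    "Let's have an interactive session on keyphrases and gitflow."
    "Will discuss next steps as well."
)

before = naive_sentence_split(raw)                           # one "sentence"
after = naive_sentence_split(restore_sentence_spacing(raw))  # two sentences
```

With the raw transcript the splitter sees a single sentence containing the token "gitflow.Will"; after normalisation it sees two sentences and the fused token is gone.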

etherlabsio / ai-engine

Unfiltered keywords showing up #33