Current behaviour
When displaying PIM keyphrases, some phrases still get through unfiltered, for example: choose you.and, interact.in, etc.
Also, with the new profanity filter, some words may appear with * in them.
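To make the two symptoms concrete, here is a minimal sketch of a check that flags such keyphrases before display. The function name `is_suspect_keyphrase` and the pattern are hypothetical and not part of the existing filter; the sketch only assumes keyphrases arrive as plain strings.

```python
import re

# Hypothetical pattern: a period fused between two letters (e.g. "you.and"),
# or an asterisk left behind by profanity masking.
_SUSPECT = re.compile(r"[A-Za-z]\.[A-Za-z]|\*")


def is_suspect_keyphrase(phrase: str) -> bool:
    """Return True if the phrase contains a fused period or an asterisk."""
    return bool(_SUSPECT.search(phrase))


keyphrases = ["choose you.and", "interact.in", "d**n", "gitflow", "next steps"]
clean = [p for p in keyphrases if not is_suspect_keyphrase(p)]
print(clean)
```

Note that this pattern would also flag legitimate tokens that contain an in-word period, such as abbreviations or domain names, so it is only suitable as a rough diagnostic.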
Problem
When pre-processing segments before populating the graph, each segment is tokenized into sentences. The sentence tokenizer relies on well-formed sentences: each sentence must end with a period (.) followed by whitespace before the next sentence begins.
E.g.
Let's have an interactive session on keyphrases and gitflow. Will discuss next steps as well.
In some Deepgram transcripts, the whitespace after the period is missing. The tokenizer then treats the last word of one sentence and the first word of the next as a single word with a . in between (for example, you.and).
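One possible way to work around this is to restore the missing whitespace before sentence tokenization. The following is a minimal sketch under stated assumptions, not the project's actual pre-processing: `restore_sentence_spacing` is a hypothetical helper, and `naive_sentence_split` is only a stand-in for the real sentence tokenizer, which this description does not name.

```python
import re

# A period sitting directly between a lowercase letter and another letter,
# with no whitespace after it (e.g. "gitflow.Will" or "you.and").
_MISSING_SPACE = re.compile(r"(?<=[a-z])\.(?=[A-Za-z])")


def restore_sentence_spacing(text: str) -> str:
    """Insert a space after a period that is fused between two words."""
    return _MISSING_SPACE.sub(". ", text)


def naive_sentence_split(text: str) -> list:
    """Stand-in tokenizer (assumption): split on whitespace that follows a period."""
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]


raw = ("Let's have an interactive session on keyphrases and gitflow."
       "Will discuss next steps as well.")

before = naive_sentence_split(raw)                           # one fused sentence
after = naive_sentence_split(restore_sentence_spacing(raw))  # two sentences
print(len(before), len(after))
```

The regex is deliberately simple and will also split abbreviations such as e.g. and domain names such as example.com, so a real fix would likely need an exception list or a check against known words.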