geekusa / nlp-text-analytics

Removal of identical data in similarity command #6

Closed pa1007 closed 2 years ago

pa1007 commented 2 years ago

Here is a PR for issue #5.

Before:
- Columns top_match_target and top_match_source present for multi-to-multi output
- Column top_match_target present for single-to-multi output
- Neither column top_match_target nor top_match_source present for single-to-single output

Columns that appear or disappear depending on the comparison type are a pain to automate against when this command is run on a large batch of text-to-text comparisons with a varying number of comparisons per run.

So with this PR, I propose that the command return the same columns for every output type:

After:
- Columns top_match_target and top_match_source present for multi-to-multi output
- Columns top_match_target and top_match_source present for single-to-multi output
- Columns top_match_target and top_match_source present for single-to-single output
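
To illustrate why a fixed set of columns matters to downstream automation, here is a minimal Python sketch of the workaround a consumer would otherwise need. It is not the code in this PR: `normalize_row`, the `similarity` field, and the sample rows are hypothetical, and the empty-string fill value is an assumption. The PR itself makes the command emit both columns, so a shim like this would no longer be necessary.

```python
# Hypothetical consumer-side shim, NOT the code in this PR.
# Only the two column names below come from the PR description.
REQUIRED_COLUMNS = ("top_match_target", "top_match_source")


def normalize_row(row, fill=""):
    """Return a copy of `row` that always carries both top-match columns.

    Columns already present keep their values; missing ones get `fill`.
    """
    out = dict(row)
    for col in REQUIRED_COLUMNS:
        out.setdefault(col, fill)
    return out


# Sample rows mimicking the three "before" shapes (all values are made up).
rows = [
    {"similarity": 0.91, "top_match_target": "doc_b", "top_match_source": "doc_a"},  # multi-to-multi
    {"similarity": 0.40, "top_match_target": "doc_c"},                               # single-to-multi
    {"similarity": 0.75},                                                            # single-to-single
]

normalized = [normalize_row(r) for r in rows]
```

After normalization every row exposes the same keys, so later steps can read `top_match_target` and `top_match_source` without checking which comparison type produced the row.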