Removal of identical data in similarity command

Here is a PR for issue #5.

I have changed the if statement to use the new argument of the similarity command (remove_duplicates).
Homogeneity in the command output: the single-to-single and single-to-multi outputs did not include all the columns present in the multi-to-multi output:

Before:
Columns top_match_target and top_match_source present for multi-to-multi output
Only column top_match_target present for single-to-multi output
Neither column top_match_target nor top_match_source present for single-to-single output

Columns that are present in some outputs and missing in others make the command painful to automate when we run it over a large batch of text-to-text comparisons where the number of comparisons varies.

So with this PR, I propose that every output type has the same columns:

After:
Columns top_match_target and top_match_source present for multi-to-multi output
Columns top_match_target and top_match_source present for single-to-multi output
Columns top_match_target and top_match_source present for single-to-single output
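To illustrate why a uniform set of columns helps automation, here is a minimal sketch, not the code from this PR. The helper name `with_uniform_columns`, the `similarity` field, and the empty-string placeholder are assumptions made for the example; only the column names top_match_target and top_match_source come from the command output described above.

```python
# Hypothetical helper: guarantees both top-match columns on every result row.
EXPECTED_COLUMNS = ("top_match_target", "top_match_source")


def with_uniform_columns(row, fill_value=""):
    """Return a copy of `row` that always carries both top-match columns.

    Columns already present keep their values; missing ones are filled
    with `fill_value` (an assumed placeholder, not the PR's behavior).
    """
    uniform = dict(row)  # copy so the caller's row is left untouched
    for column in EXPECTED_COLUMNS:
        uniform.setdefault(column, fill_value)
    return uniform


# A single-to-single style row (illustrative data) gains both columns,
# so downstream code can read them without checking the output type.
row = with_uniform_columns({"similarity": 0.8})
print(sorted(row))
```

With the same columns in every output type, a consumer can read `row["top_match_target"]` directly instead of branching on which kind of comparison produced the row.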

Added an option in the similarity view to choose whether duplicates are removed from the data.
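As a rough sketch of how a flag like remove_duplicates could gate the filtering, the snippet below assumes that "identical data" means a comparison whose two texts are equal. That reading, the function name `filter_pairs`, and the pair layout are assumptions for illustration; the actual check in the command may differ.

```python
def filter_pairs(pairs, remove_duplicates=False):
    """Yield (source, target) text pairs, optionally skipping identical ones.

    `remove_duplicates` mirrors the name of the new command argument; the
    equality test below is an assumed definition of "identical data".
    """
    for source, target in pairs:
        if remove_duplicates and source == target:
            continue  # identical texts: skipped only when the flag is set
        yield source, target


# Illustrative data only.
pairs = [("same text", "same text"), ("same text", "other text")]
print(list(filter_pairs(pairs, remove_duplicates=True)))
```

Keeping the default at False in this sketch means existing searches would behave as before unless the option is switched on.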

geekusa / nlp-text-analytics

Removal of identical data in similarity command #6