Bergvca / string_grouper

Super Fast String Matching in Python
MIT License
364 stars 76 forks

[question] Partial matching of strings #57

Open andrei-volkau opened 3 years ago

andrei-volkau commented 3 years ago

Goal: group the following two strings into the same group.

Should you raise an adverse event in a specific patient?

Although what you say will be treated in confidence, should you raise an adverse event in a specific patient?

The following code creates separate groups for those strings.

from string_grouper import StringGrouper

string_grouper = StringGrouper(question_table["question"])
string_grouper = string_grouper.fit()
question_table["labels"] = string_grouper.get_groups()

Question: is it possible to adjust string matching to reach the goal?

Thank you in advance for any hints!

ParticularMiner commented 3 years ago

Hi @andrei-volkau

Simply lower the similarity-threshold (the default is 0.8). For example, you could try the following:

string_grouper = StringGrouper(question_table["question"], min_similarity=0.5)
string_grouper = string_grouper.fit()
question_table["labels"] = string_grouper.get_groups()

If the two strings still land in separate groups, keep lowering min_similarity.
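
To get a feel for why the default threshold can split these two strings, here is a rough, standard-library-only sketch. It is not how string_grouper computes its scores: string_grouper uses TF-IDF-weighted character n-grams fitted on the whole column, while this sketch uses plain, unweighted trigram counts on just the two strings, so the number it prints will differ from what the library reports. The helper names `trigram_counts` and `cosine_similarity` are made up for this illustration.

```python
from collections import Counter
from math import sqrt


def trigram_counts(text, n=3):
    """Count the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[gram] * b[gram] for gram in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


short_q = "Should you raise an adverse event in a specific patient?"
long_q = ("Although what you say will be treated in confidence, "
          "should you raise an adverse event in a specific patient?")

# Unweighted approximation only; string_grouper's TF-IDF score will differ.
score = cosine_similarity(trigram_counts(short_q), trigram_counts(long_q))
print(f"approximate similarity: {score:.2f}")
```

Every trigram of the short question also occurs in the long one, but the extra leading clause adds many trigrams the short question lacks, which pulls the cosine below 1. That is why a partial match like this may need a lower min_similarity than the default.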

ParticularMiner commented 3 years ago

Notify: @andrei-volkau

There are a few more options described in the README (follow the link).
