Living-with-machines / DeezyMatch

A Flexible Deep Learning Approach to Fuzzy String Matching
https://living-with-machines.github.io/DeezyMatch/

Column 3 accepts (case-insensitive): [true, false, 0, 1], extend this to other cases: "Correct" "Wrong" #54

Open kasra-hosseini opened 4 years ago

kasra-hosseini commented 4 years ago

Extend the list of accepted label values for positive and negative matches.

Change data_processing.py (see in particular lines 37-43, though later lines may also need changes) so that it also accepts positive, negative, correct, and wrong:

  for i in range(len(df_list)):
    tmp_split_row = df_list[i].split(csv_sep)
    if str(tmp_split_row[2]).strip().lower() not in ["true", "false", "1", "0"]:
      print(f"SKIP: {df_list[i]}")
      # change the label to remove_me,
      # we drop the rows with no true|false in the label column
      tmp_split_row = f"X{csv_sep}X{csv_sep}remove_me".split(csv_sep)
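One possible way to make this change is sketched below. It is not the actual DeezyMatch code: the names `POSITIVE_LABELS`, `NEGATIVE_LABELS`, `normalize_label`, and `filter_rows` are hypothetical, and the tab default for `csv_sep` is an assumption. Unlike the snippet above, which relabels unrecognized rows as remove_me and drops them later, this sketch skips them directly and maps every accepted value to "true" or "false", so that later code would only need to handle two labels.

```python
# Hypothetical label sets (assumed names, not part of DeezyMatch).
POSITIVE_LABELS = {"true", "1", "positive", "correct"}
NEGATIVE_LABELS = {"false", "0", "negative", "wrong"}


def normalize_label(raw_label):
    """Return "true" or "false" for a recognized label, or None otherwise."""
    # Same normalization as the original check: strip whitespace, ignore case.
    label = str(raw_label).strip().lower()
    if label in POSITIVE_LABELS:
        return "true"
    if label in NEGATIVE_LABELS:
        return "false"
    return None


def filter_rows(df_list, csv_sep="\t"):
    """Split each row and keep those whose third column is a recognized label."""
    kept = []
    for row in df_list:
        parts = row.split(csv_sep)
        # Rows with fewer than three columns have no label and are skipped.
        normalized = normalize_label(parts[2]) if len(parts) > 2 else None
        if normalized is None:
            print(f"SKIP: {row}")
            continue
        parts[2] = normalized
        kept.append(parts)
    return kept
```

Keeping the accepted values in two sets would mean that adding another spelling later (for example "yes" and "no") touches a single line.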