Nealcly / templateNER

Source code for template-based NER

Fix multiple occurrences of same string. #4

Closed parakalan closed 2 years ago

parakalan commented 3 years ago

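The change is in this loop (the active `range(1, 9)` line is the previous version; the commented-out line is the fixed range) -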

for i in range(len(input_TXT_list)):
    words = []
    # for j in range(1, min(9, len(input_TXT_list) - i + 1)):
    for j in range(1, 9):
        word = ' '.join(input_TXT_list[i:i+j])
        words.append(word)
    print(words)

This is the output of calling the prediction function with the previous version -

prediction("I made ghee from butter chilli")

['I', 'made', 'ghee', 'from', 'butter', 'chilli']
['I', 'I made', 'I made ghee', 'I made ghee from', 'I made ghee from butter', 'I made ghee from butter chilli', 'I made ghee from butter chilli', 'I made ghee from butter chilli']
['made', 'made ghee', 'made ghee from', 'made ghee from butter', 'made ghee from butter chilli', 'made ghee from butter chilli', 'made ghee from butter chilli', 'made ghee from butter chilli']
['ghee', 'ghee from', 'ghee from butter', 'ghee from butter chilli', 'ghee from butter chilli', 'ghee from butter chilli', 'ghee from butter chilli', 'ghee from butter chilli']
['from', 'from butter', 'from butter chilli', 'from butter chilli', 'from butter chilli', 'from butter chilli', 'from butter chilli', 'from butter chilli']
['butter', 'butter chilli', 'butter chilli', 'butter chilli', 'butter chilli', 'butter chilli', 'butter chilli', 'butter chilli']
['chilli', 'chilli', 'chilli', 'chilli', 'chilli', 'chilli', 'chilli', 'chilli']

After this fix, the output is -

['I', 'made', 'ghee', 'from', 'butter', 'chilli']
['I', 'I made', 'I made ghee', 'I made ghee from', 'I made ghee from butter', 'I made ghee from butter chilli']
['made', 'made ghee', 'made ghee from', 'made ghee from butter', 'made ghee from butter chilli']
['ghee', 'ghee from', 'ghee from butter', 'ghee from butter chilli']
['from', 'from butter', 'from butter chilli']
['butter', 'butter chilli']
['chilli']

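The same string is generated multiple times when j > (len(input_TXT_list) - i), because the slice input_TXT_list[i:i+j] is cut off at the end of the list and returns the same words for every larger j. These duplicates increase computation time. This commit fixes this by capping j at len(input_TXT_list) - i.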
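For reference, here is a minimal standalone sketch of the capped loop, assuming the candidate spans are built exactly as in the snippet above. The function name `candidate_spans` and the `max_len` parameter are hypothetical and not part of the repository; `max_len=8` mirrors the `range(1, 9)` bound.

```python
def candidate_spans(tokens, max_len=8):
    """Return, for each start index, the word spans of up to max_len tokens.

    Hypothetical helper for illustration; not the repository's own function.
    """
    spans = []
    for i in range(len(tokens)):
        # Cap the span length so that i + j never runs past the end of the
        # list; otherwise the slice is truncated and repeats the same string.
        upper = min(max_len, len(tokens) - i)
        spans.append([" ".join(tokens[i:i + j]) for j in range(1, upper + 1)])
    return spans


tokens = "I made ghee from butter chilli".split()
spans = candidate_spans(tokens)

# Previous behaviour, for comparison: j always runs from 1 to 8.
old_spans = [[" ".join(tokens[i:i + j]) for j in range(1, 9)]
             for i in range(len(tokens))]

new_total = sum(len(row) for row in spans)
old_total = sum(len(row) for row in old_spans)
print(new_total, old_total)
```

For this six-word sentence, the uncapped loop builds 8 strings per start position, while the capped loop builds only as many as there are words left, so no row contains a repeated string.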