Description
As of now, the Self-Attention Mechanism uses a simple cosine-similarity/dot-product similarity algorithm with a selectivity threshold. This has proven to be very inaccurate, and the results cannot be reverse-searched because the incorrect selections invalidate the integrity of the database.
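For context, the current selection step amounts to something like the sketch below: score each token vector against a reference vector with a dot product / cosine similarity, then apply a hard threshold to decide "Good" vs. "Not". The helper names (cosine_similarity, select_tokens) and the threshold value are illustrative placeholders, not the actual MAGIST internals.

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two 1-D vectors (small epsilon avoids division by zero).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_tokens(token_vectors, query_vector, threshold=0.5):
    # Hypothetical helper: label each token "Good" or "Not" by thresholding its similarity score.
    results = []
    for token, vec in token_vectors:
        score = cosine_similarity(vec, query_vector)
        label = "Good" if score >= threshold else "Not"
        results.append([score, token, label])
    return results

A hard threshold like this cannot recover from borderline scores, which is what produces the mislabeled tokens shown below.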
To Reproduce
To reproduce the behavior:
from MAGIST.NLP.SelfAttention import TextPreprocessing

# Load the preprocessing pipeline from the project configuration.
t = TextPreprocessing("config.json")

# Run the self-attention scoring on a sample sentence and print each
# [score, token, label] entry.
out = t("Hello, my name is John. I am a dummy script.")
for i in out:
    print(i)
Output
[5.967583262815608, 'hello', 'Good']
[3.7432159225461947, 'my', 'Not']
[2.520566459677965, 'name', 'Not'] ---> Incorrect; This should be "Good"
[5.6983463875519735, 'is', 'Not']
[4.848795399908668, 'john', 'Not']
[6.083478457022617, 'i', 'Good']
[9.443521265161667, 'am', 'Good']
[8.284217064260607, 'a', 'Good'] ---> Incorrect; This should be "Not"
[8.485852410408823, 'dummy', 'Good']
[2.466104715281189, 'script', 'Not'] ---> Incorrect; This should be "Good"
Expected behavior
Every token should receive the correct label: "name" and "script" should be marked "Good", and "a" should be marked "Not".
Additional context
This was expected, since the algorithm is very primitive. A better positional embedding or an end-to-end LSTM-Dense neural network would likely improve its performance; a sketch of such a network follows.
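As a starting point, a per-token classifier along the lines suggested above could look like the sketch below. This is a hypothetical Keras example, not MAGIST code; the vocabulary size, sequence length, and layer widths are placeholder values.

import tensorflow as tf

vocab_size = 10000   # placeholder vocabulary size
max_len = 64         # placeholder sequence length

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(vocab_size, 64),            # learned token embeddings
    tf.keras.layers.LSTM(64, return_sequences=True),      # per-token hidden states
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(1, activation="sigmoid")    # per-token Good/Not probability
    ),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

Training such a model end to end on labeled token data would replace the fixed similarity threshold with a learned decision boundary.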