Tilana / Classification


Improve the suggestion of similar sentences with USE #97

Open Tilana opened 6 years ago

Tilana commented 6 years ago

The idea of using the Universal Sentence Encoder (USE) to gather enough training data for CNN classification only makes sense if the documents being analyzed are likely to include relevant sentences. If the collection is very large and has many categories, it is likely that the 10 documents used to find similar sentences are unrelated. In that case no suggestions are made, and the size of the training data does not increase.
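To illustrate why unrelated documents produce no suggestions, here is a minimal sketch of threshold-based similarity matching. It assumes the sentence embeddings have already been computed (for example with USE) and are passed in as plain lists of floats. The names `cosine` and `suggest_similar` and the `threshold` value are hypothetical and not taken from this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest_similar(query_vec, candidates, threshold=0.8):
    """Return candidate sentences whose embedding is close to the query.

    candidates: list of (sentence, embedding) pairs taken from the
    sampled documents. threshold is an assumed cut-off, not a tuned value.
    """
    scored = [(cosine(query_vec, vec), sentence) for sentence, vec in candidates]
    # Highest similarity first; anything below the threshold is dropped.
    return [s for score, s in sorted(scored, reverse=True) if score >= threshold]
```

If none of the sampled documents contains a sentence close to the query, the returned list is empty and nothing is added to the training data, which is the situation described above.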

Tilana commented 6 years ago

It seems that the 10 documents taken to identify similar sentences sometimes repeat. This reduces the probability of finding relevant but different evidence sentences and costs unnecessary computation time.
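One way to guard against repeated documents is to drop duplicates by identifier before the similarity search. This is only a sketch: it assumes each document is a dict with an `_id` field, and the helper name `unique_documents` is hypothetical rather than part of this project or of the Uwazi API.

```python
def unique_documents(documents, key="_id"):
    """Keep the first occurrence of each document, preserving order.

    key is the assumed name of the identifier field.
    """
    seen = set()
    unique = []
    for doc in documents:
        doc_id = doc[key]
        if doc_id not in seen:
            seen.add(doc_id)
            unique.append(doc)
    return unique
```

Note that deduplicating after sampling can leave fewer than 10 documents, so the caller may want to request more until enough distinct ones are available.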

Tilana commented 6 years ago

The bug of presenting the same documents multiple times is fixed, but the issue coming from the Uwazi API still needs to be investigated.