NLP-in-the-Social-Sciences / Reddit-Data-Pipeline

Code and data for an ETL pipeline supporting Low SES research
GNU General Public License v3.0

Add comparison logic to `get_nns_by_vector` #24

Open MoRevolution opened 1 year ago

MoRevolution commented 1 year ago

Retrieving distances for all 16 narratives at once is too general. Use the "Comparative Filtering" cell to see how the logic is supposed to work. It's essentially the same as the keyword filtering that we did.
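
A rough sketch of what per-narrative comparison could look like, in case it helps the discussion. This is not the code from the "Comparative Filtering" cell. It assumes the goal is to query each narrative separately and keep every retrieved sentence only under the narrative it is closest to. `nns_by_vector` here is a hypothetical brute-force stand-in that returns `(ids, distances)` the way an index lookup with distances would, and `comparative_filter`, `max_distance`, and the toy vectors are all made up for illustration.

```python
import math


def angular_distance(a, b):
    """Angular distance between two vectors: sqrt(2 * (1 - cosine similarity))."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.sqrt(max(0.0, 2.0 * (1.0 - dot / norm)))


def nns_by_vector(item_vectors, query, n):
    """Hypothetical stand-in for an index lookup: ids and distances of the n nearest items."""
    scored = sorted(
        (angular_distance(vec, query), idx) for idx, vec in enumerate(item_vectors)
    )
    top = scored[:n]
    return [idx for _, idx in top], [dist for dist, _ in top]


def comparative_filter(item_vectors, narrative_vectors, n, max_distance):
    """Query each narrative on its own, then keep each item under its closest narrative."""
    best = {}  # item id -> (distance, narrative name)
    for name, query in narrative_vectors.items():
        ids, dists = nns_by_vector(item_vectors, query, n)
        for idx, dist in zip(ids, dists):
            if dist > max_distance:
                continue  # too far away to count as a match for this narrative
            if idx not in best or dist < best[idx][0]:
                best[idx] = (dist, name)
    return {idx: name for idx, (_, name) in best.items()}


# Toy data (assumed): five 2-d "sentence" vectors and two narrative vectors.
items = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0]]
narratives = {"narrative_a": [1.0, 0.0], "narrative_b": [0.0, 1.0]}
assigned = comparative_filter(items, narratives, n=3, max_distance=0.5)
print(assigned)
```

With the toy data, items 0 and 1 land under `narrative_a`, items 2 and 3 under `narrative_b`, and item 4 is dropped because it is not within `max_distance` of either narrative. The threshold and `n` would need tuning against the real embeddings.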

MoRevolution commented 1 year ago

Some of the errors caused by this bug came from issue #25, which is now fixed. The join meant that a series of sentences that didn't end with periods was never lemmatized.