Currently, our script for connecting scraped facilities to homeland security facilities sometimes lumps distinct facilities together. We can improve the matching by removing dataset-specific stopwords before comparing names. E.g., in the example below, we should not be checking for string similarity on the stopwords 'detention' and 'center', since they appear in many facility names and inflate the similarity between unrelated facilities.
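As a rough sketch of the idea (not the current script), stopwords could be stripped before the similarity check. The function names and the use of `difflib.SequenceMatcher` are assumptions for illustration; only the stopwords 'detention' and 'center' come from this issue, and the real list would need to be built from the dataset.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stopword list: only 'detention' and 'center' are named in this
# issue; the full list would need to be derived from the dataset.
FACILITY_STOPWORDS = {"detention", "center"}


def strip_stopwords(name, stopwords=FACILITY_STOPWORDS):
    """Lowercase a facility name and drop dataset-specific stopwords."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in stopwords)


def name_similarity(a, b, stopwords=FACILITY_STOPWORDS):
    """Similarity in [0, 1] computed only on the non-stopword part of each name."""
    a_core = strip_stopwords(a, stopwords)
    b_core = strip_stopwords(b, stopwords)
    # If a name consists only of stopwords there is nothing distinctive to
    # compare, so treat it as a non-match rather than a perfect match.
    if not a_core or not b_core:
        return 0.0
    return SequenceMatcher(None, a_core, b_core).ratio()
```

With this approach, two names that share only "Detention Center" no longer get credit for that shared suffix, so the score reflects the distinguishing part of each name (county, city, operator, and so on).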