f4bD3v / humanitas

A price prediction toolset for developing countries
BSD 3-Clause "New" or "Revised" License
17 stars 7 forks

Proc. tweets - matching to predefined categories #25

Closed f4bD3v closed 10 years ago

f4bD3v commented 10 years ago

Additional words for general food category: 'snack', 'rice'?, 'groceries', 'cook'

Work in progress, guys. We have to make absolutely sure we get all the tweets through filtering, and I think we could still refine our approach. This pattern library is really powerful, and we could run it on the tweets we write to the database.

For filtering, we should use spelling suggestions together with edit distance and PoS tagging:

- Keep the associated PoS tags for all keywords.
- If a word doesn't fit any keyword, compute a suggestion, then check that the PoS tags match and that the edit distance to the keyword is within the threshold.
- If no suggestion is available, just check the edit distance and compare PoS tags.
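A rough sketch of the filtering idea described above, to make the control flow concrete. Everything named here is an assumption for illustration: the `KEYWORD_TAGS` table, the Penn Treebank style tags, the `MAX_DISTANCE` threshold of 2, and the `suggest` argument, which is a placeholder for whatever spelling-suggestion function we end up using. Tokens are assumed to arrive already PoS-tagged.

```python
# Hypothetical keyword table: keyword -> expected PoS tag prefix (assumed tags).
KEYWORD_TAGS = {"snack": "NN", "rice": "NN", "groceries": "NN", "cook": "VB"}
MAX_DISTANCE = 2  # assumed edit distance threshold, to be tuned on real tweets


def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def closest_keyword(word, tag):
    """Nearest keyword whose PoS tag prefix matches and whose distance is within the threshold."""
    best = None
    for keyword, keyword_tag in KEYWORD_TAGS.items():
        if not tag.startswith(keyword_tag):
            continue  # PoS tags disagree, skip this keyword
        distance = edit_distance(word, keyword)
        if distance <= MAX_DISTANCE and (best is None or distance < best[1]):
            best = (keyword, distance)
    return best[0] if best else None


def match_token(word, tag, suggest=None):
    """Map a tagged token to a keyword, or return None if it does not qualify.

    `suggest` is a hypothetical callable returning a corrected spelling or None.
    """
    word = word.lower()
    if word in KEYWORD_TAGS:
        return word  # exact keyword hit, accepted as is (an assumption)
    suggestion = suggest(word) if suggest is not None else None
    # Use the suggestion when there is one, otherwise fall back to the raw word.
    candidate = suggestion.lower() if suggestion else word
    return closest_keyword(candidate, tag)
```

For example, `match_token("ricee", "NN")` falls back to plain edit distance and maps to `'rice'`, while `match_token("cok", "NN")` is rejected because the only nearby keyword, `'cook'`, carries a verb tag.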