Closed Pernilleww closed 3 years ago
Sources:
Novel research works have introduced semantic techniques that shift from a keyword-based to a concept-based representation of items and user profiles.
... but we will probably start with keywords, since we have keywords.
TODO-list
[ ] ~make sure documentIDs exist for all articles...~
[x] read in content files and fields (see comment regarding fields)
[x] build a keyword corpus (all keywords in the dataset). This is also a big weakness of our approach: it can't handle unseen keywords at this point, but one approach could be expanding the matrices when new keywords appear in real life
[x] make a list of documentID <-> keywords
[x] make user-keywords matrix
[x] adjust the weighting between categories and keywords. Categories should be weighted down (e.g. x0.25) and keywords weighted up (e.g. x4)
[x] at this point basic CF should be possible
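The checklist steps above (keyword corpus, documentID <-> keywords, user-keywords matrix, weighting) could be sketched roughly as follows. This is a minimal sketch, not the code on the branch: the sample articles, the click log, and the names `ARTICLES`, `CLICKS`, `build_vocabulary` and `build_user_matrix` are all made up for illustration. Only the x4 and x0.25 weights come from the list.

```python
# Hypothetical sample data: documentID -> tagged terms (not from the real dataset).
ARTICLES = {
    "doc1": {"keywords": ["fotball", "norge"], "categories": ["sport"]},
    "doc2": {"keywords": ["valg", "norge"], "categories": ["nyheter"]},
    "doc3": {"keywords": ["fotball"], "categories": ["sport"]},
}
# Hypothetical click log: userID -> documentIDs the user has read.
CLICKS = {"u1": ["doc1", "doc3"], "u2": ["doc2"]}

KEYWORD_WEIGHT = 4.0    # keywords weighted up (x4)
CATEGORY_WEIGHT = 0.25  # categories weighted down (x0.25)


def build_vocabulary(articles):
    """Map every keyword and category seen in the dataset to a column index."""
    terms = set()
    for fields in articles.values():
        terms.update(("kw", k) for k in fields["keywords"])
        terms.update(("cat", c) for c in fields["categories"])
    return {term: i for i, term in enumerate(sorted(terms))}


def build_user_matrix(clicks, articles, vocab):
    """One row per user; each cell is the weighted term count over the articles read."""
    users = sorted(clicks)
    matrix = [[0.0] * len(vocab) for _ in users]
    for row, user in enumerate(users):
        for doc_id in clicks[user]:
            fields = articles.get(doc_id)
            if fields is None:  # click on an article we have no content for
                continue
            for k in fields["keywords"]:
                matrix[row][vocab[("kw", k)]] += KEYWORD_WEIGHT
            for c in fields["categories"]:
                matrix[row][vocab[("cat", c)]] += CATEGORY_WEIGHT
    return users, matrix


vocab = build_vocabulary(ARTICLES)
users, matrix = build_user_matrix(CLICKS, ARTICLES, vocab)
```

Note that the vocabulary is fixed once it is built, so a keyword that was not in the dataset has no column. That is the unseen-keyword weakness mentioned in the list.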
Getting advanced:
branch content_based
field=
id
publishtime
description
teaser
keyword #<[]>
kw-classification
kw-category
kw-concept
kw-company
kw-entity # somewhat dirty: e.g. norge and norges are two different entities
kw-location
title
score # some type of popularity for the article, read more about it
Use keywords from the keyword fields, the title, and the ingress (lead paragraph)
ADV: maybe named-entity recognition
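A rough sketch of pulling keywords from several fields, under some assumptions: the ingress is taken to be the `teaser` field, `keyword` is taken to be a list of strings, and only words already in the keyword corpus are picked up from free text. The helper names `tokenize` and `article_terms` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware, so æ/ø/å work)."""
    return re.findall(r"\w+", text.lower())


def article_terms(article, vocabulary):
    """Union of the tagged keywords and known keywords found in the title and teaser."""
    terms = {k.lower() for k in article.get("keyword", [])}
    for field in ("title", "teaser"):
        tokens = tokenize(article.get(field, ""))
        # Keep only tokens that are already known keywords.
        terms.update(t for t in tokens if t in vocabulary)
    return terms


# Hypothetical example article and keyword corpus.
example = {"keyword": ["Fotball"], "title": "Norge vant i Oslo", "teaser": "Valg utsatt"}
known = {"fotball", "norge", "valg"}
found = article_terms(example, known)
```

Exact token matching like this would still treat norge and norges as different terms, so some normalisation (or the named-entity recognition idea above) would be needed on top.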