RUC-MSc-CS-CIT-2024 / portfolio_subproject_1

Portfolio Subproject made for the CIT 2024 course

Domain model #1

Closed: sofushn closed this issue 1 month ago

sofushn commented 1 month ago

Here we should make a data model for the core functionality and the framework functionality.

Questions for status meetings

  1. What should we do about the relation type (sequel, prequel, and alternative versions), i.e. the relation of Media to itself (Media - Media)? - NOT A RELEVANT ISSUE

  2. Question about tasks D.11 to D.13: what are "posts" in D.11 and D.12, compared to "words" in D.13? - "POSTS" SHOULD BE READ AS "TITLES"

    • D.11. Exact-match querying: Introduce an exact-match querying function that takes one or more keywords as arguments and returns posts that match all of these. Use the inverted index wi for this purpose. You can find inspiration on how to do that in the slides on Textual Data and IR.

    • D.12. Best-match querying: Develop a refined function similar to D.11, but now with a "best-match" ranking and ordering of objects in the answer. A best-match ranking simply means: the more keywords that match, the higher the rank. Titles in the answer should be ordered by decreasing rank. See also the Textual Data and IR slides for hints.

    • D.13. Word-to-words querying: An alternative to providing search results as ranked lists of posts is to provide answers in the form of ranked lists of words. These would then be weighted keyword lists, with weights indicating relevance to the query. Develop functionality to provide such lists as an answer. One option to do this is the following: 1) evaluate the keyword query and derive the set of all matching titles, 2) count word frequencies over all matching titles (for all matching titles, collect the words they are indexed by in the inverted index wi), 3) provide the most frequent words (in decreasing order) as an answer (the frequency is thus the weight here).

  3. Should we use all the data from the DB, i.e. all the attributes? - NO

  4. Score history: revise this, we misunderstood it
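
To make the difference between D.11, D.12 and D.13 under question 2 concrete, here is a rough Python sketch of the three query styles. It is only an illustration, not the project's implementation: `WI_ROWS` is a made-up stand-in for the inverted index wi (assumed here to hold one `(title_id, word)` row per indexed word), and the function names `build_index`, `exact_match`, `best_match` and `word_to_words` are hypothetical. For D.13 it assumes that a title "matches" when it contains at least one of the keywords, which the task text leaves open.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows standing in for the inverted index wi:
# one (title_id, word) pair per indexed word. Not real course data.
WI_ROWS = [
    ("t1", "star"), ("t1", "wars"), ("t1", "hope"),
    ("t2", "star"), ("t2", "trek"),
    ("t3", "wars"), ("t3", "roses"),
]


def build_index(rows):
    """Map each lower-cased word to the set of title ids indexed by it."""
    index = defaultdict(set)
    for title_id, word in rows:
        index[word.lower()].add(title_id)
    return index


def exact_match(index, *keywords):
    """D.11 sketch: titles that are indexed by ALL of the given keywords."""
    if not keywords:
        return set()
    title_sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*title_sets)


def best_match(index, *keywords):
    """D.12 sketch: (title_id, rank) pairs, rank = number of matching keywords.

    Ordered by decreasing rank; ties are broken by title id so the
    output is deterministic.
    """
    rank = Counter()
    for keyword in {k.lower() for k in keywords}:
        for title_id in index.get(keyword, ()):
            rank[title_id] += 1
    return sorted(rank.items(), key=lambda pair: (-pair[1], pair[0]))


def word_to_words(index, *keywords):
    """D.13 sketch: (word, weight) pairs over all matching titles.

    Assumption: a title matches if it contains at least one keyword.
    The weight is the number of matching titles indexed by the word.
    """
    matching = {title_id for title_id, _ in best_match(index, *keywords)}
    freq = Counter()
    for word, titles in index.items():
        count = len(titles & matching)
        if count:
            freq[word] = count
    return sorted(freq.items(), key=lambda pair: (-pair[1], pair[0]))


index = build_index(WI_ROWS)
```

With the sample rows, `exact_match(index, "star", "wars")` keeps only the title indexed by both words, `best_match(index, "star", "wars")` lists every title that matches at least one keyword with the two-keyword match first, and `word_to_words(index, "star")` returns words rather than titles. In the actual project the same logic would presumably live in database functions over wi; the sketch only shows what each task is expected to return.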