Mester / demo-day-vikings

The Unlicense

Logic, workflow, architecture, etc. #67

Closed: tylerphillips55 closed this issue 8 years ago

tylerphillips55 commented 8 years ago

https://files.slack.com/files-tmb/T035C1NSH-F1NANCA75-f8dd31675e/img_20160702_235741_1024.jpg

Above is @neo1691's sketch. It is a good idea to update our db at a regular interval (the exact period is still to be decided). For results returned by the hot filter, about 1000 posts are submitted in a week, which is roughly 150 a day or 6 an hour. We should consider this when deciding how often to update our db. However, if we continue to return only a list of the 10 songs with the highest scores, we can update less frequently, as newer posts most likely wouldn't be included anyway because of their low scores. But keep in mind that if we choose to implement a search-by-date option, the update frequency will have to be increased.

Also, we must consider the case of fewer than 10 results for a genre. This slightly complicates things: we must find a balance between returning the songs with the highest scores and those submitted more recently. What I mean is, to make up the missing number of songs, we would select the last x songs for the genre, which would, at least as of now, be selected based on score. But if a post is very old, I don't think we should necessarily select it over a newer post with a lower score, even if the old post has a high score. We need to keep a minimum number of songs for each genre in the db, selected by rules that ensure we pick timely posts and not only high-scoring ones. If not, the same posts will be returned very often, despite new submissions for a genre. If we implement the search-by-date feature, we can use this logic for date searches and use absolute scores when the search is not by date.
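One possible reading of the fill-up rule above, as a minimal sketch rather than a decided design: recent posts compete on score first, and older posts are used only to cover the shortfall. The `Song` fields, the `pick_songs` name, and the 30-day `max_age` cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Song:
    # Hypothetical record shape; the real db schema is not shown in this thread.
    title: str
    score: int
    created: datetime  # submission time (UTC)


def pick_songs(songs, now, limit=10, max_age=timedelta(days=30)):
    """Return up to `limit` songs, preferring recent posts over old ones."""
    cutoff = now - max_age
    recent = [s for s in songs if s.created >= cutoff]
    older = [s for s in songs if s.created < cutoff]

    # Recent posts are ranked by score and fill the list first.
    picked = sorted(recent, key=lambda s: s.score, reverse=True)[:limit]

    # Only when the genre has too few recent posts do we fall back to the
    # highest-scoring older ones to cover the shortfall.
    shortfall = limit - len(picked)
    if shortfall > 0:
        picked += sorted(older, key=lambda s: s.score, reverse=True)[:shortfall]
    return picked
```

With this rule, an old post with a very high score never pushes out a newer post, but it can still appear when a genre has fewer than `limit` recent submissions.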

By the way, reddit's hot algorithm ranks newer stories higher, so a post's ranking is strongly affected by its submission time. A post with 3000 upvotes can have a 6% higher score if posted tomorrow rather than today.
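For reference, here is a sketch of the hot formula as it appeared in reddit's open-source ranking code; the live site may behave differently, so treat the constants as assumptions taken from that code rather than a guarantee. The function and constant names are my own.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Epoch offset used in reddit's published ranking code (assumed unchanged).
REDDIT_EPOCH = 1134028003


def hot(ups, downs, posted):
    """Hot score: log10 of the net votes plus a submission-time term."""
    s = ups - downs
    order = log10(max(abs(s), 1))
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = posted.timestamp() - REDDIT_EPOCH
    # Every 45000 seconds (12.5 hours) of recency is worth a tenfold
    # increase in net votes.
    return round(sign * order + seconds / 45000, 7)


today = datetime(2016, 7, 3, tzinfo=timezone.utc)
gain_per_day = hot(3000, 0, today + timedelta(days=1)) - hot(3000, 0, today)
```

In this version the time term grows by 86400 / 45000 = 1.92 per day regardless of the vote count, so `gain_per_day` is about 1.92.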

anubhavcodes commented 8 years ago

Yes this makes absolute sense.

anubhavcodes commented 8 years ago

Here are some changes that I propose:

1) Get the JSON from the reddit link.
2) Use the regexes that we have to get valid posts.
3) Create an example JSON like this:

[Image: screen shot 2016-07-03 at 12 28 30 am]

4) Insert it into the DB.
5) In another universe, when someone asks for a genre, read the db and return the results.
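Steps 2 to 5 could be sketched roughly as below. This is only an illustration under several assumptions: the title regex is a made-up stand-in for the project's real regexes, SQLite stands in for whatever db we end up using, and the record fields are my guess at what the example JSON would hold. Step 1 (the HTTP fetch) is left out; `parse_listing` takes the already-decoded reddit listing JSON.

```python
import re
import sqlite3

# Hypothetical title pattern: "Artist - Title [genre]". The project's real
# regexes are not shown in this thread.
TITLE_RE = re.compile(r"^(?P<artist>.+?) - (?P<title>.+?) \[(?P<genre>[^\]]+)\]")


def parse_listing(listing):
    """Steps 2-3: keep posts whose title matches and build flat records."""
    records = []
    for child in listing["data"]["children"]:
        post = child["data"]
        match = TITLE_RE.match(post["title"])
        if match is None:
            continue  # not a valid song post
        records.append({
            "artist": match["artist"],
            "title": match["title"],
            "genre": match["genre"].lower(),
            "score": post["score"],
            "url": post["url"],
            "created_utc": post["created_utc"],
        })
    return records


def save(conn, records):
    """Step 4: insert the records, replacing rows that share a url."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS songs (artist TEXT, title TEXT, genre TEXT,"
        " score INTEGER, url TEXT UNIQUE, created_utc REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO songs VALUES"
        " (:artist, :title, :genre, :score, :url, :created_utc)",
        records,
    )
    conn.commit()


def top_for_genre(conn, genre, limit=10):
    """Step 5: read the db and return the highest-scoring songs for a genre."""
    rows = conn.execute(
        "SELECT artist, title, score FROM songs WHERE genre = ?"
        " ORDER BY score DESC LIMIT ?",
        (genre.lower(), limit),
    )
    return rows.fetchall()
```

Keeping the write path (`save`) separate from the read path (`top_for_genre`) matches the "another universe" idea: the updater can run on its own schedule while requests only ever read from the db.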

anubhavcodes commented 8 years ago

The Post class is just overhead: it does not define a blueprint of the members it is supposed to contain. It just uses the classmethod to create the members at runtime, so there is no object-oriented concept in place here. The problem is that views.py expects us to pass instances of the Post class to the HTML. So either we get rid of that class and make heavy changes in the HTML, or we just reuse the Post class and create Post objects from the json.db search results.
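If we go with the second option, a minimal sketch could look like this: a Post with an explicit set of fields and a classmethod that builds it from one search result. The field names and the `from_record` name are assumptions, since the current Post class and the db record layout are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Explicit blueprint of the fields the templates can rely on."""
    # Hypothetical fields; adjust to whatever the html actually reads.
    title: str
    artist: str
    genre: str
    score: int
    url: str

    @classmethod
    def from_record(cls, record):
        """Build a Post from one db search result (a dict)."""
        # Unknown keys in the record are ignored; a missing key raises
        # KeyError instead of silently creating a half-filled object.
        return cls(
            title=record["title"],
            artist=record["artist"],
            genre=record["genre"],
            score=record["score"],
            url=record["url"],
        )


# Usage: convert search results before handing them to the view.
results = [{"title": "Le Freak", "artist": "Chic", "genre": "funk",
            "score": 300, "url": "http://example.com/c", "extra": "ignored"}]
posts = [Post.from_record(r) for r in results]
```

This keeps views.py and the html untouched while giving the class a fixed set of members.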

anubhavcodes commented 8 years ago

This is a duplicate of #64.