clvrobj / Pinboard-Plus

Pinboard.in Chrome extension, not official.
https://chrome.google.com/webstore/detail/mphdppdgoagghpmmhodmfajjlloijnbd
Other
144 stars 36 forks source link

Feature Add: utilizing local meta keywords tags to suggest better and more relevant tags #47

Closed ventz closed 6 years ago

ventz commented 6 years ago

Feature Add: utilizing local meta keywords tags to suggest better and more relevant tags.

PR includes:

1.) New content script: keywords_suggestions.js with a getKeywordsSuggestionTags listener (utilized for pulling out "meta keywords" content from websites

2.) updated manifest.json to include new content script allowance for keywords_suggestions.js

3.) Update to background.js to include optional parameter to pass local meta keywords, adding them to suggests, and de-dupping with popular and recommended tags.

4.) Updated to popups.js to include new getKeywordsSuggestionTags tab message sender (to our getKeywordsSuggestionTags listener from the keywords_suggestions.js), which either overloads getSuggests (in the background.js) with the local meta keywords, or if none are present, passes along the Pinboard suggestions.

Examples/Test with:

clvrobj commented 6 years ago

Hi @ventz , Thanks for the pull request!

ventz commented 6 years ago

@clvrobj No problem.

If you can, please test it on your side - I started with the chrome extension at first, cloned it, and changed it there and got it to work, only to find your github repository :) I then modified it for here, following some of the names/conventions you have setup. My local version works (when I point chrome to the git working directory) -- but when I did the gulp build and took the dist bundle, it had display issues upon logging into the pinboard account. I think this might be due to some out of date/deprecated packages however.

clvrobj commented 6 years ago

Hi @ventz I tried it and it works, but the tags seems a bit too many in the page. Pls check the below screenshot.

screenshot 2018-10-16 21 49 11

I am thinking whether we can improve it by better sorting and show less of them. Currently, the keywords are sorted by alphabets, I think insteadly it might be better to sort the keywords by the showing times and only display the top 10 or 5 keywords, then the most relavent ones will be shown in the page, what do you think? Just did a quick test on the keywords of NYtimes using the solution, the result: [('and', 7), ('US', 3), ('Saudi', 3), ('Politics', 2), ('Arabia', 2), ('bin', 2), ('2016', 2), ('Charlottesville', 2), ('violence', 2), ('Restaurant', 2), ('Foreign', 1), ('Policy', 1), ('', 1), ('Iran', 1), ('Jamal', 1), ('Khashoggi', 1), ('Murders', 1), ('Homicides', 1), ('State', 1), ('Department', 1), ('King', 1), ('of', 1), ('Salman;Salman', 1), ('Abdul-Aziz', 1), ('Mike', 1), ('Pompeo', 1), ('Mohammed', 1), ('Salman', 1), ('Turkey', 1), ('Oil', 1), ('Gasoline', 1), ('Aramco', 1), ('Donald', 1), ('Trump', 1), ('House', 1), ('Committee', 1), ('on', 1), ('Ways', 1), ('Means', 1), ('Democrats', 1), ('Federal', 1), ('Taxes', 1), ('Russian', 1), ('meddling', 1), ('in', 1), ('election', 1), ('Teapot', 1), ('Dome', 1), ('Women', 1), ('Girls', 1), ('Sexual', 1), ('harassment', 1), ('Brett', 1), ('M', 1), ('Kavanaugh', 1), ('Abortion', 1), ('Missouri', 1), ('Claire', 1), ('McCaskill', 1), ('Presidential', 1), ('Election', 1), ('Blacks;African-Americans', 1), ('Tests', 1), ('School', 1), ('discipline', 1), ('K-12', 1), ('Education', 1), ('Race', 1), ('Ethnicity', 1), ('Segregation', 1), ('Equal', 1), ('Educational', 1), ('Opportunities', 1), ('VA', 1), ('Marijuana;Pot;Weed', 1), ('Canada', 1), ('Cannabis', 1), ('Foods', 1), ('Products', 1), ('Medical', 1), ('Marijuana', 1), ('Regulation', 1), ('Deregulation', 1), ('Legislation', 1), ('MeToo', 1), ('Movement', 1), ('Chef', 1), ('Spotted', 1), ('Pig', 1), ('Manhattan', 1), ('NY', 1), ('April', 1), ('Bloomfield', 1), ('Ken', 1), ('Friedman', 1), ('Domestic', 1), ('Moira', 1), ('Donegan', 1), ('Stephen', 1), ('Elliott', 1), ('Lawsuits', 1)], so we might also could filter out the and, of etc, or we can also do the filter in the next step.

ventz commented 6 years ago

Hi,

The NYT is sadly one of the places where the keywords are a bit much. and honestly - in my opinion at least, they are probably not that helpful there. They basically don't use them properly. But in most other places I think they are pretty good (the ones that still use them - as google has stopped looking at meta tags, some websites have stopped bothering with meta data). Usually websites keep less than 10 of them, and they are directly related to the website itself. (ex: see cnn.com - although they filter their traffic so sometimes they are not available to the scripts pulling the metadata).

Maybe as a solution the array length can be checked -- if it's over (picking an arbitrary number here) 10 for the ones coming from the meta keywords, we can then pull a keyword relevancy of some sort? (At that point, there's even the potential to do some keyword analysis from the html content itself :)) Esemtially this: https://www.rankranger.com/documentation/on-page-keyword-density-analysis

I would also lower case the tags by the way, otherwise there's some de-dups there and in general it probably makes sense to store them as lower case.

clvrobj commented 6 years ago

It makes sense. I've merged this and deployed v2.12.2 to the chrome web store. Let's see the performance. Feel free to submit further pull request. Thanks @ventz !

ventz commented 6 years ago

@clvrobj That's great - thank you!