chaoss / augur

Python library and web service for Open Source Software Health and Sustainability metrics & data collection. You can find our documentation and new contributor information easily here: https://oss-augur.readthedocs.io/en/main/ and learn more about Augur at our website https://augurlabs.io
MIT License

Possible bug in `/:owner/:repo/linking_websites` #151

Closed OrkoHunter closed 6 years ago

OrkoHunter commented 6 years ago

The API endpoint `/:owner/:repo/linking_websites` responds without error, but it seems to return something unexpected.

For example, http://twitter.augurlabs.io/api/unstable/twitter/finagle/linking_websites returns:

[{"url":"<!DOCTYPE html>","rank":null},{"url":"<!--[if lt IE 7]>      <html lang=\"en\" class=\"no-js lt-ie9 lt-ie8 lt-ie7\"> <![endif]-->","rank":null},{"url":"<!--[if IE 7]>         <html lang=\"en\" class=\"no-js lt-ie9 lt-ie8\"> <![endif]-->","rank":null},{"url":"<!--[if IE 8]>         <html lang=\"en\" class=\"no-js lt-ie9\"> <![endif]-->","rank":null},{"url":"<!--[if gt IE 8]><!--> <html lang=\"en\" class=\"no-js\"> <!--<![endif]-->","rank":null},{"url":"<head>","rank":null},{"url":"<meta charset=\"utf-8\">","rank":null},{"url":"<meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">","rank":null},{"url":"<title>PublicWWW - PublicWWW.com<\/title>","rank":null},{"url":"<meta name=\"description\" content=\"Search engine for source code - ultimate solution for digital marketing and affiliate marketing research.\">","rank":null},{"url":"<meta name=\"viewport\" content=\"width=device-width\">","rank":null},{"url":"<meta name=\"referrer\" content=\"never\">","rank":null},{"url":"<link rel=\"apple-touch-icon\" sizes=\"180x180\" href=\"\/images\/favicon7\/apple-touch-icon.png\">","rank":null},{"url":"<link rel=\"icon\" type=\"image\/png\" href=\"\/images\/favicon7\/favicon-32x32.png\" sizes=\"32x32\">","rank":null},{"url":"<link rel=\"icon\" type=\"image\/png\" href=\"\/images\/favicon7\/favicon-16x16.png\" sizes=\"16x16\">","rank":null},{"url":"<link rel=\"manifest\" href=\"\/images\/favicon7\/manifest.json\">","rank":null},{"url":"<link rel=\"mask-icon\" href=\"\/images\/favicon7\/safari-pinned-tab.svg\" color=\"#5bbad5\">","rank":null},{"url":"<link rel=\"shortcut icon\" href=\"\/images\/favicon7\/favicon.ico\">","rank":null},{"url":"<meta name=\"msapplication-config\" content=\"\/images\/favicon7\/browserconfig.xml\">","rank":null},{"url":"<meta name=\"theme-color\" content=\"#ffffff\">","rank":null},{"url":"<link rel=\"search\" type=\"application\/opensearchdescription+xml\" title=\"PublicWWW\" href=\"\/images\/opensearch.xml\" \/>","rank":null},{"url":"<link 
rel=\"stylesheet\" href=\"\/images\/css\/bootstrap.min.css\">","rank":null},{"url":"<link rel=\"stylesheet\" href=\"\/images\/css\/style.7.css\">","rank":null},{"url":"<\/head>","rank":null},{"url":"<body>","rank":null},{"url":"\t<div class=\"mainmenu-wrapper\">","rank":null},{"url":"\t\t<div class=\"container\">","rank":null},{"url":"\t\t\t<nav id=\"mainmenu\" class=\"mainmenu\">","rank":null},{"url":"\t\t\t\t\t<ul class=\"pull-left\">","rank":null},{"url":"\t\t\t\t\t\t<li class=\"logo-wrapper\"><a href=\"\/\"><i class=\"glyphicon glyphicon-chevron-left\"><\/i><i class=\"glyphicon glyphicon-search\"><\/i><i class=\"glyphicon glyphicon-chevron-right\"><\/i><span class=\"hidden-xs\"> PublicWWW<\/span><\/a><\/li>","rank":null},{"url":"\t\t\t\t\t\t<li ><a href=\"\/examples\/ads.html\"><span class=\"hidden-xs\">Examples<\/span><i class=\"visible-xs glyphicon glyphicon-question-sign\"><\/i><\/a><\/li>","rank":null},{"url":"\t\t\t\t\t\t<li ><a href=\"\/pricing.html\"><span class=\"hidden-xs\">Pricing<\/span><i class=\"visible-xs glyphicon glyphicon-star\"><\/i><\/a><\/li>","rank":null},{"url":"\t\t\t\t\t<\/ul>","rank":null},{"url":"\t\t\t\t\t<ul class=\"pull-right navbar-right\">","rank":null},{"url":"\t\t\t\t\t\t  \t\t\t\t\t\t\t<li ><a href=\"\/profile\/signup.html\" rel=\"nofollow\"><i class=\"glyphicon glyphicon-user\"><\/i><span class=\"hidden-xs\"> Sign Up<\/span><\/a><\/li>","rank":null},{"url":"\t\t\t\t\t\t\t<li ><a href=\"\/profile\/login.html\" rel=\"nofollow\"><i class=\"glyphicon glyphicon-log-in\"><\/i><span class=\"hidden-xs\"> Log In<\/span><\/a><\/li>","rank":null},{"url":"\t\t\t\t\t\t  \t\t\t\t\t<\/ul>","rank":null},{"url":"\t\t\t<\/nav>","rank":null},{"url":"\t\t<\/div>","rank":null},{"url":"\t<\/div>","rank":null},{"url":"<div id=\"wrap\">","rank":null},{"url":"  <div>","rank":null},{"url":"<center>","rank":null},{"url":"<div id=\"please1\" style=\"padding-top: 100px","rank":"><h1>Please enable JavaScript<\/h1><\/div>\n<p>\n    <a 
href=mailto:support@publicwww.com\">support@publicwww.com<\/a>"},{"url":"<\/p>","rank":null},{"url":"<\/center>","rank":null},{"url":"<script>","rank":null},{"url":"document.getElementById(\"please1\").innerHTML=\"PublicWWW service is currently under maintenance.\"","rank":"var f3=function(s){function L(k,d){return(k<<d)|(k>>>(32-d))}function K(G,k){var I,d,F,H,x"},{"url":"function f2(cname){var name=cname+\"=\"","rank":"var ca=document.cookie.split(\""},{"url":"return\"\"","rank":"}"},{"url":"while(true){var n=String(Math.random())","rank":"var r=f3(\"fbc75cbd862d39096069d0dc546c4da2\"+n)"},{"url":"break","rank":"}}"},{"url":"document.getElementById(\"please1\").innerHTML=\"processing request...\"","rank":"document.location.reload(true)"},{"url":" <\/div>","rank":null},{"url":"<\/div>","rank":null},{"url":"<!-- Footer -->","rank":null},{"url":"\t    <div class=\"footer\">","rank":null},{"url":"\t    \t<div class=\"container\">","rank":null},{"url":"\t\t    \t<div class=\"row hidden-xs\">","rank":null},{"url":"                    <div class=\"col-footer col-md-5 col-xs-6\">","rank":null},{"url":"\t\t    \t\t\t<h3 style=\"white-space:nowrap","rank":">Usage Examples<\/h3>\n\t\t\t\t\t\t\t<div class=row\">"},{"url":"\t\t\t                    <div class=\"col-md-6 col-xs-12\">","rank":null},{"url":"\t\t\t\t\t\t\t\t\t<ul class=\"no-list-style footer-navigate-section\">","rank":null},{"url":"\t\t\t\t\t\t\t\t\t\t<li><a href=\"\/examples\/ads.html\" style=\"white-space:nowrap","rank":">Advertising Networks<\/a><\/li>\n\t\t\t\t\t\t\t\t\t\t<li><a href=\/examples\/marketing.html\" style=\"white-space:nowrap"},{"url":"\t\t\t\t\t\t\t\t\t<ul class=\"no-list-style footer-navigate-section\">","rank":null},{"url":"\t\t\t\t\t\t\t\t\t\t<li><a href=\"\/examples\/cms.html\" style=\"white-space:nowrap","rank":">Content Management Systems<\/a><\/li>\n\t\t\t\t\t\t\t\t\t\t<li><a href=\/popular\/js\/index.html\" style=\"white-space:nowrap"},{"url":"\t\t\t\t\t\t\t\t\t\t<li 
style=\"white-space:nowrap","rank":"><a href=\/popular\/jsfiles\/index.html\">JavaScript Files<\/a><\/li>"},{"url":"\t\t\t\t\t\t\t\t\t\t<li style=\"white-space:nowrap","rank":"><a href=\/popular\/cssfiles\/index.html\">CSS Files<\/a><\/li>"},{"url":"\t\t\t\t\t\t\t\t\t<\/ul>","rank":null},{"url":"\t\t\t\t\t\t\t\t<\/div>","rank":null},{"url":"\t\t\t\t\t\t\t<\/div>","rank":null},{"url":"\t\t    \t\t<\/div>","rank":null},{"url":"\t\t\t\t\t<div class=\"col-footer col-md-2 col-xs-0\">","rank":null},{"url":"\t\t\t\t\t<\/div>","rank":null},{"url":"\t\t    \t\t<div class=\"col-footer col-md-5 col-xs-6\">","rank":null},{"url":"\t\t    \t\t\t<h3>PublicWWW<\/h3>","rank":null},{"url":"\t\t\t\t\t\t\t<div class=\"row\">","rank":null},{"url":"\t\t\t                    <div class=\"col-md-6 col-xs-12\">","rank":null},{"url":"\t\t\t\t\t    \t\t\t<ul class=\"no-list-style footer-navigate-section\">","rank":null},{"url":"                                                <li><a href=\"\/terms.html\" style=\"white-space:nowrap","rank":">Terms &amp; Conditions<\/a><\/li>\n                                                <li><a href=\/pricing.html\" style=\"white-space:nowrap"},{"url":"                                                <li>","rank":null},{"url":"                                                  <a href=\"mailto:support@publicwww.com\" style=\"white-space:nowrap","rank":">support@publicwww.com<\/a>\n                                                <\/li>\n\t\t\t\t\t\t\t\t<\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t                    <div class=col-md-6 col-xs-12\">"},{"url":"\t\t\t\t\t    \t\t\t<ul class=\"footer-stay-connected no-list-style\">","rank":null},{"url":"                                                                                        <li><a class=\"twitter\" href=\"https:\/\/twitter.com\/publicww\" target=\"_blank\"><\/a><\/li>","rank":null},{"url":"                                                                                                                      
              <li><a class=\"facebook\" href=\"https:\/\/www.facebook.com\/publicwww\/\" target=\"_blank\"><\/a><\/li>","rank":null},{"url":"                                                                                        <li><a class=\"googleplus\" href=\"https:\/\/plus.google.com\/communities\/108388693288736832045\" target=\"_blank\"><\/a><\/li>","rank":null},{"url":"\t\t\t\t\t    \t\t\t<\/ul>","rank":null},{"url":"\t\t\t\t\t\t\t\t<\/div>","rank":null},{"url":"\t\t\t\t\t\t\t<\/div>","rank":null},{"url":"\t\t\t    \t<\/div>                                \t\t    \t\t","rank":null},{"url":"\t\t    \t<\/div>","rank":null},{"url":"\t\t    \t<div class=\"row\">","rank":null},{"url":"\t\t    \t\t<div class=\"col-md-12\">","rank":null},{"url":"\t\t    \t\t\t<div class=\"footer-copyright\">&copy","rank":" 2005-2018 "},{"url":" <a href=\"https:\/\/publicwww.com\/\">PublicWWW<\/a>.","rank":null},{"url":" All rights reserved.<\/div>","rank":null},{"url":"\t\t    \t\t<\/div>","rank":null},{"url":"\t\t    \t<\/div>","rank":null},{"url":"\t\t    <\/div>","rank":null},{"url":"\t    <\/div>","rank":null},{"url":"<!-- Piwik -->","rank":null},{"url":"<script type=\"text\/javascript\">","rank":null},{"url":"  var _paq = _paq || []","rank":null},{"url":"  _paq.push(['trackPageView'])","rank":null},{"url":"  _paq.push(['enableLinkTracking'])","rank":null},{"url":"  (function() {","rank":null},{"url":"    var u=((\"https:\" == document.location.protocol) ? 
\"https\" : \"http\") + \":\/\/seomon.com\/piwik\/\"","rank":null},{"url":"    _paq.push(['setTrackerUrl', u+'piwik.php'])","rank":null},{"url":"    _paq.push(['setSiteId', 2])","rank":null},{"url":"    var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]","rank":" g.type='text\/javascript'"},{"url":"    g.defer=true","rank":" g.async=true"},{"url":"  })()","rank":null},{"url":"<\/script>","rank":null},{"url":"<noscript><p><img src=\"https:\/\/seomon.com\/piwik\/piwik.php?idsite=2\" style=\"border:0","rank":" alt=\" \/><\/p><\/noscript>\n\n<script>\n  var google_conversion_paramone = 247;\n  var google_conversion_paramtwo = 195;\n  var google_conversion_startTime = new Date();\n  \n<\/script>\n<script async src=\/images\/js\/sockets.io.3.js\"><\/script>"},{"url":"<!-- End Piwik Code -->","rank":null},{"url":"    <\/body>","rank":null},{"url":"<\/html>","rank":null}]

The `url` field in each entry contains a line of HTML, but I believe it should contain a URL. Please correct me if I am wrong.

ccarterlandis commented 6 years ago

Hi @OrkoHunter, as far as I am aware, this happens because https://publicwww.com/, our data source for this metric, changed the structure of its website. We were scraping the site to collect this data, so when the structure changed, our scraper stopped working. (@howderek might know more about why this is happening; I might be incorrect.)
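One way to keep a broken upstream from leaking HTML into the API response is to validate what the scraper receives before returning it. The sketch below is not Augur's actual code. It assumes the upstream data arrives as semicolon-delimited `url;rank` text, which is a guess based on how the HTML above was split into fields, and the names `parse_linking_websites`, `looks_like_url`, and `UpstreamFormatError` are hypothetical.

```python
import csv
import io
from urllib.parse import urlparse


class UpstreamFormatError(ValueError):
    """Raised when the upstream response is not the expected url;rank listing."""


def looks_like_url(value):
    """Return True if value parses as an absolute http(s) URL."""
    try:
        parsed = urlparse(value.strip())
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def parse_linking_websites(body):
    """Parse semicolon-delimited 'url;rank' text into a list of dicts.

    Hypothetical helper: the whole response is rejected if any row's
    first field is not a URL, so an HTML error page never reaches clients.
    """
    rows = []
    for fields in csv.reader(io.StringIO(body), delimiter=";"):
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        url = fields[0].strip()
        if not looks_like_url(url):
            raise UpstreamFormatError(f"expected a URL, got: {url[:40]!r}")
        raw_rank = fields[1].strip() if len(fields) > 1 else ""
        rank = int(raw_rank) if raw_rank.isdigit() else None
        rows.append({"url": url, "rank": rank})
    return rows
```

With a guard like this, the endpoint could answer with an error status when the upstream site serves a maintenance or JavaScript-challenge page, rather than a list of HTML fragments.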

I haven't spent any time looking into this yet; however, if you want to use this endpoint, I would be more than happy to take a look and see what I can do. Thanks for pointing out all the bugs in our API; I'm glad to know someone is keeping us accountable!

OrkoHunter commented 6 years ago

Thank you @ccarterlandis for the response! My internship ends in 2 days, and I am trying to implement as many Augur metrics as I can in the TwitterOSS metrics. :)

ccarterlandis commented 6 years ago

Of course! I want to make sure that you can get as much done as possible.

ccarterlandis commented 6 years ago

It looks as if PublicWWW has deprecated their API. As such, we are currently unable to implement this metric and have had to deprecate the endpoint, since it was based on the data they provided. Closing this issue, but we will continue to search for alternative data sources for this metric.