edsu / earls

display urls being tweeted with an event hashtag
MIT License

Am I wrong, or does earls treat the same URL with different UTM parameters as different URLs? #21

Open remagio opened 9 years ago

remagio commented 9 years ago

Analyzing some twarc files I found, for example:

    506 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_source=mbtwitter
     10 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_source=twitterfeed&utm_medium=twitter
      6 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_source=mbfb
      4 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_content=buffer6f025&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
      2 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_content=buffer7cac8&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
      1 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_source=Media+Brigade&utm_medium=Media+Brigade&utm_campaign=Media+Brigade
      1 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_source=dlvr.it&utm_medium=twitter
      1 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_medium=twitter
      1 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_content=bufferf31b4&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
      1 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_content=bufferc85f1&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
      1 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_content=buffer1d58d&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
      1 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_content=buffer10315&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
      1 http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked?utm_content=17755667&utm_medium=social&utm_source=twitter

earls lists them as different URLs, which reduces the value and benefit of counting URL popularity. As a side effect, it also generates a very long URL list, especially for trending topics.
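
One possible way to merge these variants is to drop every query parameter whose name starts with `utm_` before counting. The sketch below is in Python with only the standard library, and it is not how earls works today; `strip_utm` and `count_urls` are hypothetical helper names used for illustration. It assumes that UTM parameters never change which page is served, and that re-encoding the remaining query parameters is acceptable.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_utm(url):
    """Return url without query parameters whose name starts with 'utm_'."""
    parts = urlsplit(url)
    # Keep every parameter except the utm_* tracking ones, preserving order.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith("utm_")
    ]
    # Rebuild the URL; an empty query yields no trailing '?'.
    return urlunsplit(parts._replace(query=urlencode(kept)))


def count_urls(urls):
    """Count URLs after normalizing away utm_* parameters (hypothetical helper)."""
    return Counter(strip_utm(u) for u in urls)


if __name__ == "__main__":
    base = "http://motherboard.vice.com/read/spy-tech-company-hacking-team-gets-hacked"
    sample = [
        base + "?utm_source=mbtwitter",
        base + "?utm_source=twitterfeed&utm_medium=twitter",
        base + "?utm_content=buffer6f025&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer",
    ]
    for url, n in count_urls(sample).most_common():
        print(n, url)
```

With a normalization step like this, the thirteen entries in the list above would collapse into a single URL whose count is the sum of the individual counts.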