HTTPArchive / legacy.httparchive.org

<<THIS REPOSITORY IS DEPRECATED>> The HTTP Archive provides information about website performance such as # of HTTP requests, use of gzip, and amount of JavaScript. This information is recorded over time revealing trends in how the Internet is performing. Built using Open Source software, the code and data are available to everyone allowing researchers large and small to work from a common base.
https://legacy.httparchive.org

Detect third party JS libraries with custom metric #77

Closed rviscomi closed 7 years ago

rviscomi commented 7 years ago

Using custom metric scripts, detect the presence of third party libraries and their version if available.

For example, a page may have "jquery@3.2.0,modernizr,backbone@1.3.3,...".
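A minimal sketch of how such a string could be assembled, assuming the detection step yields a list of name/version pairs. The `formatLibraries` name and the input shape are hypothetical, and omitting the `@version` suffix when no version is known is an assumption based on the `modernizr` entry in the example above.

```javascript
// Hypothetical helper: turns detection results into a compact string.
// Assumed input shape: [{ name: 'jquery', version: '3.2.0' }, ...]
function formatLibraries(detections) {
  return detections
    // Append "@version" only when a version is known (assumption).
    .map(({ name, version }) => (version ? `${name}@${version}` : name))
    .join(',');
}

const example = formatLibraries([
  { name: 'jquery', version: '3.2.0' },
  { name: 'modernizr', version: null },
  { name: 'backbone', version: '1.3.3' },
]);
console.log(example); // jquery@3.2.0,modernizr,backbone@1.3.3
```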

rviscomi commented 7 years ago

Previous discussion: https://discuss.httparchive.org/t/tracking-javascript-library-versions-in-http-archive/55/24

tlauinger commented 7 years ago

I'm a co-author of the paper referenced in #78. Our library detection code is a fork of the library detection code used by the Library Detector for Chrome extension. We made a few modifications to the code more than a year ago, so our fork isn't really up to date any more. I compared our code to the current version today and found the following differences:

Libraries detected by the Chrome extension but not by our code: GWT, Ink, Vaadin, Zurb, Polymer, Highcharts, InfoVis, Blackbird, CreateJS, Google Maps, Spry, Qooxdoo, Ext JS, base2, closure, Processing.js, Mapbox, Sammy, Rico, MochiKit, gRaphaël (fix gRapha&euml;l), Glow, FuseJS, Tween.js, SproutCore, Zepto.js, PhiloGL, LABjs, Head JS, ControlJS, RightJS, Pusher, Swiffy, Move, AmplifyJS, Popcorn.js, Spine, Visibility.js, IfVisible.js, DC.js, Vue, Two, Brewser, Material Design Lite, Kendo UI, Matter.js, Riot, Sea.js, ScrollMagic

Libraries not currently detected by the Chrome extension (added by us): swfobject, flexslider, moment-timezone (plug-in), json3, pep, spf, numeral

Equivalent detection code in both: Lo-Dash, Underscore (test checks that no Lo-Dash detection in window), yepnope, jQuery Tools, D3, Ember.js, Greensock JS, Isotope

Detection code better in the Chrome extension (better error handling etc.): Bootstrap, three.js (better handling of two cases), CamanJS, WebFont Loader (set version to null)

Detection code in the Chrome extension that is more restrictive than ours (fewer false positives when the version property is not extracted, but likely lower coverage of older library versions/when the additional method used in the signature isn't part of the API): FlotCharts, jQuery UI, Dojo, Prototype, Scriptaculous, MooTools, YUI 2/YUI 3, Raphaël (possibly fix &euml; in name), React, Modernizr, Backbone, Mustache, Fabric.js, Paper.js, Handlebars, Knockout, jQuery Mobile, Angular, Hammer.js, Velocity.js, FastClick, Marionette, Can

Detection code improved in our fork (usually added support for older versions): Leaflet (also allows win.L.VERSION, but additionally use extension restrictions on GeoJSON etc.), Socket.IO (added alternative win.io.Socket), RequireJS (seems to cover more versions), Pixi.js (better error checking), Moment.js (but add win.moment.isMoment check from extension)

How would you like to proceed to add all those libraries? We could try to have our few extra library signatures added to Library Detector; then you could simply integrate their entire file with all the library detection tests.

igrigorik commented 7 years ago

Tobias, thanks for digging into this!

> We could try to have our few extra library signatures added to Library Detector; then you could simply integrate their entire file with all the library detection tests.

Big +1 to this. I'd love to avoid trying to replicate efforts and it looks like library detector is a fairly active project, so everyone would benefit if we converge on improving library detector core and reusing it across projects.

tlauinger commented 7 years ago

Sounds good! See johnmichel/Library-Detector-for-Chrome#89

igrigorik commented 7 years ago

@tlauinger awesome, thanks!

tlauinger commented 7 years ago

johnmichel/Library-Detector-for-Chrome#91 has been merged; the Library Detector code now has all our additional and updated library tests (minus two that weren't that useful after all).

A few more thoughts on things that could make the data easier to work with:

rviscomi commented 7 years ago

> If you'd like, I could write up these points in a brief library detection README.

Yes please! Let's make a docs/custom-metrics.md doc. See https://github.com/HTTPArchive/httparchive/pull/82 for example.

igrigorik commented 7 years ago

> So it could be a good idea to keep a document somewhere that lists the dates when support for a library was added

Let's log the version of Library-Detector-for-Chrome in our traces. This way we can go back to the library commit logs and see what was supported vs not... Ideally, LDfC would have a release log that we can point to; otherwise we're duplicating their work. /cc @johnmichel

rviscomi commented 7 years ago

Another thing to note is that LDfC's detection script has two kinds of output for each library: an object containing version info, or false if the version can't be detected. If a library is detected but not its version, it's the same as no library at all. In other words, it doesn't support the lib@null use case in #80.

I opened https://github.com/johnmichel/Library-Detector-for-Chrome/issues/92 to explore the feasibility of adding support.

rviscomi commented 7 years ago

The actual integration of LDfC is something else I'd like to get feedback on. One approach would be to run a script that:

The detection object's variable name also has some kind of unique prefix: d41d8cd98f00b204e9800998ecf8427e_LibraryDetectorTests. If that happens to be a good version string, we could extract it and save it somewhere.

igrigorik commented 7 years ago

@johnmichel how is the var name generated? Could we define a cleaner interface for non-extension consumers of libraries.js?

> test each library, using the variable name extracted above

@rviscomi not sure I understand what this step does?

rviscomi commented 7 years ago

Roughly

Object.entries(d41d8cd98f00b204e9800998ecf8427e_LibraryDetectorTests).map(([name, lib]) => lib.test(window))...

Written on mobile, not tested. Just an example of accessing the object's props and testing each library.
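A slightly fuller sketch of that loop, not run against the real libraries.js. It assumes each entry exposes a `test(win)` method that returns either an object with a `version` property or `false`, as described earlier in this thread. The `detectLibraries` name, the stub tests, and the fake window are hypothetical stand-ins so the snippet runs outside a browser.

```javascript
// Runs every test in a LibraryDetectorTests-style object against a
// window-like object and keeps only the libraries that were detected.
function detectLibraries(tests, win) {
  return Object.entries(tests)
    .map(([name, lib]) => {
      try {
        return [name, lib.test(win)];
      } catch (e) {
        // Assumption: a test that throws is treated as "not detected".
        return [name, false];
      }
    })
    // Drop libraries whose test returned false (or any other falsy value).
    .filter(([, result]) => result)
    .map(([name, result]) => ({ name, version: result.version }));
}

// Stub tests standing in for the real detection object (hypothetical).
const stubTests = {
  jQuery: { test: (win) => (win.jQuery ? { version: win.jQuery.fn.jquery } : false) },
  Backbone: { test: (win) => (win.Backbone ? { version: win.Backbone.VERSION } : false) },
  Broken: { test: () => { throw new Error('boom'); } },
};

// Fake window exposing only jQuery.
const fakeWindow = { jQuery: { fn: { jquery: '3.2.0' } } };
const detected = detectLibraries(stubTests, fakeWindow);
console.log(detected);
```

In a page context, the call would presumably pass the prefixed detection object and the real `window` instead of the stubs.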

igrigorik commented 7 years ago

Oh, doh.. I see, that makes sense.

johnmichel commented 7 years ago

@igrigorik What @rviscomi said is basically the gist of it. That unique prefix was in place before I assumed control of the project, so if it's not ideal or usable enough, I'm certainly open to ideas for a cleaner or more straightforward approach.

igrigorik commented 7 years ago

@johnmichel by the looks of it, that script gets injected into the page, so I'm assuming the idea behind the unique fingerprint is to avoid collisions with other content.. Does the prefix change from release to release?

@rviscomi re, version: I guess we can pull it out from manifest.json and record it in our script?
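A rough sketch of that idea, assuming the manifest contents are available as a JSON string when the custom metric script is built. Chrome extension manifests carry a top-level `version` field; the `withDetectorVersion` name, the `ldfcVersion` key, the output shape, and the sample manifest values are all hypothetical.

```javascript
// Hypothetical helper: pairs the detected libraries with the detector
// version read from the extension's manifest.json contents.
function withDetectorVersion(manifestJson, libraries) {
  const { version } = JSON.parse(manifestJson);
  return { ldfcVersion: version, libraries };
}

// Made-up manifest contents for illustration only.
const sampleManifest = '{"name": "Library Detector", "version": "1.2.3"}';
const record = withDetectorVersion(sampleManifest, 'jquery@3.2.0,modernizr');
console.log(record);
```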

johnmichel commented 7 years ago

@igrigorik The fingerprint doesn't change from release to release, so you should be able to rely on it being consistent. If it would make sense to rotate it for versioning on the httparchive side, that is also an option that could be explored as it doesn't affect anything outside of the extension.

igrigorik commented 7 years ago

@johnmichel nah, that's fine.. we're better off with the version in the manifest.

igrigorik commented 7 years ago

Now that #85 landed, can we close this? Anything left?

rviscomi commented 7 years ago

@igrigorik the final piece would be unknown version support, but that's being tracked in https://github.com/johnmichel/Library-Detector-for-Chrome/issues/92 so this one can be closed with that caveat

igrigorik commented 7 years ago

Got it, thanks. We should figure out who's taking the lead on that one.. Looks like there is agreement on how we want to tackle it, but not clear who's actually doing it :)

rviscomi commented 7 years ago

You're right. I'll take it.

rviscomi commented 7 years ago

@tlauinger @johnmichel FYI this is now live. See https://discuss.httparchive.org/t/javascript-library-detection/955. Thanks for your help making this happen!

johnmichel commented 7 years ago

@rviscomi @tlauinger @igrigorik Happy to have helped 😄 Please let me know if future enhancements are desired!