digital-analytics-program / gov-wide-code

Provides a set of javascript files and documentation to implement web analytics on US federal websites
http://www.digital.gov/dap

DAP appears to assume all iframes are youtube embeds #18

Closed: konklone closed this issue 7 years ago

konklone commented 9 years ago

The DAP code appears to identify all <iframe> tags as YouTube embeds, and rewrites their URLs from http:// to https://. This could break <iframe> tags on a page that point to things other than YouTube embeds, and for which HTTPS is not supported.

/*
 * name: YTUrlHandler_fed
 * usage: to correct minor errors in YouTube URLs
 */
function YTUrlHandler_fed(url)
        {
        url = url.replace(/origin\=(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})\&?/ig,'origin='+document.location.protocol+'//'+document.location.host);

        stAdd = '';
        adFlag = false;
        if (url.indexOf('https')==-1){url = url.replace('http','https');}
        if (url.indexOf('?')==-1){stAdd = '?flag=1';}
        if (url.indexOf('enablejsapi')==-1){stAdd +='&enablejsapi=1'; adFlag = true;}
        if (url.indexOf('html5')==-1){stAdd +='&html5=1'; adFlag = true;}
        if (url.indexOf('origin')==-1){stAdd +='&origin='+document.location.protocol+'//'+document.location.host;adFlag = true;}

/*
 * name: _initYouTubeTracker
 * usage: initiate YouTube tracker libraries and loop over all YouTube iframes
 */

function _initYouTubeTracker() {
    var _iframes = document.getElementsByTagName('iframe');
    var vArray = 0;
    for (var ytifrm = 0; ytifrm < _iframes.length; ytifrm++) {
        _thisVideoObj = _iframes[ytifrm];
        var _thisSrc = _thisVideoObj.src;
        if (IsYouTube_fed(_thisSrc)) {
            _thisVideoObj.src = YTUrlHandler_fed(_thisSrc);
            var youtubeid = youtube_parser_fed(_thisSrc);
            _thisVideoObj.setAttribute('id', youtubeid);
            videoArray_fed[vArray] = youtubeid;
            vArray++;
        }
    }
}
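
To make the concern concrete, the protocol rewrite from YTUrlHandler_fed can be run on its own. This is only a minimal sketch: upgradeToHttps is a hypothetical name (not a function in the DAP code), and the example.gov URLs and the 11-character video ID are made up for illustration. It reproduces just the http-to-https line quoted above and leaves out the query-string handling.

```javascript
// Hypothetical helper that isolates the protocol rewrite quoted above.
// It is NOT part of the DAP code; it only mirrors the one line in question.
function upgradeToHttps(url) {
  // Same check as the quoted line: rewrite only if "https" appears nowhere.
  if (url.indexOf('https') === -1) {
    // replace() with a string pattern changes the first occurrence only.
    url = url.replace('http', 'https');
  }
  return url;
}

// Hypothetical iframe sources (assumptions, not taken from a real page).
var samples = [
  'http://www.youtube.com/embed/abcdefghijk',
  'http://example.gov/widget',
  'https://example.gov/widget'
];

var rewritten = samples.map(upgradeToHttps);
console.log(rewritten);
```

The rewrite itself does not look at the host, so whether a non-YouTube iframe is affected depends entirely on the IsYouTube_fed check that guards the call in _initYouTubeTracker.
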
konklone commented 9 years ago

A correction: I hadn't looked at all of the relevant code when I filed this, and I hadn't tested it empirically. The original (1.04) code does try to determine whether the iframe's URL is a YouTube URL:

function IsYouTube_fed(url) {
    var YouTubeLink_regEx = /^.*((youtu.be\/)|(\/v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))\??v?=?([^#\&\?]*).*/;
    var match = url.match(YouTubeLink_regEx);
    if (match != null && match.length > 0) {
        return true;
    } else {
        return false;
    }
}
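
One way to probe the 1.04 check is to run its pattern against a few URLs. The sketch below copies the regular expression from the function above; isYouTube104 is a hypothetical wrapper name, and http://example.gov/embed/map is an invented non-YouTube URL. Because the pattern keys on path fragments such as embed/ and is not tied to a YouTube host, a URL like that appears to pass the check.

```javascript
// The 1.04 pattern, copied verbatim from IsYouTube_fed above.
var oldYouTubeRegEx = /^.*((youtu.be\/)|(\/v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))\??v?=?([^#\&\?]*).*/;

// Hypothetical wrapper name; behaves like the 1.04 IsYouTube_fed.
function isYouTube104(url) {
  return oldYouTubeRegEx.test(url);
}

// Sample URLs: the example.gov entry and the video ID are assumptions.
var results104 = {
  defense: isYouTube104('http://www.defense.gov'),                   // false
  youtube: isYouTube104('http://www.youtube.com/embed/abcdefghijk'), // true
  otherEmbed: isYouTube104('http://example.gov/embed/map')           // true
};

console.log(results104);
```

This is consistent with the http://www.defense.gov test not breaking under 1.04: that URL contains none of the path fragments the pattern looks for.
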

The new code (2.0) has changed how this works:

function IsYouTube_fed(url) {
  var YouTubeLink_regEx = /^(https?\:)?(\/\/)?(www\.)?(youtu\.be\/|youtube(\-nocookie)?\.([A-Za-z]{2,4}|[A-Za-z]{2,3}\.[A-Za-z]{2})\/)(watch|embed\/|vi?\/)?(\?vi?\=)?([^#\&\?\/]{11}).*$/;
  if(YouTubeLink_regEx.test(url.toString())) {
    return true;
  }
  else {
    return false;
  }
}
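
The 2.0 pattern can be exercised the same way. This sketch copies the regular expression from the function above; isYouTube20 is a hypothetical wrapper name, and the example.gov URL and the 11-character video ID are invented. The pattern is anchored at the start of the string and requires a youtu.be or youtube host, so a non-YouTube URL with embed/ in its path should be rejected.

```javascript
// The 2.0 pattern, copied verbatim from IsYouTube_fed above.
var newYouTubeRegEx = /^(https?\:)?(\/\/)?(www\.)?(youtu\.be\/|youtube(\-nocookie)?\.([A-Za-z]{2,4}|[A-Za-z]{2,3}\.[A-Za-z]{2})\/)(watch|embed\/|vi?\/)?(\?vi?\=)?([^#\&\?\/]{11}).*$/;

// Hypothetical wrapper name; behaves like the 2.0 IsYouTube_fed.
function isYouTube20(url) {
  return newYouTubeRegEx.test(url.toString());
}

// Sample URLs: the example.gov entry and the video ID are assumptions.
var results20 = {
  defense: isYouTube20('http://www.defense.gov'),                        // false
  embed: isYouTube20('http://www.youtube.com/embed/abcdefghijk'),        // true
  watch: isYouTube20('https://www.youtube.com/watch?v=abcdefghijk'),     // true
  noCookie: isYouTube20('//www.youtube-nocookie.com/embed/abcdefghijk'), // true
  otherEmbed: isYouTube20('http://example.gov/embed/map')                // false
};

console.log(results20);
```
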

I can't tell from this change what the bug was in 1.04 or what was addressed in 2.0; I'd need to see a technical explanation. I did a quick test of an <iframe> tag pointing to http://www.defense.gov under both 1.04 and 2.0, and neither version broke it, so I can't empirically reproduce the problem or confirm the fix without more details.

tdlowden commented 8 years ago

I believe this is now remedied. We upgraded to the current YouTube API version in v3.1.