[ ] Refactor all the platform-specific request and response handling into a new module (e.g., WebScience.Utilities.SocialMediaEvents). We know that other studies will want to take advantage of that functionality. Something similar to the PageEvents module would be ideal (e.g., a listener function just gets clean data like a time stamp, the platform, the event, the event content, and URLs extracted from the event content).
[ ] Refactor the tokenization and deduplication functionality into a new module (e.g., WebScience.Utilities.TextParsing), since it's quite possible we'll want to use that functionality elsewhere.
[ ] Nit: the debugLog line should include the Studies namespace (see the other Studies modules).
[ ] Nit: use a more descriptive variable name for sdrs.
[ ] Several comments on the following line:
var sdrs = await browser.storage.local.get("shortDomainRegexString");
It appears to create an implicit dependency on a non-utility module, which we want to avoid. All the shortened URL functionality for this and other study modules (loading domains, generating the regular expression, checkShortUrl, etc.) should go in the LinkResolution utility module. And implicit dependencies are especially bad, because minor changes can break them and they can be difficult to debug.
The name for the value in local storage doesn't have a namespace (e.g., WebScience.Utilities.foo). We should stick to that convention for clarity and debugging.
We can store RegExp objects in extension local storage, so there's no need to recompile the object from the string on each extension load.
[ ] Nit: looks like the comma at the end of this line should be a semicolon.
shortUrlMatcher = new RegExp(sdrs.shortDomainRegexString),
[ ] Nit: clarify this comment: "Make this available to content scripts"
[ ] Nit: the function names facebookSharing, twitterSharing, and redditSharing could be a bit more descriptive, e.g., enableFacebookSharingMeasurement.
[ ] Nit: the module should be consistent in how the three supported platforms are ordered (in the runStudy documentation, in the runStudy implementation, and in the platform-specific functions).
[ ] We shouldn't have the blocking option set for any of the webRequest events unless necessary (e.g., where we're using a webRequest.StreamFilter to read the response or for the intervention studies we've discussed).
[ ] You can avoid some repetition in the code blocks for recording shared content by moving the share ID counter incrementation and debugging output into the createShareRecord function. Should probably rename that function at the same time, e.g., recordSharingEvent.
[ ] The social media sharing event object that's stored should be a defined type with JSDoc (e.g., SocialMediaSharingEvent).
[ ] Nit: rebalance indentation throughout the module.
Twitter
[ ] Nit: move the "Twitter events we handle" part of the twitterSharing function documentation to the top of the function documentation (like for Facebook). It's clearer to start with what we support, then explain implementation details and caveats.
[ ] Nit: it would be cleaner to organize the Twitter functionality into two blocks: Web Intent actions and ordinary Twitter API calls.
[ ] Looks like the Twitter Web Intent APIs can have URL parameters, which would necessitate a wildcard at the end of the blob matching string. We should either add support or document why those aren't necessary (e.g., a GET with parameters just opens a Twitter page, a POST without parameters is what triggers the actual event).
[ ] Looks like we could use webRequest.onBeforeRequest rather than webRequest.onHeadersReceived for the Twitter events. It appears OK to set up the webRequest.StreamFilter before the response starts.
[ ] Looks like the recordTwitterUrls function could be radically simplified and merged into the (revised) createShareRecord function.
[ ] The processTwitterResponse function should account for the expected type of response, rather than guessing at possible response types. In other words, the function should incorporate the type of Twitter event that it's parsing the response for, and that should be part of the logic for parsing the response body.
[ ] The output from processTwitterResponse should presumably match just one event type, plus a set of URLs. The current implementation, where the output is an object that could have multiple event types and URL sets, doesn't quite make sense. The only situation where I see some possible overlap is a tweet that quotes another tweet that contains a URL... and probably the better way to handle that is to have a main event type and content then optionally an attribute for a quoted tweet and associated content.
[ ] Nit: add more explanation to this comment: "Using the 'embed' link gets us the tweet content a lot more simply."
Facebook
[ ] Nit: add a comment explaining why the fallback Facebook HTTP(S) request happens in the content script rather than the background page. Presumably because there's shared page parsing logic, and that approach also makes sure the user's still logged in.
[ ] Add error handling (or at minimum, documentation about what will happen with errors) for the browser.tabs.sendMessage call in the Facebook logic (e.g., if there's no response or the page has closed).
[ ] Document what the https://www.facebook.com/webgraphql/mutation/ endpoint does (to the best of your knowledge). Is this only for posting status updates, or does Facebook use the endpoint for other events (as the name suggests)? If the latter, we should have logic to confirm it's a status update event.
[ ] Nit: the https://www.facebook.com/ match pattern is redundant; it's included in the https://www.facebook.com/* match pattern.
[ ] Running the Facebook content scripts at document_idle should be fine, since the page has to be loaded for the user to interact with it.
[ ] Add more documentation to the requestPostContents function in the content script, including what the API endpoint is and what the text replacement is doing.
[ ] The following line in the content script looks very fragile (and is also missing a semicolon). Please try to find a more reliable approach, and if you can't, clearly document why not.
var mediaBoxes = node.querySelectorAll("a[class=_52c6]")
[ ] The content script has an implicit dependency on utils.js. You should just include the relevant code in this content script. Ordinarily we would library-ify the code, but this is a small snippet, it's unlikely to change, and creating library content scripts is a pain.
[ ] Nit: rename socialMediaSharing.js to socialMediaSharing-facebook.js to clarify it's a Facebook-specific content script. Also update the documentation at the top of the file to explain what it does on Facebook pages and why.
[ ] For discussion: add support for Facebook comments and likes of Facebook comments, which presumably use different endpoints.
Reddit
[ ] Add documentation to the redditSharing function that explains what we do and don't support (like for the Twitter and Facebook functions).
[ ] For discussion: add support for Reddit comments, which might use a different endpoint.
[ ] For discussion: add support for Reddit post upvotes, comments, and comment upvotes.
General
WebScience.Utilities.SocialMediaEvents
). We know that other studies will want to take advantage of that functionality. Something similar to thePageEvents
module would be ideal (e.g., a listener function just gets clean data like a time stamp, the platform, the event, the event content, and URLs extracted from the event content).WebScience.Utilities.TextParsing
), since it's quite possible we'll want to use that functionality elsewhere.debugLog
line should include theStudies
namespace (see the otherStudies
modules).sdrs
.checkShortUrl
, etc.) should go in theLinkResolution
utility module. And implicit dependencies are especially bad, because minor changes can break them and they can be difficult to debug.WebScience.Utilities.foo
). We should stick to that convention for clarity and debugging.RegExp
objects in extension local storage, so there's no need to recompile the object from the string on each extension load.facebookSharing
,twitterSharing
, andredditSharing
could be a bit more descriptive, e.g.,enableFacebookSharingMeasurement
.runStudy
documentation, in therunStudy
implementation, and in the platform-specific functions).blocking
option set for any of thewebRequest
events unless necessary (e.g., where we're using awebRequest.StreamFilter
to read the response or for the intervention studies we've discussed).createShareRecord
function. Should probably rename that function at the same time, e.g.,recordSharingEvent
.SocialMediaSharingEvent
).Twitter
twitterSharing
function documentation to the top of the function documentation (like for Facebook). It's clearer to start with what we support, then explain implementation details and caveats.https://twitter.com/intent/retweet
Web Intent endpoint to support. If there is, we should add support. If that just redirects to another endpoint, we should document that. See: https://developer.twitter.com/en/docs/twitter-for-websites/web-intents/overviewwebRequest.onBeforeRequest
rather thanwebRequest.onHeadersReceived
for the Twitter events. It appears OK to set up thewebRequest.StreamFilter
before the response starts.recordTwitterUrls
function could be radically simplified and merged into the (revised)createShareRecord
function.processTwitterResponse
function should account for the expected type of response, rather than guessing at possible response types. In other words, the function should incorporate the type of Twitter event that it's parsing the response for, and that should be part of the logic for parsing the response body.processTwitterResponse
should presumably match just one event type, plus a set of URLs. The current implementation, where the output is an object that could have multiple event types and URL sets, doesn't quite make sense. The only situation where I see some possible overlap is a tweet that quotes another tweet that contains a URL... and probably the better way to handle that is to have a main event type and content then optionally an attribute for a quoted tweet and associated content.Facebook
browser.tabs.sendMessage
call in the Facebook logic (e.g., if there's no response or the page has closed).https://www.facebook.com/webgraphql/mutation/
endpoint does (to the best of your knowledge). Is this only for posting status updates, or does Facebook use the endpoint for other events (as the name suggests)? If the latter, we should have logic to confirm it's a status update event.https://www.facebook.com/
match pattern is redundant; it's included in thehttps://www.facebook.com/*
match pattern.document_idle
should be fine, since the page has to be loaded for the user to interact with it.requestPostContents
function in the content script, including what the API endpoint is and what the text replacement is doing.utils.js
. You should just include the relevant code in this content script. Ordinarily we would library-ify the code, but this is a small snippet, it's unlikely to change, and creating library content scripts is a pain.socialMediaSharing.js
tosocialMediaSharing-facebook.js
to clarify it's a Facebook-specific content script. Also update the documentation at the top of the file to explain what it does on Facebook pages and why.Reddit
redditSharing
function that explains what we do and don't support (like for the Twitter and Facebook functions).