WICG / attribution-reporting-api

Attribution Reporting API
https://wicg.github.io/attribution-reporting-api/

Scheduled Reports vs. Real-time Reporting #39

Open angelinaeng opened 4 years ago

angelinaeng commented 4 years ago

Can you confirm and/or clarify when both "publisher" and "advertiser" would receive the event and/or aggregate data? Are there mechanisms to provide data to their reporting systems in real time or near-real time?

Both buyside and sellside platforms use real-time data (which includes impressions, clicks, and conversion data) for many reasons:

  1. QA that conversion tags are firing correctly, that the proper data or parameters are being passed through, and that registered counts are appearing in the appropriate platform
  2. Buyers look at reporting multiple times throughout the day to optimize campaigns and make manual adjustments to settings, bids, targeting, and so on. If things aren't performing well, we need to make decisions quickly.
  3. Pacing reports. Many companies need to monitor the pacing of their media campaigns to see whether each campaign is delivering at its intended pace (a toy pacing check is sketched below). If a campaign is under-delivering or over-delivering, decisions will be made to adjust either the bids or the targeting parameters. Buyers monitor this daily, weekly, and monthly, sometimes even hourly. I've had several campaigns where publishers said the campaign was live and running, but when we pulled reports we saw little or no data, which means either something is wrong technically/logistically, or the publisher is not telling the truth.
  4. Budget management / reconciliation. Pacing reports also have financial implications. If there are delays in getting the event and conversion data, and the browser can't send the data at its intended "scheduled" time, then the publisher loses out on revenue. Buyers make budget shifts constantly and need to know at any given time how much has been spent. Contracted agreements and payment terms require that all delivery for a given month be reported as accurately as possible. Publishers typically send out invoices on the 1st of the month for the previous month's activity, and buyers are required to pay that invoice within 30-60 days. If delivery information is not fully reported and data rolls in afterward, how do we reconcile the spend across all parties?

These are four use cases for real-time reporting. There are more, but these are the priority issues if there is no real-time data.
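
To make the pacing use case (#3) concrete, here is a minimal sketch of the kind of pacing check a buyer runs. The campaign numbers and field names are hypothetical, not taken from any particular platform.

```typescript
// Hypothetical pacing check: compare actual reported spend to the spend
// expected so far, and flag campaigns that are under- or over-delivering.
interface Campaign {
  name: string;
  totalBudget: number; // contracted spend for the flight, in dollars
  flightDays: number;  // total length of the campaign flight
  daysElapsed: number; // days the campaign has been live
  actualSpend: number; // spend reported so far
}

function pacingStatus(c: Campaign, tolerance = 0.1): string {
  const expectedSpend = c.totalBudget * (c.daysElapsed / c.flightDays);
  const ratio = c.actualSpend / expectedSpend;
  const pct = (ratio * 100).toFixed(0);
  if (ratio < 1 - tolerance) return `${c.name}: under-delivering (${pct}% of expected spend)`;
  if (ratio > 1 + tolerance) return `${c.name}: over-delivering (${pct}% of expected spend)`;
  return `${c.name}: on pace`;
}

// A campaign that should have spent $10,000 by day 10 but reports only
// $4,000 is the "publisher says it's live, but we see low data" case above.
console.log(pacingStatus({
  name: "Spring Launch",
  totalBudget: 30_000,
  flightDays: 30,
  daysElapsed: 10,
  actualSpend: 4_000,
})); // -> "Spring Launch: under-delivering (40% of expected spend)"
```

The check itself is trivial; the thread's point is that `actualSpend` is only as fresh as the reports feeding it, so delayed attribution data puts the pacing check a day or more behind reality.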

FYI, I brought this up at the end of the 4/9 call.

csharrison commented 4 years ago

Hey Angelina. Right now the API is designed around a reporting endpoint that the publisher / advertiser specifies. Reports go to that endpoint with substantial delay, to preserve user privacy. Real-time reports are not supported currently, though with the aggregate API we could likely tolerate a shorter delay.
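
For context on the flow described here: the browser POSTs reports to a well-known path under the reporting origin, at a browser-chosen time. Below is a minimal sketch of such an endpoint. The well-known path and payload shape follow a later draft of the spec and changed over the proposal's life, so treat the specifics as illustrative rather than definitive.

```typescript
// Minimal sketch of a reporting endpoint receiving delayed event-level
// reports. The well-known path and report fields follow a draft of the spec
// and may not match the version under discussion; they are illustrative only.
import * as http from "http";

const REPORT_PATH = "/.well-known/attribution-reporting/report-event-attribution";

http.createServer((req, res) => {
  if (req.method === "POST" && req.url === REPORT_PATH) {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      const report = JSON.parse(body);
      // The browser decided when to send this; it may arrive hours or days
      // after the conversion, which is the delay this issue is about.
      console.log("attributed conversion:", report);
      res.writeHead(200);
      res.end();
    });
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080);
```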

Thanks for the list of use-cases, that's really helpful. I have some comments and questions about them:

  1. QAing that conversion tags are still firing can continue as normal. All tags (impressions and conversions) will still fire and you will be able to see what parameters are being passed in. The only thing that is being delayed is the attributed data.

2 / 3: I don't fully understand the distinction between these two. It seems like they are both generally about checking in on the performance of a campaign and making decisions based on the data. Delays in this case cause delays in decision making (i.e. you can't be as agile) or in noticing a problem. Is that correct? How long does it typically take for changes in a campaign to be reflected in attributed conversion data? That is, how long is the current edit / analyze loop in the typical case?

  4. I will think about this more, but I wonder if the aggregate API could help here. Could you tolerate delays on the order of ~1 hour for this use case? I could see generally solving this by tolerating delays in billing (i.e. billing on the Nth day of the month if we have max N-day delays), though I understand that is probably a difficult business change.
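
One way to read the billing suggestion: if reports can arrive up to N days late, a month's attributed totals are only complete N days into the next month. A toy sketch of that arithmetic:

```typescript
// Toy sketch of "bill on the Nth day if we have max N-day delays": given a
// maximum report delay, find the earliest date on which a month's attributed
// data is guaranteed complete.
function earliestCompleteBillingDate(year: number, month: number, maxDelayDays: number): Date {
  // Day 0 of the next month is this month's last day (month is 1-based here).
  const endOfMonth = new Date(year, month, 0);
  return new Date(endOfMonth.getTime() + maxDelayDays * 24 * 60 * 60 * 1000);
}

// With a 2-day maximum delay, April 2020's data is only complete on May 2,
// a day after the 1st-of-the-month invoices mentioned above would go out.
console.log(earliestCompleteBillingDate(2020, 4, 2).toDateString()); // "Sat May 02 2020"
```
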
angelinaeng commented 4 years ago

Hi Charlie. I hope all is well.

  1. Oftentimes when we implement new "tags" for publishers / platforms on an advertiser's site (e.g. adding tags into a tag manager) before a campaign starts, we go through a process with the web team and the media activation team to QA whether the tags are placed correctly on the site, first on the developer / staging site and then on the live public site. We QA for things such as whether:
    • the right tag(s) are on the right page(s). Often we send over many tags to place on a site, so we need to ensure they are implemented correctly. Sometimes the web team places the wrong conversion tags on pages.
    • any data that needs to pass through into parameters (such as product name, product ID / SKU, product type, etc.) is passing through the right variable parameters. In Campaign Manager, these are referred to as u1, u2, u3 (a toy parameter check is sketched after this list).
    • the ad servers and ad platforms are registering these conversions in their systems. We typically try to implement these tags a week or so in advance of a campaign going live, so we often test tags when there is no attribution associated. Usually we find that the advertiser's tech team doesn't implement these tags properly, so there is often back and forth with them to troubleshoot, and sometimes this can go unresolved for days. If there are delays before any of these platforms or systems see conversion activity occurring (unattributed and attributed), it will be quite difficult to confirm in a timely manner that the conversion tags are implemented correctly.
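
As a concrete illustration of the parameter QA described in this list, here is a toy check that a fired tag URL carries the expected custom parameters. The ad-server URL shape is hypothetical; only the u1/u2/u3 names come from the Campaign Manager convention mentioned above.

```typescript
// Toy QA check: given the URL of a fired conversion tag, report which of the
// expected custom parameters (u1/u2/u3 in Campaign Manager terms) are missing
// or empty. The ad-server URL below is hypothetical.
function missingTagParams(firedUrl: string, expected: string[]): string[] {
  const params = new URL(firedUrl).searchParams;
  return expected.filter((name) => !params.get(name));
}

const fired = "https://adserver.example/conversion?u1=Blue+Widget&u2=SKU-1234&u3=";
const missing = missingTagParams(fired, ["u1", "u2", "u3"]);
if (missing.length > 0) {
  console.log(`Tag misconfigured: no value for ${missing.join(", ")}`);
  // -> "Tag misconfigured: no value for u3"
}
```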

As far as #2 and #3 go, yes, they are very similar. However, the main difference is that:

In both scenarios, buyers typically want to see the results of their optimizations every few hours. Advertisers can get very antsy when it comes to their money, so when they launch something new (like a new creative, campaign, or media placement) or have very large budgets, they often want to know immediately whether there are indications that the campaign is performing well, or that the optimizations have been activated.

In programmatic and social, media teams look at the data at least 3-4 times a day and make adjustments whenever needed throughout the course of the campaign.

And in the scenario where something goes wrong or doesn't appear to be right, we often (try to) find out early enough, because most agencies are looking at the data frequently. For example, many agencies create a "Day of Launch" report whenever a new campaign starts. We pull reports and screenshots to share with the client for the first 5 days of a campaign or new creative: which sites are running, which creatives are running, and how they're performing (impressions delivered, click rates, conversion rates, and the associated cost-pers). If in that timeframe we see that:

Most buyers rely on the platforms' dashboards to provide this information, for example Facebook, The Trade Desk, DV360, etc.
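
For readers outside ad ops, the figures in such a report are simple derived rates; a minimal sketch with hypothetical numbers:

```typescript
// The derived metrics in a day-of-launch report: click-through rate (CTR),
// conversion rate (CVR), and cost per acquisition (CPA, one of the
// "cost-pers"). All numbers are hypothetical.
interface PlacementStats {
  impressions: number;
  clicks: number;
  conversions: number;
  spend: number; // dollars
}

function derivedMetrics(s: PlacementStats) {
  return {
    ctr: s.clicks / s.impressions,
    cvr: s.conversions / s.clicks,
    cpa: s.spend / s.conversions,
  };
}

const m = derivedMetrics({ impressions: 200_000, clicks: 1_000, conversions: 50, spend: 2_500 });
console.log(`CTR ${(m.ctr * 100).toFixed(2)}%, CVR ${(m.cvr * 100).toFixed(1)}%, CPA $${m.cpa.toFixed(2)}`);
// -> "CTR 0.50%, CVR 5.0%, CPA $50.00"
```

Every rate after CTR depends on attributed conversion counts, so delayed reports make most of the dashboard stale.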

For #4, with the aggregate API, ~1 hour works so long as it captures a highly accurate count of impressions, clicks, and conversion data for publishers, buyers, and the in-betweens (DMPs, DSPs, SSPs, ad exchanges, ad verification, etc.). We all get paid by the advertiser, so all the systems in the ecosystem need to know the accurate counts for financial revenue and billing reasons. We need to ensure that everyone in the digital eco-chain knows what their finances are at any given time.

johnsabella commented 4 years ago

Thanks Angelina, thanks Charlie. On #4, there are very large, scaled systems that rely on this data. A 1-hour delay would simply not work; these systems require a near real-time stream of feedback to assess and adjust their algorithms for pacing, budget completion, and bid pricing, to name a few.

pinaik-msft commented 4 years ago

Agree with the concerns outlined above. Additionally (aside from NRT reporting), I want to call out that having the exact conversion timestamp on the conversion is necessary for several ad platform use cases, such as conversion modeling and conversion-based automated bidding strategies.

csharrison commented 4 years ago

@pinaik-msft can you explain in a bit more detail why you need the exact timestamp for modeling / bidding? What granularity do you need the timestamp to be?

pinaik-msft commented 4 years ago

@csharrison - at least at a date grain for various conversion-related analytics and modeling, but ideally at date + time, since there are use cases for that as well. One example is calculating time-to-convert (i.e. the time taken between a click and a conversion). This particular metric is also used for advertiser reporting.
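
Time-to-convert itself is simple to state; a minimal sketch, assuming timestamps are available at whatever granularity the API ends up exposing:

```typescript
// Time-to-convert: elapsed time between the click and the conversion.
// The metric is only as precise as the timestamps: with a date-only grain,
// anything under 24 hours collapses to zero days.
function timeToConvertHours(clickTime: Date, conversionTime: Date): number {
  return (conversionTime.getTime() - clickTime.getTime()) / (1000 * 60 * 60);
}

const click = new Date("2020-04-09T10:00:00Z");
const conversion = new Date("2020-04-10T16:30:00Z");
console.log(timeToConvertHours(click, conversion)); // 30.5
```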

csharrison commented 4 years ago

Thanks, that makes sense. For the event-level API, such granularity is difficult to achieve while satisfying our privacy goals, but we hope we can get finer-grained time data in the aggregate version of the API, which may be OK for the reporting use case (obviously, using aggregate data for bidding and modeling is a challenge).
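
To illustrate the tradeoff being described: coarsening a timestamp before it leaves the browser is one way to trade precision for privacy. The granularities below are hypothetical choices for illustration, not anything the API specifies.

```typescript
// Toy illustration of coarsening a conversion timestamp before reporting.
// An event-level report might only reveal coarse timing, while an aggregate
// report could afford finer buckets; both granularities here are hypothetical.
function bucketTimestamp(t: Date, granularityMs: number): Date {
  return new Date(Math.floor(t.getTime() / granularityMs) * granularityMs);
}

const conversion = new Date("2020-04-10T16:37:12Z");
const HOUR = 60 * 60 * 1000;
const DAY = 24 * HOUR;
console.log(bucketTimestamp(conversion, DAY).toISOString());  // "2020-04-10T00:00:00.000Z"
console.log(bucketTimestamp(conversion, HOUR).toISOString()); // "2020-04-10T16:00:00.000Z"
```

With day buckets, the time-to-convert example above reads as "1 day"; with hour buckets it reads as "30 hours", which is the granularity gap the reporting and bidding use cases care about.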