applied-ml
Curated papers, articles, and blogs on data science & machine learning in production. ⚙️
Figuring out how to implement your ML project? Learn how other organizations did it:
- How the problem is framed 🔎 (e.g., personalization as recsys vs. search vs. sequences)
- What machine learning techniques worked ✅ (and sometimes, what didn't ❌)
- Why it works, the science behind it with research, literature, and references 📂
- What real-world results were achieved (so you can better assess ROI ⏰💰📈)
P.S. Want a summary of ML advancements? 👉ml-surveys
P.P.S. Looking for guides and interviews on applying ML? 👉applyingML
Table of Contents
- Data Quality
- Data Engineering
- Data Discovery
- Feature Stores
- Classification
- Regression
- Forecasting
- Recommendation
- Search & Ranking
- Embeddings
- Natural Language Processing
- Sequence Modelling
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Weak Supervision
- Generation
- Audio
- Privacy-Preserving Machine Learning
- Validation and A/B Testing
- Model Management
- Efficiency
- Ethics
- Infra
- MLOps Platforms
- Practices
- Team Structure
- Fails
Data Quality
- Reliable and Scalable Data Ingestion at Airbnb
Airbnb
2016
- Monitoring Data Quality at Scale with Statistical Modeling
Uber
2017
- Data Management Challenges in Production Machine Learning (Paper)
Google
2017
- Automating Large-Scale Data Quality Verification (Paper)
Amazon
2018
- Meet Hodor — Gojek’s Upstream Data Quality Tool
Gojek
2019
- Data Validation for Machine Learning (Paper)
Google
2019
- An Approach to Data Quality for Netflix Personalization Systems
Netflix
2020
- Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper)
Facebook
2020
Data Engineering
- Zipline: Airbnb’s Machine Learning Data Management Platform
Airbnb
2018
- Sputnik: Airbnb’s Apache Spark Framework for Data Engineering
Airbnb
2020
- Unbundling Data Science Workflows with Metaflow and AWS Step Functions
Netflix
2020
- How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand
DoorDash
2020
- Revolutionizing Money Movements at Scale with Strong Data Consistency
Uber
2020
- Zipline - A Declarative Feature Engineering Framework
Airbnb
2020
- Automating Data Protection at Scale, Part 1 (Part 2)
Airbnb
2021
- Real-time Data Infrastructure at Uber
Uber
2021
- Introducing Fabricator: A Declarative Feature Engineering Framework
DoorDash
2022
- Functions & DAGs: introducing Hamilton, a microframework for dataframe generation
Stitch Fix
2021
- Optimizing Pinterest’s Data Ingestion Stack: Findings and Learnings
Pinterest
2022
- Lessons Learned From Running Apache Airflow at Scale
Shopify
2022
- Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training
Meta
2022
- Data Mesh — A Data Movement and Processing Platform @ Netflix
Netflix
2022
- Building Scalable Real Time Event Processing with Kafka and Flink
DoorDash
2022
Data Discovery
- Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code)
Apache
- Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code)
WeWork
- Discovery and Consumption of Analytics Data at Twitter
Twitter
2016
- Democratizing Data at Airbnb
Airbnb
2017
- Databook: Turning Big Data into Knowledge with Metadata at Uber
Uber
2018
- Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code)
Netflix
2018
- Amundsen — Lyft’s Data Discovery & Metadata Engine
Lyft
2019
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code)
Lyft
2019
- DataHub: A Generalized Metadata Search & Discovery Tool (Code)
LinkedIn
2019
- Amundsen: One Year Later
Lyft
2020
- Using Amundsen to Support User Privacy via Metadata Collection at Square
Square
2020
- Turning Metadata Into Insights with Databook
Uber
2020
- DataHub: Popular Metadata Architectures Explained
LinkedIn
2020
- How We Improved Data Discovery for Data Scientists at Spotify
Spotify
2020
- How We’re Solving Data Discovery Challenges at Shopify
Shopify
2020
- Nemo: Data discovery at Facebook
Facebook
2020
- Exploring Data @ Netflix (Code)
Netflix
2021
Feature Stores
- Distributed Time Travel for Feature Generation
Netflix
2016
- Building the Activity Graph, Part 2 (Feature Storage Section)
LinkedIn
2017
- Fact Store at Scale for Netflix Recommendations
Netflix
2018
- Zipline: Airbnb’s Machine Learning Data Management Platform
Airbnb
2018
- Feature Store: The missing data layer for Machine Learning pipelines?
Hopsworks
2018
- Introducing Feast: An Open Source Feature Store for Machine Learning (Code)
Gojek
2019
- Michelangelo Palette: A Feature Engineering Platform at Uber
Uber
2019
- The Architecture That Powers Twitter's Feature Store
Twitter
2019
- Accelerating Machine Learning with the Feature Store Service
Condé Nast
2019
- Feast: Bridging ML Models and Data
Gojek
2020
- Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression
DoorDash
2020
- Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed
LinkedIn
2020
- Building a Feature Store
Monzo Bank
2020
- Butterfree: A Spark-based Framework for Feature Store Building (Code)
QuintoAndar
2020
- Building Riviera: A Declarative Real-Time Feature Engineering Framework
DoorDash
2021
- Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory
Uber
2021
- ML Feature Serving Infrastructure at Lyft
Lyft
2021
- Near real-time features for near real-time personalization
LinkedIn
2022
- Building the Model Behind DoorDash’s Expansive Merchant Selection
DoorDash
2022
- Open sourcing Feathr – LinkedIn’s feature store for productive machine learning
LinkedIn
2022
- Evolution of ML Fact Store
Netflix
2022
- Developing scalable feature engineering DAGs
Metaflow + Hamilton via Outerbounds
2022
- Feature Store Design at Constructor
Constructor.io
2023
Classification
- Prediction of Advertiser Churn for Google AdWords (Paper)
Google
2010
- High-Precision Phrase-Based Document Classification on a Modern Scale (Paper)
LinkedIn
2011
- Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper)
Walmart
2014
- Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper)
NAVER
2016
- Learning to Diagnose with LSTM Recurrent Neural Networks (Paper)
Google
2017
- Discovering and Classifying In-app Message Intent at Airbnb
Airbnb
2019
- Teaching Machines to Triage Firefox Bugs
Mozilla
2019
- Categorizing Products at Scale
Shopify
2020
- How We Built the Good First Issues Feature
GitHub
2020
- Testing Firefox More Efficiently with Machine Learning
Mozilla
2020
- Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper)
Microsoft
2020
- Scalable Data Classification for Security and Privacy (Paper)
Facebook
2020
- Uncovering Online Delivery Menu Best Practices with Machine Learning
DoorDash
2020
- Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging
DoorDash
2020
- Deep Learning: Product Categorization and Shelving
Walmart
2021
- Large-scale Item Categorization for e-Commerce (Paper)
DianPing, eBay
2012
- Semantic Label Representation with an Application on Multimodal Product Categorization
Walmart
2022
- Building Airbnb Categories with ML and Human-in-the-Loop
Airbnb
2022
Regression
- Using Machine Learning to Predict Value of Homes On Airbnb
Airbnb
2017
- Using Machine Learning to Predict the Value of Ad Requests
Twitter
2020
- Open-Sourcing Riskquant, a Library for Quantifying Risk (Code)
Netflix
2020
- Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment
DoorDash
2020
Forecasting
- Engineering Extreme Event Forecasting at Uber with RNN
Uber
2017
- Forecasting at Uber: An Introduction
Uber
2018
- Transforming Financial Forecasting with Data Science and Machine Learning at Uber
Uber
2018
- Under the Hood of Gojek’s Automated Forecasting Tool
Gojek
2019
- BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video)
Google
2020
- Retraining Machine Learning Models in the Wake of COVID-19
DoorDash
2020
- Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code)
Atlassian
2020
- Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting (Paper, Video, Code)
Uber
2021
- Managing Supply and Demand Balance Through Machine Learning
DoorDash
2021
- Greykite: A flexible, intuitive, and fast forecasting library
LinkedIn
2021
- The history of Amazon’s forecasting algorithm
Amazon
2021
- DeepETA: How Uber Predicts Arrival Times Using Deep Learning
Uber
2022
- Forecasting Grubhub Order Volume At Scale
Grubhub
2022
- Causal Forecasting at Lyft (Part 1)
Lyft
2022
Recommendation
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper)
Amazon
2003
- Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2)
Netflix
2012
- How Music Recommendation Works — And Doesn’t Work
Spotify
2012
- Learning to Rank Recommendations with the k-Order Statistic Loss (Paper)
Google
2013
- Recommending Music on Spotify with Deep Learning
Spotify
2014
- Learning a Personalized Homepage
Netflix
2015
- The Netflix Recommender System: Algorithms, Business Value, and Innovation (Paper)
Netflix
2015
- Session-based Recommendations with Recurrent Neural Networks (Paper)
Telefonica
2016
- Deep Neural Networks for YouTube Recommendations
YouTube
2016
- E-commerce in Your Inbox: Product Recommendations at Scale (Paper)
Yahoo
2016
- To Be Continued: Helping you find shows to continue watching on Netflix
Netflix
2016
- Personalized Recommendations in LinkedIn Learning
LinkedIn
2016
- Personalized Channel Recommendations in Slack
Slack
2016
- Recommending Complementary Products in E-Commerce Push Notifications (Paper)
Alibaba
2017
- Artwork Personalization at Netflix
Netflix
2017
- A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper)
Twitter
2017
- Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper)
Pinterest
2017
- Powering Search & Recommendations at DoorDash
DoorDash
2017
- How 20th Century Fox uses ML to predict a movie audience (Paper)
20th Century Fox
2018
- Calibrated Recommendations (Paper)
Netflix
2018
- Food Discovery with Uber Eats: Recommending for the Marketplace
Uber
2018
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper)
Spotify
2018
- Talent Search and Recommendation Systems at LinkedIn: Practical Challenges and Lessons Learned (Paper)
LinkedIn
2018
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper)
Alibaba
2019
- SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper)
Alibaba
2019
- Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper)
Alibaba
2019
- Personalized Recommendations for Experiences Using Deep Learning
TripAdvisor
2019
- Powered by AI: Instagram’s Explore recommender system
Facebook
2019
- Marginal Posterior Sampling for Slate Bandits (Paper)
Netflix
2019
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber
2019
- Music recommendation at Spotify
Spotify
2019
- Using Machine Learning to Predict what File you Need Next (Part 1)
Dropbox
2019
- Using Machine Learning to Predict what File you Need Next (Part 2)
Dropbox
2019
- Learning to be Relevant: Evolution of a Course Recommendation System (PAPER NEEDED)
LinkedIn
2019
- Temporal-Contextual Recommendation in Real-Time (Paper)
Amazon
2020
- P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper)
Amazon
2020
- Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper)
Alibaba
2020
- TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper)
Alibaba
2020
- PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper)
Alibaba
2020
- Controllable Multi-Interest Framework for Recommendation (Paper)
Alibaba
2020
- MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper)
Alibaba
2020
- ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper)
Alibaba
2020
- For Your Ears Only: Personalizing Spotify Home with Machine Learning
Spotify
2020
- Reach for the Top: How Spotify Built Shortcuts in Just Six Months
Spotify
2020
- Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper)
Spotify
2020
- The Evolution of Kit: Automating Marketing Using Machine Learning
Shopify
2020
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1)
LinkedIn
2020
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2)
LinkedIn
2020
- Building a Heterogeneous Social Network Recommendation System
LinkedIn
2020
- How TikTok recommends videos #ForYou
ByteDance
2020
- Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper)
Google
2020
- Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper)
Google
2020
- Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper)
Google
2020
- Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper)
Tencent
2020
- A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper)
Home Depot
2020
- Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper)
IKEA
2020
- How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads
Pinterest
2020
- Multi-task Learning for Related Products Recommendations at Pinterest
Pinterest
2020
- Improving the Quality of Recommended Pins with Lightweight Ranking
Pinterest
2020
- Multi-task Learning and Calibration for Utility-based Home Feed Ranking
Pinterest
2020
- Personalized Cuisine Filter Based on Customer Preference and Local Popularity
DoorDash
2020
- How We Built a Matchmaking Algorithm to Cross-Sell Products
Gojek
2020
- Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper)
Twitter
2021
- Self-supervised Learning for Large-scale Item Recommendations (Paper)
Google
2021
- Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper)
ByteDance
2021
- Using AI to Help Health Experts Address the COVID-19 Pandemic
Facebook
2021
- Advertiser Recommendation Systems at Pinterest
Pinterest
2021
- On YouTube's Recommendation System
YouTube
2021
- "Are you sure?": Preliminary Insights from Scaling Product Comparisons to Multiple Shops
Coveo
2021
- Mozrt, a Deep Learning Recommendation System Empowering Walmart Store Associates
Walmart
2021
- Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (Paper)
Meta
2021
- The Amazon Music conversational recommender is hitting the right notes
Amazon
2022
- Personalized complementary product recommendation (Paper)
Amazon
2022
- Building a Deep Learning Based Retrieval System for Personalized Recommendations
eBay
2022
- How We Built: An Early-Stage Machine Learning Model for Recommendations
Peloton
2022
- Lessons Learned from Building out Context-Aware Recommender Systems
Peloton
2022
- Beyond Matrix Factorization: Using hybrid features for user-business recommendations
Yelp
2022
- Improving job matching with machine-learned activity features
LinkedIn
2022
- Blueprints for recommender system architectures: 10th anniversary edition
Xavier Amatriain
2022
- How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume
Pinterest
2022
- RecSysOps: Best Practices for Operating a Large-Scale Recommender System
Netflix
2022
- Recommend API: Unified end-to-end machine learning infrastructure to generate recommendations
Slack
2022
- Evolving DoorDash’s Substitution Recommendations Algorithm
DoorDash
2022
- Homepage Recommendation with Exploitation and Exploration
DoorDash
2022
- GPU-accelerated ML Inference at Pinterest
Pinterest
2022
- Addressing Confounding Feature Issue for Causal Recommendation (Paper)
Tencent
2022
Search & Ranking
- Amazon Search: The Joy of Ranking Products (Paper, Video, Code)
Amazon
2016
- How Lazada Ranks Products to Improve Customer Experience and Conversion
Lazada
2016
- Ranking Relevance in Yahoo Search (Paper)
Yahoo
2016
- Learning to Rank Personalized Search Results in Professional Networks (Paper)
LinkedIn
2016
- Using Deep Learning at Scale in Twitter’s Timelines
Twitter
2017
- An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper)
Etsy
2017
- Powering Search & Recommendations at DoorDash
DoorDash
2017
- Applying Deep Learning To Airbnb Search (Paper)
Airbnb
2018
- In-session Personalization for Talent Search (Paper)
LinkedIn
2018
- Talent Search and Recommendation Systems at LinkedIn (Paper)
LinkedIn
2018
- Food Discovery with Uber Eats: Building a Query Understanding Engine
Uber
2018
- Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper)
Alibaba
2018
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper)
Alibaba
2018
- Semantic Product Search (Paper)
Amazon
2019
- Machine Learning-Powered Search Ranking of Airbnb Experiences
Airbnb
2019
- Entity Personalized Talent Search Models with Tree Interaction Features (Paper)
LinkedIn
2019
- The AI Behind LinkedIn Recruiter Search and recommendation systems
LinkedIn
2019
- Learning Hiring Preferences: The AI Behind LinkedIn Jobs
LinkedIn
2019
- The Secret Sauce Behind Search Personalisation
Gojek
2019
- Neural Code Search: ML-based Code Search Using Natural Language Queries
Facebook
2019
- Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper)
Alibaba
2019
- Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search
Alibaba
2019
- Understanding Searches Better Than Ever Before (Paper)
Google
2019
- How We Used Semantic Search to Make Our Search 10x Smarter
Tokopedia
2019
- Query2vec: Search query expansion with query embeddings
Grubhub
2019
- MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search
Baidu
2019
- Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper)
Amazon
2020
- Managing Diversity in Airbnb Search (Paper)
Airbnb
2020
- Improving Deep Learning for Airbnb Search (Paper)
Airbnb
2020
- Quality Matches Via Personalized AI for Hirer and Seeker Preferences
LinkedIn
2020
- Understanding Dwell Time to Improve LinkedIn Feed Ranking
LinkedIn
2020
- Ads Allocation in Feed via Constrained Optimization (Paper, Video)
LinkedIn
2020
- AI at Scale in Bing
Microsoft
2020
- Query Understanding Engine in Traveloka Universal Search
Traveloka
2020
- Bayesian Product Ranking at Wayfair
Wayfair
2020
- COLD: Towards the Next Generation of Pre-Ranking System (Paper)
Alibaba
2020
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video)
Pinterest
2020
- Driving Shopping Upsells from Pinterest Search
Pinterest
2020
- GDMix: A Deep Ranking Personalization Framework (Code)
LinkedIn
2020
- Bringing Personalized Search to Etsy
Etsy
2020
- Building a Better Search Engine for Semantic Scholar
Allen Institute for AI
2020
- Query Understanding for Natural Language Enterprise Search (Paper)
Salesforce
2020
- Things Not Strings: Understanding Search Intent with Better Recall
DoorDash
2020
- Query Understanding for Surfacing Under-served Music Content (Paper)
Spotify
2020
- Embedding-based Retrieval in Facebook Search (Paper)
Facebook
2020
- Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper)
JD
2020
- QUEEN: Neural query rewriting in e-commerce (Paper)
Amazon
2021
- Using Learning-to-rank to Precisely Locate Where to Deliver Packages (Paper)
Amazon
2021
- Seasonal relevance in e-commerce search (Paper)
Amazon
2021
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper)
Alibaba
2021
- How We Built A Context-Specific Bidding System for Etsy Ads
Etsy
2021
- Pre-trained Language Model based Ranking in Baidu Search (Paper)
Baidu
2021
- Stitching together spaces for query-based recommendations
Stitch Fix
2021
- Deep Natural Language Processing for LinkedIn Search Systems (Paper)
LinkedIn
2021
- Siamese BERT-based Model for Web Search Relevance Ranking (Paper, Code)
Seznam
2021
- SearchSage: Learning Search Query Representations at Pinterest
Pinterest
2021
- Query2Prod2Vec: Grounded Word Embeddings for eCommerce
Coveo
2021
- 3 Changes to Expand DoorDash’s Product Search Beyond Delivery
DoorDash
2022
- Learning To Rank Diversely
Airbnb
2022
- How to Optimise Rankings with Cascade Bandits
Expedia
2022
- A Guide to Google Search Ranking Systems
Google
2022
- Deep Learning for Search Ranking at Etsy
Etsy
2022
- Search at Calm
Calm
2022
Embeddings
- Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper)
Sears
2017
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper)
Alibaba
2018
- Embeddings@Twitter
Twitter
2018
- Listing Embeddings in Search Ranking (Paper)
Airbnb
2018
- Understanding Latent Style
Stitch Fix
2018
- Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper)
LinkedIn
2018
- Personalized Store Feed with Vector Embeddings
DoorDash
2018
- Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper)
Moshbit
2019
- Machine Learning for a Better Developer Experience
Netflix
2020
- Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code)
Google
2020
- BERT Goes Shopping: Comparing Distributional Models for Product Representations
Coveo
2021
- The Embeddings That Came in From the Cold: Improving Vectors for New and Rare Products with Content-Based Inference
Coveo
2022
- Embedding-based Retrieval at Scribd
Scribd
2021
- Multi-objective Hyper-parameter Optimization of Behavioral Song Embeddings (Paper)
Apple
2022
- Embeddings at Spotify's Scale - How Hard Could It Be?
Spotify
2023
Natural Language Processing
- Abusive Language Detection in Online User Content (Paper)
Yahoo
2016
- Smart Reply: Automated Response Suggestion for Email (Paper)
Google
2016
- Building Smart Replies for Member Messages
LinkedIn
2017
- How Natural Language Processing Helps LinkedIn Members Get Support Easily
LinkedIn
2019
- Gmail Smart Compose: Real-Time Assisted Writing (Paper)
Google
2019
- Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper)
Amazon
2019
- Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want
Stitch Fix
2019
- DeText: A deep NLP Framework for Intelligent Text Understanding (Code)
LinkedIn
2020
- SmartReply for YouTube Creators
Google
2020
- Using Neural Networks to Find Answers in Tables (Paper)
Google
2020
- A Scalable Approach to Reducing Gender Bias in Google Translate
Google
2020
- Assistive AI Makes Replying Easier
Microsoft
2020
- AI Advances to Better Detect Hate Speech
Facebook
2020
- A State-of-the-Art Open Source Chatbot (Paper)
Facebook
2020
- A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs
Facebook
2020
- Deep Learning to Translate Between Programming Languages (Paper, Code)
Facebook
2020
- Deploying Lifelong Open-Domain Dialogue Learning (Paper)
Facebook
2020
- Introducing Dynabench: Rethinking the way we benchmark AI
Facebook
2020
- How Gojek Uses NLP to Name Pickup Locations at Scale
Gojek
2020
- The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper)
Baidu
2020
- PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code)
Google
2020
- Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo)
Salesforce
2020
- GeDi: A Powerful New Method for Controlling Language Models (Paper, Code)
Salesforce
2020
- Applying Topic Modeling to Improve Call Center Operations
RICOH
2020
- WIDeText: A Multimodal Deep Learning Framework
Airbnb
2020
- Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code)
Facebook
2021
- How we reduced our text similarity runtime by 99.96%
Microsoft
2021
- Textless NLP: Generating expressive speech from raw audio (Part 1) (Part 2) (Part 3) (Code and Pretrained Models)
Facebook
2021
- Grammar Correction as You Type, on Pixel 6
Google
2021
- Auto-generated Summaries in Google Docs
Google
2022
- ML-Enhanced Code Completion Improves Developer Productivity
Google
2022
- Words All the Way Down — Conversational Sentiment Analysis
PayPal
2022
Sequence Modelling
- Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper)
Sutter Health
2015
- Deep Learning for Understanding Consumer Histories (Paper)
Zalando
2016
- Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper)
Sutter Health
2016
- Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper)
Telefonica
2017
- Deep Learning for Electronic Health Records (Paper)
Google
2018
- Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper)
Alibaba
2019
- Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper)
Alibaba
2020
- How Duolingo uses AI in every part of its app
Duolingo
2020
- Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video)
Facebook
2020
- Using deep learning to detect abusive sequences of member activity (Video)
LinkedIn
2021
Computer Vision
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning
Dropbox
2017
- Categorizing Listing Photos at Airbnb
Airbnb
2018
- Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb
Airbnb
2019
- How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors
Deepomatic
- Making machines recognize and transcribe conversations in meetings using audio and video
Microsoft
2019
- Powered by AI: Advancing product understanding and building new shopping experiences
Facebook
2020
- A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper)
Google
2020
- Machine Learning-based Damage Assessment for Disaster Relief (Paper)
Google
2020
- RepNet: Counting Repetitions in Videos (Paper)
Google
2020
- Converting Text to Images for Product Discovery (Paper)
Amazon
2020
- How Disney Uses PyTorch for Animated Character Recognition
Disney
2020
- Image Captioning as an Assistive Technology (Video)
IBM
2020
- AI for AG: Production machine learning for agriculture
Blue River
2020
- AI for Full-Self Driving at Tesla
Tesla
2020
- On-device Supermarket Product Recognition
Google
2020
- Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper)
Google
2020
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video)
Pinterest
2020
- Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper)
Google
2020
- Vision-based Price Suggestion for Online Second-hand Items (Paper)
Alibaba
2020
- New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model)
Facebook
2021
- An Efficient Training Approach for Very Large Scale Face Recognition (Paper)
Alibaba
2021
- Identifying Document Types at Scribd
Scribd
2021
- Semi-Supervised Visual Representation Learning for Fashion Compatibility (Paper)
Walmart
2021
- Recognizing People in Photos Through Private On-Device Machine Learning
Apple
2021
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
Google
2022
- Contrastive language and vision learning of general fashion concepts (Paper)
Coveo
2022
- Leveraging Computer Vision for Search Ranking
Bazaarvoice
2023
Reinforcement Learning
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper)
Alibaba
2018
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper)
Alibaba
2018
- Reinforcement Learning for On-Demand Logistics
DoorDash
2018
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper)
Alibaba
2018
- Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper)
Alibaba
2019
- Productionizing Deep Reinforcement Learning with Spark and MLflow
Zynga
2020
- Deep Reinforcement Learning in Production (Part 1) (Part 2)
Zynga
2020
- Building AI Trading Systems
Denny Britz
2020
- Shifting Consumption towards Diverse content via Reinforcement Learning (Paper)
Spotify
2022
- Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms
Meta
2022
- How to Optimise Rankings with Cascade Bandits
Expedia
2022
- Selecting the Best Image for Each Merchant Using Exploration and Machine Learning
DoorDash
2023
Anomaly Detection
- Detecting Performance Anomalies in External Firmware Deployments
Netflix
2019
- Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code)
LinkedIn
2019
- Deep Anomaly Detection with Spark and TensorFlow (Hopsworks Video)
Swedbank, Hopsworks
2019
- Preventing Abuse Using Unsupervised Learning
LinkedIn
2020
- The Technology Behind Fighting Harassment on LinkedIn
LinkedIn
2020
- Uncovering Insurance Fraud Conspiracy with Network Learning (Paper)
Ant Financial
2020
- How Does Spam Protection Work on Stack Exchange?
Stack Exchange
2020
- Auto Content Moderation in C2C e-Commerce
Mercari
2020
- Blocking Slack Invite Spam With Machine Learning
Slack
2020
- Cloudflare Bot Management: Machine Learning and More
Cloudflare
2020
- Anomalies in Oil Temperature Variations in a Tunnel Boring Machine
SENER
2020
- Using Anomaly Detection to Monitor Low-Risk Bank Customers
Rabobank
2020
- Fighting fraud with Triplet Loss
OLX Group
2020
- Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative)
Facebook
2020
- How AI is getting better at detecting hate speech (Part 1) (Part 2) (Part 3) (Part 4)
Facebook
2020
- Using deep learning to detect abusive sequences of member activity (Video)
LinkedIn
2021
- Project RADAR: Intelligent Early Fraud Detection System with Humans in the Loop
Uber
2022
- Graph for Fraud Detection
Grab
2022
- Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms
Meta
2022
- Evolving our machine learning to stop mobile bots
Cloudflare
2022
- Improving the accuracy of our machine learning WAF using data augmentation and sampling
Cloudflare
2022
- Machine Learning for Fraud Detection in Streaming Services
Netflix
2022
- Pricing at Lyft
Lyft
2022
Graph
- Building The LinkedIn Knowledge Graph
LinkedIn
2016
- Scaling Knowledge Access and Retrieval at Airbnb
Airbnb
2018
- Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper)
Pinterest
2018
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber
2019
- AliGraph: A Comprehensive Graph Neural Network Platform (Paper)
Alibaba
2019
- Contextualizing Airbnb by Building Knowledge Graph
Airbnb
2019
- Retail Graph — Walmart’s Product Knowledge Graph
Walmart
2020
- Traffic Prediction with Advanced Graph Neural Networks
DeepMind
2020
- SimClusters: Community-Based Representations for Recommendations (Paper, Video)
Twitter
2020
- Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper)
Alibaba
2021
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper)
Alibaba
2021
- JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper)
JPMorgan Chase
2021
- How AWS uses graph neural networks to meet customer needs
Amazon
2022
- Graph for Fraud Detection
Grab
2022
Optimization
- Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3)
Lyft
2016
- The Data and Science behind GrabShare Carpooling (Part 1) (PAPER NEEDED)
Grab
2017
- How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats
Uber
2018
- Next-Generation Optimization for Dasher Dispatch at DoorDash
DoorDash
2020
- Optimization of Passengers Waiting Time in Elevators Using Machine Learning
Thyssen Krupp AG
2020
- Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper)
Amazon
2020
- Optimizing DoorDash’s Marketing Spend with Machine Learning
DoorDash
2020
- Using learning-to-rank to precisely locate where to deliver packages (Paper)
Amazon
2021
Information Extraction
- Unsupervised Extraction of Attributes and Their Values from Product Description (Paper)
Rakuten
2013
- Using Machine Learning to Index Text from Billions of Images
Dropbox
2018
- Extracting Structured Data from Templatic Documents (Paper)
Google
2020
- AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video)
Amazon
2020
- One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper)
Alibaba
2020
- Information Extraction from Receipts with Graph Convolutional Networks
Nanonets
2021
Weak Supervision
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper)
Google
2019
- Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper)
Intel
2019
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple
2019
- Bootstrapping Conversational Agents with Weak Supervision (Paper)
IBM
2019
Generation
- Better Language Models and Their Implications (Paper)
OpenAI
2019
- Image GPT (Paper, Code)
OpenAI
2020
- Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post)
OpenAI
2020
- Deep Learned Super Resolution for Feature Film Production (Paper)
Pixar
2020
- Unit Test Case Generation with Transformers
Microsoft
2021
Audio
- Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper)
Google
2020
- The Machine Learning Behind Hum to Search
Google
2020
Privacy-Preserving Machine Learning
- Federated Learning: Collaborative Machine Learning without Centralized Training Data (Paper)
Google
2017
- Federated Learning with Formal Differential Privacy Guarantees (Paper)
Google
2022
- MPC-based machine learning: Achieving end-to-end privacy-preserving machine learning (Paper)
Facebook
2022
Validation and A/B Testing
- Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper)
Google
2010
- The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper)
Google
2015
- Twitter Experimentation: Technical Overview
Twitter
2015
- It’s All A/Bout Testing: The Netflix Experimentation Platform
Netflix
2016
- Building Pinterest’s A/B Testing Platform
Pinterest
2016
- Experimenting to Solve Cramming
Twitter
2017
- Building an Intelligent Experimentation Platform with Uber Engineering
Uber
2017
- Scaling Airbnb’s Experimentation Platform
Airbnb
2017
- Meet Wasabi, an Open Source A/B Testing Platform (Code)
Intuit
2017
- Analyzing Experiment Outcomes: Beyond Average Treatment Effects
Uber
2018
- Under the Hood of Uber’s Experimentation Platform
Uber
2018
- Constrained Bayesian Optimization with Noisy Experiments (Paper)
Facebook
2018
- Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab
Grab
2018
- Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code)
Better
2019
- Detecting Interference: An A/B Test of A/B Tests
LinkedIn
2019
- Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper)
Uber
2020
- Enabling 10x More Experiments with Traveloka Experiment Platform
Traveloka
2020
- Large Scale Experimentation at Stitch Fix (Paper)
Stitch Fix
2020
- Multi-Armed Bandits and the Stitch Fix Experimentation Platform
Stitch Fix
2020
- Experimentation with Resource Constraints
Stitch Fix
2020
- Computational Causal Inference at Netflix (Paper)
Netflix
2020
- Key Challenges with Quasi Experiments at Netflix
Netflix
2020
- Making the LinkedIn experimentation engine 20x faster
LinkedIn
2020
- Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn
LinkedIn
2020
- How to Use Quasi-experiments and Counterfactuals to Build Great Products
Shopify
2020
- Improving Experimental Power through Control Using Predictions as Covariate
DoorDash
2020
- Supporting Rapid Product Iteration with an Experimentation Analysis Platform
DoorDash
2020
- Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity
DoorDash
2020
- Leveraging Causal Modeling to Get More Value from Flat Experiment Results
DoorDash
2020
- Iterating Real-time Assignment Algorithms Through Experimentation
DoorDash
2020
- Spotify’s New Experimentation Platform (Part 1) (Part 2)
Spotify
2020
- Interpreting A/B Test Results: False Positives and Statistical Significance
Netflix
2021
- Interpreting A/B Test Results: False Negatives and Power
Netflix
2021
- Running Experiments with Google Adwords for Campaign Optimization
DoorDash
2021
- The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000%
DoorDash
2021
- Experimentation Platform at Zalando: Part 1 - Evolution
Zalando
2021
- Designing Experimentation Guardrails
Airbnb
2021
- How Airbnb Measures Future Value to Standardize Tradeoffs
Airbnb
2021
- Network Experimentation at Scale (Paper)
Facebook
2021
- Universal Holdout Groups at Disney Streaming
Disney
2021
- Experimentation is a major focus of Data Science across Netflix
Netflix
2022
- Search Journey Towards Better Experimentation Practices
Spotify
2022
- Artificial Counterfactual Estimation: Machine Learning-Based Causal Inference at Airbnb
Airbnb
2022
- Beyond A/B Test: Speeding up Airbnb Search Ranking Experimentation through Interleaving
Airbnb
2022
- Challenges in Experimentation
Lyft
2022
- Overtracking and Trigger Analysis: Reducing sample sizes while INCREASING sensitivity
Booking
2022
- Meet Dash-AB — The Statistics Engine of Experimentation at DoorDash
DoorDash
2022
- Comparing quantiles at scale in online A/B-testing
Spotify
2022
- Accelerating our A/B experiments with machine learning
Dropbox
2023
- Supercharging A/B Testing at Uber
Uber
Model Management
- Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast
2018
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple
2019
- Runway - Model Lifecycle Management at Netflix
Netflix
2020
- Managing ML Models @ Scale - Intuit’s ML Platform
Intuit
2020
- ML Model Monitoring - 9 Tips From the Trenches
Nubank
2021
- Dealing with Train-serve Skew in Real-time ML Models: A Short Guide
Nubank
2023
Efficiency
- GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper)
Facebook
2020
- How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs
Roblox
2020
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper)
Uber
2021
- GPU-accelerated ML Inference at Pinterest
Pinterest
2022
Ethics
- Building Inclusive Products Through A/B Testing (Paper)
LinkedIn
2020
- LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper)
LinkedIn
2020
- Introducing Twitter’s first algorithmic bias bounty challenge
Twitter
2021
- Examining algorithmic amplification of political content on Twitter
Twitter
2021
- A closer look at how LinkedIn integrates fairness into its AI products
LinkedIn
2022
Infra
- Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook
2020
- Elastic Distributed Training with XGBoost on Ray
Uber
2021
MLOps Platforms
- Meet Michelangelo: Uber’s Machine Learning Platform
Uber
2017
- Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast
2018
- Big Data Machine Learning Platform at Pinterest
Pinterest
2019
- Core Modeling at Instagram
Instagram
2019
- Open-Sourcing Metaflow - a Human-Centric Framework for Data Science
Netflix
2019
- Managing ML Models @ Scale - Intuit’s ML Platform
Intuit
2020
- Real-time Machine Learning Inference Platform at Zomato
Zomato
2020
- Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform
Lyft
2020
- Building Flexible Ensemble ML Models with a Computational Graph
DoorDash
2021
- LyftLearn: ML Model Training Infrastructure built on Kubernetes
Lyft
2021
- "You Don't Need a Bigger Boat": A Full Data Pipeline Built with Open-Source Tools (Paper)
Coveo
2021
- MLOps at GreenSteam: Shipping Machine Learning
GreenSteam
2021
- Evolving Reddit’s ML Model Deployment and Serving Architecture
Reddit
2021
- Redesigning Etsy’s Machine Learning Platform
Etsy
2021
- Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (Paper)
Meta
2021
- Building a Platform for Serving Recommendations at Etsy
Etsy
2022
- Intelligent Automation Platform: Empowering Conversational AI and Beyond at Airbnb
Airbnb
2022
- DARWIN: Data Science and Artificial Intelligence Workbench at LinkedIn
LinkedIn
2022
- The Magic of Merlin: Shopify's New Machine Learning Platform
Shopify
2022
- Zalando's Machine Learning Platform
Zalando
2022
- Inside Meta's AI optimization platform for engineers across the company (Paper)
Meta
2022
- Monzo’s machine learning stack
Monzo
2022
- Evolution of ML Fact Store
Netflix
2022
- Using MLOps to Build a Real-time End-to-End Machine Learning Pipeline
Binance
2022
- Serving Machine Learning Models Efficiently at Scale at Zillow
Zillow
2022
- Didact AI: The anatomy of an ML-powered stock picking engine
Didact AI
2022
- Deployment for Free - A Machine Learning Platform for Stitch Fix's Data Scientists
Stitch Fix
2022
- Machine Learning Operations (MLOps): Overview, Definition, and Architecture (Paper)
IBM
2022
Practices
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
Yoshua Bengio
2012
- Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper)
Google
2014
- Rules of Machine Learning: Best Practices for ML Engineering
Google
2018
- On Challenges in Machine Learning Model Management
Amazon
2018
- Machine Learning in Production: The Booking.com Approach
Booking
2019
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
Booking
2019
- Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank
Rabobank
2019
- Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper)
Cambridge
2020
- Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook
2020
- The problem with AI developer tools for enterprises
Databricks
2020
- Continuous Integration and Deployment for Machine Learning Online Serving and Models
Uber
2021
- Tuning Model Performance
Uber
2021
- Maintaining Machine Learning Model Accuracy Through Monitoring
DoorDash
2021
- Building Scalable and Performant Marketing ML Systems at Wayfair
Wayfair
2021
- Our approach to building transparent and explainable AI systems
LinkedIn
2021
- 5 Steps for Building Machine Learning Models for Business
Shopify
2021
- Data Is An Art, Not Just A Science—And Storytelling Is The Key
Shopify
2022
- Best Practices for Real-time Machine Learning: Alerting
Nubank
2022
- Automatic Retraining for Machine Learning Models: Tips and Lessons Learned
Nubank
2022
- RecSysOps: Best Practices for Operating a Large-Scale Recommender System
Netflix
2022
- ML Education at Uber: Frameworks Inspired by Engineering Principles
Uber
2022
- Building and Maintaining Internal Tools for DS/ML teams: Lessons Learned
Nubank
2024
Team Structure
- Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department
Stitch Fix
2016
- What is the most effective way to structure a data science team?
Udemy
2017
- Building The Analytics Team At Wish
Wish
2018
- Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist
Stitch Fix
2019
- Cultivating Algorithms: How We Grow Data Science at Stitch Fix
Stitch Fix
- Analytics at Netflix: Who We Are and What We Do
Netflix
2020
- Building a Data Team at a Mid-stage Startup: A Short Story
Erikbern
2021
- A Behind-the-Scenes Look at How Postman’s Data Team Works
Postman
2021
- Data Scientist x Machine Learning Engineer Roles: How are they different? How are they alike?
Nubank
2022
Fails
- When It Comes to Gorillas, Google Photos Remains Blind
Google
2018
- 160k+ High School Students Will Graduate Only If a Model Allows Them to
International Baccalaureate
2020
- An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor
Harrisburg University
2020
- It's Hard to Generate Neural Text From GPT-3 About Muslims
OpenAI
2020
- A British AI Tool to Predict Violent Crime Is Too Flawed to Use
United Kingdom
2020
- More in awful-ai
- AI Incident Database
Partnership on AI
2022