KnowledgeEdgeAI / PETs_for_Public_Health_Challenge

A privacy-enhanced tool for predicting hotspot areas during a pandemic, analyzing consumption trends, and estimating the contact matrix.
https://pets-for-public-health-challenge.readthedocs.io/en/latest/
MIT License

Hotspot Detection, Mobility, Pandemic Stages and Contact Metric using Differential Privacy

.. image:: https://readthedocs.org/projects/pets-for-public-health-challenge/badge/?version=latest
   :target: https://pets-for-public-health-challenge.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. This README.rst should render properly both on GitHub and in Sphinx.

Hotspot Detection

Description

Areas with high in-person economic activity can be identified as pandemic hotspots. This analysis tracks hotspots by monitoring a differentially private release of financial transactions in a city and identifying areas with high transaction activity.

Assumptions

Algorithm

. Add City Column: A new city column is added based on the postal codes (make_preprocess_location).

. Filter OFFLINE Transactions: Only "OFFLINE" transactions are considered (make_filter).

. Filter City Postal Codes: Filter for the postal codes of the selected city (make_filter).

. Filter by Time Frame: Filter data for the selected time frame (make_truncate_time).

. Transaction Summing & Noise Addition: Sum the number of transactions by postal code, and add Gaussian noise (make_private_sum_by).

. Visualization: Differentially private data is plotted on a colored map for hotspot visualization.
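The filtering and noisy-sum steps above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: it does not call make_filter, make_truncate_time or make_private_sum_by, the record fields (``date``, ``postal_code``, ``city``, ``channel``) and the sample values are assumptions, and ``sigma`` is a free parameter rather than a noise scale calibrated to a sensitivity and privacy budget.

```python
import random
from collections import defaultdict
from datetime import date

# Hypothetical records. Field names and values are illustrative assumptions,
# and the city column is assumed to have been added already (step 1).
TRANSACTIONS = [
    {"date": date(2020, 3, 2), "postal_code": "110111", "city": "Bogota",
     "channel": "OFFLINE", "nb_transactions": 120},
    {"date": date(2020, 3, 3), "postal_code": "110111", "city": "Bogota",
     "channel": "OFFLINE", "nb_transactions": 30},
    {"date": date(2020, 3, 3), "postal_code": "110221", "city": "Bogota",
     "channel": "OFFLINE", "nb_transactions": 80},
    {"date": date(2020, 3, 4), "postal_code": "110111", "city": "Bogota",
     "channel": "ONLINE", "nb_transactions": 500},
    {"date": date(2020, 3, 4), "postal_code": "050001", "city": "Medellin",
     "channel": "OFFLINE", "nb_transactions": 60},
    {"date": date(2020, 4, 10), "postal_code": "110221", "city": "Bogota",
     "channel": "OFFLINE", "nb_transactions": 40},
]


def noisy_sum_by_postal_code(rows, city, start, end, sigma, seed=None):
    """Keep OFFLINE rows for one city and time window, sum nb_transactions
    per postal code, then add zero-mean Gaussian noise to each sum."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for row in rows:
        if row["channel"] != "OFFLINE":
            continue  # step 2: OFFLINE transactions only
        if row["city"] != city:
            continue  # step 3: selected city only
        if not (start <= row["date"] <= end):
            continue  # step 4: selected time frame only
        totals[row["postal_code"]] += row["nb_transactions"]
    # step 5: perturb each per-postal-code sum independently
    return {code: total + rng.gauss(0.0, sigma)
            for code, total in sorted(totals.items())}


if __name__ == "__main__":
    release = noisy_sum_by_postal_code(
        TRANSACTIONS, "Bogota", date(2020, 3, 1), date(2020, 3, 7),
        sigma=10.0, seed=42)
    for code, value in release.items():
        print(code, round(value, 1))
```

The postal codes with the largest noisy sums are the hotspot candidates that the visualization step would color most strongly.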

Sensitivity and Epsilon Analysis

Mobility Detection

Description

This analysis tracks mobility by monitoring a differentially private time-series release of financial transactions in the retail_and_recreation, grocery_and_pharmacy, and transit_stations super categories, which match the categories in Google mobility data for easy validation.

Assumptions

Algorithm

. Add City Column: A new city column is added based on postal codes (make_preprocess_location).

. Add Super Category Column: A new merch_super_category column is added for classifying transactions into the retail_and_recreation, grocery_and_pharmacy, and transit_stations categories (make_preprocess_merchant_mobility).

. Filter for City: Data for the selected city is filtered (make_filter).

. Filter for Super Category: Data is filtered for the retail_and_recreation, grocery_and_pharmacy, and transit_stations categories (make_filter).

. Filter by Time Frame: Data is filtered for the selected time frame (make_truncate_time).

. Transaction Summing & Noise Addition: Sum the number of transactions by postal code for each timestep and add Gaussian noise (make_private_sum_by).
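The super-category mapping and per-timestep summing can be illustrated with a short sketch. It rests on several assumptions: the merchant-to-super-category mapping below is hypothetical (the real one lives in make_preprocess_merchant_mobility and is not reproduced here), the ``week`` and ``merch_category`` fields are illustrative, rows are assumed to be already filtered to one city and time frame, and sums are aggregated over postal codes for brevity.

```python
import random
from collections import defaultdict

# Hypothetical mapping from merchant category to super category.
SUPER_CATEGORY = {
    "Restaurants": "retail_and_recreation",
    "Supermarkets": "grocery_and_pharmacy",
    "Pharmacies": "grocery_and_pharmacy",
    "Bus lines": "transit_stations",
}

# Hypothetical rows, assumed already filtered to one city and time frame.
ROWS = [
    {"week": 1, "merch_category": "Restaurants", "nb_transactions": 200},
    {"week": 1, "merch_category": "Supermarkets", "nb_transactions": 300},
    {"week": 1, "merch_category": "Pharmacies", "nb_transactions": 50},
    {"week": 1, "merch_category": "Hotels", "nb_transactions": 999},
    {"week": 2, "merch_category": "Restaurants", "nb_transactions": 80},
    {"week": 2, "merch_category": "Bus lines", "nb_transactions": 40},
]


def noisy_mobility_series(rows, sigma, seed=None):
    """Return {(week, super_category): noisy transaction sum}."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for row in rows:
        super_cat = SUPER_CATEGORY.get(row["merch_category"])
        if super_cat is None:
            continue  # drop categories outside the three mobility groups
        totals[(row["week"], super_cat)] += row["nb_transactions"]
    # one independent Gaussian draw per (timestep, super category) cell
    return {key: total + rng.gauss(0.0, sigma)
            for key, total in sorted(totals.items())}


if __name__ == "__main__":
    for key, value in noisy_mobility_series(ROWS, sigma=5.0, seed=7).items():
        print(key, round(value, 1))
```

Each super category then yields one noisy series over weeks, which can be compared against the corresponding Google mobility series.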

Sensitivity and Epsilon Analysis

Validation

Pandemic Adherence Detection

Description

This analysis examines transaction behavior to identify pandemic stages by comparing transactions in the essential and luxurious goods categories.

Assumptions

Algorithm

. Add City Column: A new city column is added based on postal codes (make_preprocess_location).

. Filter for City: Data for the selected city is filtered (make_filter).

. Add Super Category Column: A new merch_super_category column is added for classifying transactions into luxurious and essential categories (make_preprocess_location).

. Filter by Super Category: Only transactions related to luxurious or essential goods are kept (make_filter).

. Filter by Time Frame: Data is filtered for the selected time frame (make_truncate_time).

. Transaction Summing & Noise Addition: Sum the number of transactions by postal code and add Gaussian noise (make_private_sum_by).

. Visualization: Differentially private data is plotted to visualize pandemic stages.
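One way to compare the two categories once the noisy sums are released is to track the share of essential transactions over time. The sketch below is an assumption about how such a comparison could be done, not the project's documented method: the ``essential_share`` helper, the week labels, and the input layout are all hypothetical. Because it only post-processes already-noised values, it spends no additional privacy budget.

```python
def essential_share(noisy_counts):
    """Compute essential / (essential + luxurious) per time step.

    noisy_counts: {week: {"essential": float, "luxurious": float}},
    where the values are already differentially private sums.
    """
    shares = {}
    for week, counts in sorted(noisy_counts.items()):
        # Gaussian noise can push small sums below zero; clamp before dividing.
        essential = max(counts.get("essential", 0.0), 0.0)
        luxurious = max(counts.get("luxurious", 0.0), 0.0)
        total = essential + luxurious
        shares[week] = essential / total if total > 0 else None
    return shares


if __name__ == "__main__":
    # Hypothetical noisy sums for three weeks.
    released = {
        "2020-W10": {"essential": 600.0, "luxurious": 400.0},
        "2020-W13": {"essential": 900.0, "luxurious": 100.0},
        "2020-W14": {"essential": -3.2, "luxurious": -1.1},
    }
    for week, share in essential_share(released).items():
        print(week, share)
```

A rising share of essential transactions would be consistent with a stricter pandemic stage, although the threshold separating stages is left open here.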

Sensitivity and Epsilon Analysis

Contact Pattern Matrix Estimation

Description

Estimates the contact matrix by analyzing transactional data for different age groups across various merchandise categories.

Assumptions

. Proportion of Age Groups: Participation in each merchandise category is assumed to follow an age group proportion map.

. The persons involved in the transactions only make contact with other individuals who are also involved in transactions in the data.

. Every transaction counted under nb_transactions is made by a unique individual, and this holds across different merchant IDs as well. Thus, the total number of unique individuals equals the total number of transactions across all merchant IDs.

. Contacts among age groups are exclusive, i.e., every individual from a given age group makes contact with distinct individuals from other age groups. The video also makes this assumption.

Algorithm

. Filter Week: Select the specific week for analysis.

. Filter City: Choose the city of interest (e.g., Bogotá).

. Filter OFFLINE Transactions: Only consider offline transactions.

. Group by Merchant Category: Sum the number of transactions (nb_transactions).

. Private Count of Postal Codes: Obtain the private count of unique postal codes for each merchant category and week.

. Compute Private Mean Transactions: Calculate the average number of transactions per zip code using the age group proportion map.
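The last two steps can be read as dividing a noisy transaction sum by a noisy postal-code count and then splitting the result by age group. The sketch below follows that reading under stated assumptions: the function name, the age-group labels, and the proportion values are hypothetical, and the step from these per-group means to the final contact matrix is not shown.

```python
def mean_transactions_by_age_group(noisy_total, noisy_zip_count, age_proportions):
    """Split the private mean number of transactions per postal code
    across age groups.

    noisy_total: differentially private sum of nb_transactions for one
        merchant category and week.
    noisy_zip_count: differentially private count of unique postal codes.
    age_proportions: {age_group: share}, assumed to sum to 1.
    """
    # Noise can make either release non-positive; clamp to keep the ratio sane.
    zip_count = max(noisy_zip_count, 1.0)
    mean_per_zip = max(noisy_total, 0.0) / zip_count
    return {group: mean_per_zip * share
            for group, share in age_proportions.items()}


if __name__ == "__main__":
    # Hypothetical proportion map for one merchant category.
    proportions = {"0-19": 0.2, "20-59": 0.5, "60+": 0.3}
    result = mean_transactions_by_age_group(1000.0, 10.0, proportions)
    for group, value in result.items():
        print(group, round(value, 2))
```

Both inputs are already private releases, so the division and the split are post-processing and do not consume extra privacy budget.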

Sensitivity and Epsilon Analysis

Challenges

File Structure