
Tap Typing with Touch Sensing Images (TSI) Dataset

This dataset is associated with the UIST 2024 paper titled "Can Capacitive Touch Images Enhance Mobile Keyboard Decoding?" by Piyawat Lertvittayakumjorn, Shanqing Cai, Billy Dou, Cedric Ho, and Shumin Zhai from Google.

Abstract

This Tap Typing with Touch Sensing Images (TSI) dataset contains a collection of touch points on a mobile touchscreen keyboard. Each touch point corresponds to a user tap that was intended to enter a specific character via the keyboard. The dataset was collected by asking 16 participants to copy-type given prompts, allowing us to align each touch point with a character in the prompt and construct (touch data, intended key) pairs for training a keyboard decoder. Uniquely, the TSI dataset includes not only the centroids of the user taps, but also their elliptical features and capacitive sensing images (referred to as "touch heatmaps" or simply "heatmaps" in our UIST paper), providing richer information for keyboard decoding. In total, the TSI dataset consists of 43,735 examples together with a keyboard layout file and a prompt file.

Terminology


Visualization of a sample touch point. The heatmap depicts the capacitive touch sensing image for this touch point. The red dot indicates the touch centroid derived from the touch sensing image. The red ellipse represents the touch ellipse fitted to this touch area. The solid and dashed lines illustrate the major and minor axes of the ellipse, respectively.

Data collection and processing

The touch points in this dataset were collected during Study 3 in our UIST paper by asking 16 participants to copy-type given prompts.

Participant-inclusion criteria

A total of 16 participants (8 male, 7 female, 1 non-binary) were recruited for this user study. The inclusion criteria were: 1) regularly types on a smartphone, 2) uses English as the primary language for mobile typing, and 3) has no vision or motor impairment that might affect the copy-typing task in the user study.

Task details

Each user-study participant was asked to perform four separate typing task blocks, each consisting of 30 text prompts to be copy-typed. The participants were instructed to type as quickly and as accurately as possible (i.e., at their usual fast speed). They were allowed to correct typing errors before each submission, but this was not mandatory unless the difference between their submitted text and the original prompt exceeded 60% of the prompt's length. They were also given a 1-2 minute break before proceeding to the next block. The total duration of the entire user-study session varied between 30 and 50 minutes across the 16 participants (including the time spent answering questionnaires related to the study).

Device and setup

We used two Pixel 6 Pro devices for data collection. Each participant used one of the phones provided by the experimenter. These phones were held in the default portrait orientation and configured to log touch sensing images during the study. We used an Android application for showing prompts and collecting submitted texts and touch data. A custom build of Gboard was used in this app. To collect as many touch points as possible, we disabled intelligent keyboard features such as auto-correction, next-word prediction, double-space period, and grammar check. We also disabled the per-key visual and haptic feedback in the keyboard in order not to distract the participants.



The Android application used for data collection. Though the suggestion bar was shown, we strongly discouraged the participants from tapping it so that we could collect as many touch points as possible. Note that the screen was partially cropped to hide private or irrelevant information.

Prompt set construction

The 30 prompts presented as the text for copy-typing in each task block were divided into two subsets:

  1. English phrases: 20 phrases or sentences from the MacKenzie and Soukoreff (2003) phrase set.
  2. Random strings: 10 strings of random characters chosen from the 26 English letters, SPACE, and PERIOD. Each random string was exactly 8 characters long and did not have a SPACE at the beginning or the end, to avoid visual ambiguity. Unlike the English phrases, these strings are considered out-of-vocabulary (OOV).
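
As a rough illustration, the following is a minimal sketch of how such a random string could be generated under the constraints above. It is a hypothetical example, not the actual script used to construct the prompts.

import random
import string

# Hypothetical sketch of random-string prompt generation: 8 characters drawn
# from the 26 English letters, SPACE, and PERIOD, with no SPACE at either end.
ALPHABET = list(string.ascii_lowercase) + [" ", "."]
EDGE_ALPHABET = [c for c in ALPHABET if c != " "]  # excludes SPACE

def make_random_string(length=8):
    chars = [random.choice(ALPHABET) for _ in range(length)]
    chars[0] = random.choice(EDGE_ALPHABET)   # no leading SPACE
    chars[-1] = random.choice(EDGE_ALPHABET)  # no trailing SPACE
    return "".join(chars)

print(make_random_string())  # e.g., "rcgvifjs"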

The English phrases and random strings were mixed and presented in random order within each block. The four blocks consisted of non-overlapping prompts. In addition, the prompt order was randomized across participants.

By construction, the possible intended keys in this dataset are A-Z, SPACE, and PERIOD (28 keys in total). Their distributions are shown below.


Distributions of reference characters in the dataset: (left) from English phrase prompts only, (middle) from random string prompts only, (right) from both types of prompts.
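
These distributions can be recomputed from the released files. Below is a minimal pandas sketch that joins the two CSV files described in the Data structure section, assuming they sit in the current working directory.

import pandas as pd

# Recompute the reference-character counts, split by prompt type, by joining
# the touch data with the prompts on the composite key.
touch = pd.read_csv("touch_data.csv")
prompts = pd.read_csv("prompt_data.csv")
merged = touch.merge(prompts, on=["participant_id", "task_id", "trial_id"])

print(sorted(merged["ref_char"].unique()))                      # the 28 possible keys
print(merged.groupby("prompt_type")["ref_char"].value_counts())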

Touch point & Key alignment

We aligned the touch points in the submitted string to the reference prompt. Since the submitted string may contain errors, we used the Needleman–Wunsch algorithm, which supports insertion, omission, substitution, and transposition errors, for the alignment (Needleman and Wunsch, 1970). After that, we aligned the deleted touch points by replaying the typing sequence (including backspaces) step by step and performing the same alignment every time a touch point was about to be deleted. The reference text for this alignment was a prefix of the prompt, one character longer than the text typed so far, to allow for an omission error. We kept only the alignments of touch points that had not already been aligned during the submitted-string alignment. Please refer to Section 4.2 of our UIST paper for more details.
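
To make the alignment step concrete, here is a minimal Needleman–Wunsch-style sketch that handles insertion, omission, and substitution errors. It is an illustration only, not the implementation used to build the dataset (which also handles transpositions).

# Align a submitted string to its reference prompt with a simple
# Needleman-Wunsch dynamic program (edit distance plus traceback).
def align(submitted, reference, match=0, mismatch=1, gap=1):
    n, m = len(submitted), len(reference)
    # cost[i][j]: minimal cost of aligning submitted[:i] with reference[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (match if submitted[i - 1] == reference[j - 1] else mismatch)
            omit = cost[i][j - 1] + gap    # reference character the typist omitted
            extra = cost[i - 1][j] + gap   # extra character the typist inserted
            cost[i][j] = min(diag, omit, extra)
    # Traceback: for each typed character, record the index of the reference
    # character it aligns to, or None if it was an extra (inserted) character.
    aligned = [None] * n
    i, j = n, m
    while i > 0 and j > 0:
        diag = cost[i - 1][j - 1] + (match if submitted[i - 1] == reference[j - 1] else mismatch)
        if cost[i][j] == diag:
            aligned[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return aligned

print(align("thhe cat", "the cat"))  # [0, None, 1, 2, 3, 4, 5, 6]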

Data structure

There are three files in the data directory -- touch_data.csv, keyboard_data.json, and prompt_data.csv.

touch_data.csv

A few example rows of touch_data.csv are shown below.

participant_id,task_id,trial_id,timestamp_ms,ref_char,ref_char_index_in_prompt,first_frame_touch_x,first_frame_touch_y,first_frame_touch_major,first_frame_touch_minor,first_frame_touch_orientation,first_frame_touch_heatmap,first_frame_heatmap_overlap_vector,was_deleted,lm_scores
user01,task1,0,239529639,m,0,1108.0,519.0,152.0,117.0,0.0866699144244194,"[[8, 13, 20], [8, 14, 18], [9, 13, 203], [9, 14, 79], [10, 13, 98], [10, 14, 38]]","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.990625, 23.1728125, 1.3865625, 352.27718749999997, 63.215624999999996, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]",False,nan
user01,task1,0,239530001,y,1,757.0,145.0,172.0,164.0,-0.2208932489156723,"[[3, 8, 10], [3, 9, 14], [3, 10, 10], [4, 8, 70], [4, 9, 186], [4, 10, 57], [5, 8, 79], [5, 9, 208], [5, 10, 69], [6, 8, 17], [6, 9, 33], [6, 10, 19]]","[0.0, 0.0, 0.0, 0.0, 0.0, 2.0771875, 35.689375, 20.8834375, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 154.82625000000002, 42.425625, 0.0, 0.0, 0.0, 496.54812499999997, 0.0, 0.0, 0.0]",False,"[0.280352, 0.000941453, 0.0130359, 0.00083572, 0.140016, 0.0, 0.00124548, 0.00166307, 0.14564, 0.000782347, 0.00126418, 0.00368562, 0.00894826, 0.0, 0.14503, 0.000911226, 0.0, 0.00346484, 0.0061941, 0.00344941, 0.0256453, 0.0, 0.000969431, 0.0, 0.171604, 0.0, 0.0443219]"
user01,task1,0,239530213,SPACE,2,722.0,676.0,191.0,174.0,0.8444564342498779,"[[10, 7, 10], [10, 8, 44], [10, 9, 55], [10, 10, 15], [11, 7, 31], [11, 8, 196], [11, 9, 248], [11, 10, 35], [12, 7, 26], [12, 8, 123], [12, 9, 133], [12, 10, 23], [13, 7, 10], [13, 8, 18], [13, 9, 16], [13, 10, 10]]","[0.0, 47.33625, 34.07749999999999, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 195.58625, 0.0, 0.0, 0.0, 0.0, 710.5999999999999, 0.0]",False,"[0.00334769, 7.19902e-05, 0.00282896, 4.7261e-07, 0.00174823, 6.74272e-05, 0.000600986, 0.0, 2.36306e-07, 0.0, 3.53734e-08, 0.00171801, 0.0, 0.000129763, 1.2671e-05, 0.0, 0.0, 0.00580491, 0.0110847, 0.00156668, 0.0, 0.000100176, 0.0, 7.08915e-07, 0.0, 0.0, 0.970917]"
...

This is a CSV file containing 43,735 lines (excluding the header line), corresponding to 43,735 touch points. The lines are sorted by participant_id, then task_id, and then trial_id. While one user tap is usually long enough to generate a series of touch sensing images, this dataset focuses only on the first frame of the sequence; therefore, every touch signal field is prefixed with first_frame_. Specifically, each touch point in touch_data.csv consists of the fields listed in the header row above.


The visualization of the first example on the keyboard layout.
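
For example, the sparse first_frame_touch_heatmap field can be expanded into a dense array as sketched below. This minimal illustration assumes that each inner triple is [row, column, value] on the grid of 16 valid heatmap rows and 18 columns given in keyboard_data.json (described in the next subsection); this interpretation is consistent with the example rows above but should be treated as an assumption.

import ast

import numpy as np
import pandas as pd

# Load the touch data and densify the sparse heatmap of the first example.
df = pd.read_csv("touch_data.csv")

NUM_ROWS, NUM_COLS = 16, 18  # num_valid_heatmap_rows, num_all_heatmap_cols

def to_dense_heatmap(sparse_str, num_rows=NUM_ROWS, num_cols=NUM_COLS):
    # Assumed format: a list of [row, column, value] triples for the
    # non-zero sensor readings of the first frame.
    dense = np.zeros((num_rows, num_cols), dtype=np.float32)
    for row, col, value in ast.literal_eval(sparse_str):
        dense[row, col] = value
    return dense

first = df.iloc[0]
heatmap = to_dense_heatmap(first["first_frame_touch_heatmap"])
print(first["ref_char"], heatmap.shape, heatmap.max())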

keyboard_data.json

Some example lines of keyboard_data.json are shown below.

{
  "device_info": {
    "model": "Pixel 6 Pro",
    "whole_screen_width": 1440.0,
    "whole_screen_height": 3120.0,
    "num_all_heatmap_rows": 39,
    "num_all_heatmap_cols": 18,
    "num_valid_heatmap_rows": 16
  },
  "keyboard_info": {
    "keyboard_width": 1440.0,
    "keyboard_height": 854.0,
    "top_left_x_position": 0.0,
    "top_left_y_position": 2098.0,
    "most_common_key_width": 135.0,
    "most_common_key_height": 206.0
  },
  "keys_info": {
    "q": {
      "key_id": "q",
      "text_literal": "Q",
      "key_center_x": 112.0,
      "key_center_y": 131.0,
      "key_width": 135.0,
      "key_height": 206.0
    },
    "w": {
      "key_id": "w",
      "text_literal": "W",
      "key_center_x": 247.0,
      "key_center_y": 131.0,
      "key_width": 135.0,
      "key_height": 206.0
    },
    ...
  }
}

This JSON file contains information about the device and the keyboard used for data collection. It consists of three major parts: device_info, keyboard_info, and keys_info.


An illustration of the keyboard (the orange rectangle) and its position on the entire screen (the grey rectangle). The device size and the keyboard size are annotated in the figure. The red dot represents the center of the key Q, whose position and size are also annotated. The grid in the figure represents the array of capacitive touch sensors that generate the touch sensing images (touch heatmaps). The blue border highlights the region of the heatmaps recorded in this dataset (i.e., the 16 bottom rows). This figure can be drawn using the data in keyboard_data.json.
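
For instance, keyboard_data.json can be used to map a touch centroid to its nearest key. The sketch below assumes that the touch coordinates in touch_data.csv are expressed in the same keyboard-relative coordinate frame as the key centers; this is consistent with the example rows shown earlier (the first tap at (1108.0, 519.0) lands closest to the key M, matching its ref_char), but treat it as an assumption.

import json
import math

# Find the key whose center is closest to a given touch centroid.
with open("keyboard_data.json") as f:
    keyboard = json.load(f)

def nearest_key(touch_x, touch_y, keys_info):
    best_key, best_dist = None, math.inf
    for key_id, info in keys_info.items():
        dist = math.hypot(touch_x - info["key_center_x"],
                          touch_y - info["key_center_y"])
        if dist < best_dist:
            best_key, best_dist = key_id, dist
    return best_key

print(nearest_key(1108.0, 519.0, keyboard["keys_info"]))  # expected: "m"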

prompt_data.csv

A few example rows of prompt_data.csv are shown below.

participant_id,task_id,trial_id,prompt_type,prompt
user01,task1,0,phrase,My preferred treat is chocolate.
user01,task1,1,random,Rcgvifjs
user01,task1,2,phrase,You will loose your voice.
...

This file lists the prompts used during data collection. Each row has the participant_id, the task_id, and the trial_id as the composite key, with the prompt_type (phrase or random) and the prompt text as values. The prompt_data.csv file can be used together with touch_data.csv to locate each reference character within its prompt.
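
As a quick usage example, the following minimal pandas sketch joins the two CSV files on the composite key and checks that ref_char_index_in_prompt points at the reference character inside the prompt text. The mapping of SPACE and PERIOD back to literal characters is an assumption made for this check.

import pandas as pd

# Join touch data with prompts and look up the reference character in the prompt.
touch = pd.read_csv("touch_data.csv")
prompts = pd.read_csv("prompt_data.csv")
merged = touch.merge(prompts, on=["participant_id", "task_id", "trial_id"])

special = {"SPACE": " ", "PERIOD": "."}
row = merged.iloc[0]
expected = special.get(row["ref_char"], row["ref_char"])
actual = row["prompt"][int(row["ref_char_index_in_prompt"])]
print(expected, actual, expected.lower() == actual.lower())  # m M True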

Dataset Metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

property value
name Tap Typing with Touch Sensing Images (TSI) Dataset
description The Tap Typing with Touch Sensing Images (TSI) dataset contains data of user taps on a mobile touchscreen keyboard. Each tap is characterized by its centroid and elliptical features as well as its capacitive touch sensing image (TSI). The dataset aligns each tap with a key the user intended to type during data collection so it can be used for keyboard decoder training and/or evaluation. In total, this TSI dataset consists of 43,735 taps (from 16 participants performing copy-typing tasks) together with a keyboard layout file and a prompt file.
sameAs https://github.com/google-research-datasets/tap-typing-with-touch-sensing-images

Citation

If you use or refer to the TSI dataset, please cite the following paper.

@misc{lertvittayakumjorn2024can,
      title={Can Capacitive Touch Images Enhance Mobile Keyboard Decoding?}, 
      author={Piyawat Lertvittayakumjorn and Shanqing Cai and Billy Dou and Cedric Ho and Shumin Zhai},
      year={2024},
      eprint={2410.02264},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2410.02264}, 
}

Contact

Piyawat Lertvittayakumjorn (firstname [at] google [dot] com)