Onto-Med / top-backend

Spring Boot based backend of the TOP Framework
MIT License
0 stars 1 forks source link

Store query results as wide table #195

Closed ChristophB closed 7 months ago

ChristophB commented 8 months ago

Example CSV:

Id;Weight;Weight::≥ 100 kilogram
9;105.352017648371|2003-12-31T09:30:00,78.2138146895258|2006-12-27T07:40:00;true
15;106.897435802024|1996-04-12T22:50:00,81.0336523267653|1988-07-19T14:30:00;true
37;100.059009926007|2021-07-30T09:30:00,50.2496308015693|2015-01-27T19:00:00,13.1297828859045|2005-03-27T16:20:00,16.2375820628664|2005-11-08T09:50:00;true
38;103.776892516161|1990-12-20T20:00:00,40.2825456812594|1973-05-20T23:50:00,21.1399553055553|1969-08-05T00:30:00,79.9382288256788|2001-02-02T01:00:00;true
42;139.768977164027|2007-10-22T06:40:00,11.3292117084845|1955-08-27T08:50:00,76.8050724631382|1993-06-24T16:20:00,36.3132216119026|1964-07-24T19:20:00;true

This CSV can be transformed to a more analysable-friendly format, for example with the following R code:

library(tidyr)
library(dplyr)

read.csv2("wide_data.csv") %>%
  mutate(Other.weight = rev(Weight)) %>% # add another phenotype
  separate_wider_delim(Weight:Other.weight, ",", names_sep = "_", too_few = "align_start") %>%
  pivot_longer(c(starts_with("Weight"), starts_with("Other.weight")), names_to = c("property", "index"), names_sep = "_") %>%
  drop_na() %>%
  select(-index) %>%
  separate_wider_delim(value, "|", names = c("value", "date"), too_few = "align_start") %>%
  group_by(Id, date, property) %>%
  mutate(index = row_number()) %>% # need to add indices to duplicated dates
  ungroup() %>%
  pivot_wider(names_from = property, values_from = value)