VEuPathDB / EdaSubsettingService

A REST service to provide data and subsetting in the Exploratory Data Analysis Workspace
Apache License 2.0

Add a map-reduce entity count endpoint to Subsetting service #89

Closed dmgaldi closed 1 year ago

dmgaldi commented 1 year ago

Overview

Currently, the map-reduce entity count can be very slow. The following SQL can take up to 20 seconds to execute for the largest studies:

WITH
PCO_0000024 as (
  SELECT a.Household_stable_id
  FROM eda.attributevalue_UMSP_1_Household t, eda.ancestors_UMSP_1_Household a
  WHERE t.Household_stable_id = a.Household_stable_id
  AND attribute_stable_id = 'OBI_0001620'
  AND number_value >= -4.434044005032582 AND number_value <= 7.798078531355303
INTERSECT
  SELECT a.Household_stable_id
  FROM eda.attributevalue_UMSP_1_Household t, eda.ancestors_UMSP_1_Household a
  WHERE t.Household_stable_id = a.Household_stable_id
  AND attribute_stable_id = 'OBI_0001621'
  AND (number_value >= 20.0830078125 AND number_value <= 62.27050781250001)
),
EUPATH_0000096 as (
  SELECT a.Participant_stable_id, a.Household_stable_id
  FROM eda.attributevalue_UMSP_1_Participant t, eda.ancestors_UMSP_1_Participant a
  WHERE t.Participant_stable_id = a.Participant_stable_id
  AND attribute_stable_id = 'OBI_0001169'
  AND number_value >= 20 AND number_value <= 60
),
EUPATH_0000609 as (
  SELECT Participant_stable_id, Household_stable_id, Sample_stable_id FROM eda.ancestors_UMSP_1_Sample
)
SELECT count(distinct Sample_stable_id) as count
FROM (
  SELECT distinct EUPATH_0000609.Sample_stable_id
  FROM PCO_0000024, EUPATH_0000096, EUPATH_0000609
  WHERE PCO_0000024.Household_stable_id = EUPATH_0000096.Household_stable_id
  AND EUPATH_0000096.Participant_stable_id = EUPATH_0000609.Participant_stable_id
) t
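
For readers who want to see what the query computes without reading the SQL closely, here is a minimal in-memory sketch of the same logic. It is only an illustration of the query's meaning, not the service's implementation or the file-based approach from the related PR. The function name, the input shapes, and any data passed in are hypothetical; only the attribute IDs and range bounds are taken from the SQL above.

```python
def count_samples(household_attrs, participants, samples):
    """Count distinct samples whose ancestors pass the household and participant filters.

    household_attrs: {household_id: {attribute_stable_id: number_value}}  (hypothetical shape)
    participants:    iterable of (participant_id, household_id, obi_0001169_value)
    samples:         iterable of (sample_id, participant_id)
    """

    def in_range(value, low, high):
        # A missing attribute value never satisfies a range filter.
        return value is not None and low <= value <= high

    # PCO_0000024: households that satisfy BOTH range filters (the INTERSECT).
    households = {
        household_id
        for household_id, attrs in household_attrs.items()
        if in_range(attrs.get("OBI_0001620"), -4.434044005032582, 7.798078531355303)
        and in_range(attrs.get("OBI_0001621"), 20.0830078125, 62.27050781250001)
    }

    # EUPATH_0000096: participants in range, joined to the surviving households.
    kept_participants = {
        participant_id
        for participant_id, household_id, value in participants
        if household_id in households and in_range(value, 20, 60)
    }

    # EUPATH_0000609: distinct samples whose participant survived both filters.
    return len({
        sample_id
        for sample_id, participant_id in samples
        if participant_id in kept_participants
    })
```

Each filter here is a pass over one entity's rows followed by a join on the ancestor IDs, which is the same shape as the three CTEs and the final join in the SQL.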

Related PR: https://github.com/VEuPathDB/lib-eda-subsetting/pull/18

Acceptance Criteria

  1. Count endpoint uses files under the following conditions:
     a. FILE_BASED_SUBSETTING env var is enabled
     b. Directory with binary files is mounted and all files are available
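
The two conditions above can be sketched as a single guard that decides whether the count endpoint should read from files or fall back to the database. This is a minimal sketch under stated assumptions, not the service's actual code: the function name, the set of values treated as "enabled", and the idea of passing an explicit list of required file names are all hypothetical. Only the `FILE_BASED_SUBSETTING` variable name comes from the issue.

```python
from pathlib import Path
from typing import Iterable, Mapping

# Assumption: which strings count as "enabled" is not specified in the issue.
ENABLED_VALUES = {"1", "true", "yes", "on"}


def use_file_based_count(
    env: Mapping[str, str],
    binary_dir: Path,
    required_files: Iterable[str],
) -> bool:
    """Return True only if both acceptance conditions hold."""
    # Condition a: the FILE_BASED_SUBSETTING env var is enabled.
    flag = env.get("FILE_BASED_SUBSETTING", "").strip().lower()
    if flag not in ENABLED_VALUES:
        return False

    # Condition b: the directory is mounted and every required file is present.
    if not binary_dir.is_dir():
        return False
    return all((binary_dir / name).is_file() for name in required_files)
```

A caller would pass `os.environ` as `env`, along with whatever directory and file names the deployment actually uses. Taking the environment as a parameter rather than reading it inside the function keeps the check easy to test.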