Currently, the map-reduce entity count can be very slow. For the largest studies, this SQL can take up to 20 seconds to execute:
WITH
PCO_0000024 as (
  SELECT a.Household_stable_id
  FROM eda.attributevalue_UMSP_1_Household t, eda.ancestors_UMSP_1_Household a
  WHERE t.Household_stable_id = a.Household_stable_id
    AND attribute_stable_id = 'OBI_0001620'
    AND number_value >= -4.434044005032582 AND number_value <= 7.798078531355303
  INTERSECT
  SELECT a.Household_stable_id
  FROM eda.attributevalue_UMSP_1_Household t, eda.ancestors_UMSP_1_Household a
  WHERE t.Household_stable_id = a.Household_stable_id
    AND attribute_stable_id = 'OBI_0001621'
    AND (number_value >= 20.0830078125 AND number_value <= 62.27050781250001)
),
EUPATH_0000096 as (
  SELECT a.Participant_stable_id, a.Household_stable_id
  FROM eda.attributevalue_UMSP_1_Participant t, eda.ancestors_UMSP_1_Participant a
  WHERE t.Participant_stable_id = a.Participant_stable_id
    AND attribute_stable_id = 'OBI_0001169'
    AND number_value >= 20 AND number_value <= 60
),
EUPATH_0000609 as (
  SELECT Participant_stable_id, Household_stable_id, Sample_stable_id
  FROM eda.ancestors_UMSP_1_Sample
)
SELECT count(distinct Sample_stable_id) as count
FROM (
  SELECT distinct EUPATH_0000609.Sample_stable_id
  FROM PCO_0000024, EUPATH_0000096, EUPATH_0000609
  WHERE PCO_0000024.Household_stable_id = EUPATH_0000096.Household_stable_id
    AND EUPATH_0000096.Participant_stable_id = EUPATH_0000609.Participant_stable_id
) t
The count endpoint uses files under the following conditions:
a. The FILE_BASED_SUBSETTING env var is enabled
b. The directory with binary files is mounted and all files are available
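The two conditions above could be expressed as a single guard that decides whether the count endpoint reads from files or falls back to SQL. The following is only a rough sketch, not the service's actual code: the function names, the `required_files` parameter, and the assumption that "enabled" means the value `true` (case-insensitive) are all hypothetical.

```python
from pathlib import Path
from typing import Iterable, Mapping, Optional


def file_based_subsetting_enabled(env: Mapping[str, str]) -> bool:
    # Assumption: the env var counts as "enabled" when set to "true"
    # (case-insensitive). The real service may accept other values.
    return env.get("FILE_BASED_SUBSETTING", "").strip().lower() == "true"


def can_count_from_files(
    env: Mapping[str, str],
    binary_dir: Optional[Path],
    required_files: Iterable[str],
) -> bool:
    """Return True only if both conditions (a) and (b) hold.

    `required_files` is a hypothetical list of file names the count
    would need; how the real service determines it is not specified.
    """
    # Condition (a): the feature flag is enabled.
    if not file_based_subsetting_enabled(env):
        return False
    # Condition (b), part 1: the binary-file directory is mounted.
    if binary_dir is None or not binary_dir.is_dir():
        return False
    # Condition (b), part 2: every needed file is present.
    return all((binary_dir / name).is_file() for name in required_files)
```

A caller would pass `os.environ` as `env` and use the SQL-based count whenever this returns False, so a missing mount or a partially populated directory degrades to the existing behavior rather than failing.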
Overview
Related PR: https://github.com/VEuPathDB/lib-eda-subsetting/pull/18
Acceptance Criteria