Health Systems Strengthening and Key Populations

SID QC Repo #37

Open KateWilkinsMPH opened 2 years ago

KateWilkinsMPH commented 2 years ago

Please report any issues you find with the SID dataset and crosswalk. Please give as much detail as possible, including:

  • File name where the issue was found
  • Row and column where the issue was found
  • The error, and what the correct data should be

When possible, please provide screenshots of the issue and name the tool you used to verify it.

Please flag @KateWilkinsMPH and @LaChandraS so we can determine whether this is a data issue or a data-pull issue, and how to resolve it.

crgibs commented 2 years ago

RM Data Flags - Referencing SA Raw Data file/RM Raw Data tab & South Africa_RM_2021 File Health Workforce

  1. The Func_Elem_Disagg items are listed 3 times. For example, AW: Salary and Benefits, Salary Top-Ups, Training and Supervision are each duplicated 3 times. Also, legend items are dispersed across the 3 groupings of rows.
  2. “Nominal, None, Nominal” is missing in SA RM for SFP Host Govt. in Health Workforce/AW
  3. NON_SD_PEPFAR_Implementers column is missing
  4. “Secondary, None, Secondary” is missing in SA RM for SFP Host Govt in Health Workforce/AS
  5. SA RM says “Primary, None, Primary” but RM Raw Data says “Primary, Primary, Primary” for NON_SD_GF_Implementers in Health Workforce/OS
  6. None of the SFP items in SA RM match the RM Raw Data in Health Workforce/OS
  7. Number of staff shows #N/A for all rows in the raw data for Above Site (Systems) Programs
  8. Pre-Service Training rows for SFP do not match between the RM and the raw data
  9. In-Service Training/Continuing Medical Education Systems legend items do not align with the NON_SD or SFP columns
  10. Forecasting and Planning legend items do not align with the SFP columns
  11. Sourcing and Procurement legend items do not align with the SFP columns
  12. Quality Assurance and Control legend items do not align with the SFP or NON SD columns
  13. Risk Management, Logistics Management, Warehousing and Inventory Management, Transport and Distribution, Waste Management and Return legend items do not align with the SFP/NON SD columns
  14. Data Systems, Monitoring and Evaluation, Surveys and Surveillance, HIV Population-based survey (e.g., PHIA), KP Demographic Surveys (e.g., IBBS) legend items do not align with the SFP/NON SD columns
  15. Legend items in all Laboratory Systems rows do not align with the SFP/NON SD columns
  16. Health Financing, Governance and Policy, Institutional and Organizational Development, Site Level Quality Management, Other Systems Support legend items do not align with the SFP/NON SD columns
  17. Program Management: At the Implementation Level (Implementing Partners)/At the Donor Level legend items do not align with the SFP column
crgibs commented 2 years ago

CrossWalk Flags (Domain C) – Referencing files cw_c_KW & South Africa_SID_2021 & South Africa_SID_2019

  1. SIDshortquestion 12.3 for SID 2019/2021 is “Information on cost of service provision” but reads “Unit Costs” in the Crosswalk. The response options also differ between the CW and SID 2019/2021
  2. SIDUID C.12.4.I reads “Development and implemented” in CW but “Developed and Implemented” in SID 2019/2021
  3. SIDUIDs are missing for Question 13, Market Openness
  4. In SIDUID 13.2.B.iii, the word “of” in the response is spelled with two o’s, and the response reads “nongovernment facilities” in CW but “nongovernment institutions” in SID
  5. Question 13.8 wording differs between CW and SID
  6. Question 13.10 wording differs between CW and SID
  7. Question 13.11 is missing responses in the CW
  8. In Question 13.12, the “Short_Question” (response) column carries over the “Quality standards for HIV commodities” text from the SIDshortquestion column
  9. Responses for Questions 13.13/13.15/13.16/13.17 differ between CW and SID
KateWilkinsMPH commented 2 years ago

Hi @crgibs, can you add row numbers for each of these so we know where to look?

crgibs commented 2 years ago

CrossWalk Flags (Domain C) – Referencing files cw_c_KW & South Africa_SID_2017

  1. SIDUID C.12.4.I reads “Development and implemented” in CW but “Developed and Implemented” in SID 2019/2017
crgibs commented 2 years ago

Referencing SA Raw Data (Rows 579-1180) / South Africa SID 2017-2019-2021

  1. UUID_629 (row 630): "Bodies" is misspelled in the SIDsubquestion column
  2. UUID_634 (row 635): "Opportunities" is misspelled in the SIDsubquestion column
  3. UUID_638 (row 639): "Joint" is misspelled in the SIDsubquestion column
  4. UUID_719 to UUID_722 (rows 720-723) are missing SIDUIDs for 2017/2019/2021
  5. UUID_025 to UUID_440 (rows 745-756) are missing SIDUIDs for 2017/2019/2021; also, for all SIDs, responses are missing in the SIDsubquestion column
  6. Question B.10.5 is out of order in the SA Raw Data; it is broken up between other questions
  7. Multiple duplicate rows between BUID_260 and BUID_446 (rows 1153-1180)
  8. BUID_289 to BUID_293 (rows 1023-1027) are missing SIDUIDs for 2019/2021 (SIDs 2019/2021 have a 7.8; I believe that's what's missing)
  9. Same as #8 for BUID_341 to BUID_345; 8.8 is missing
KateWilkinsMPH commented 2 years ago

QC of raw data: SID 2017, Domain A

SID 2019, Domain A

SID 2021, Domain A

LaChandraS commented 2 years ago


  1. The repeat appears intentional. For example, for AW: Salary and Benefits, row 41 has SD_Host_Govt as primary, row 122 has Non_SD_Host_Govt as primary, and row 203 has neither as primary but has SFP_Host_Govt as nominal.

  2. and 4. The area is greyed out in the RM and appears to be intentionally blank.

  3. Updated the code. This should fix the remaining items except #7. Please review the new raw data file, SA_RM_Raw_fixes, in the QC folder or on SharePoint.

  4. Still working on a solution.

KateWilkinsMPH commented 2 years ago

Questions missing from the file "cw_a_KW" that need to be added:

  • A.2.3: User fees for HIV Services (added in 2019, included in 2021)
  • A.2.4: User fees for other health services (added in 2019, included in 2021)
  • A.2.12: Innovation regulation (added in 2021)
  • A.4.4: Supply Chain (added in 2021)
  • A.4.6: Private sector engagement (added in 2021)

KateWilkinsMPH commented 2 years ago

These two responses were transposed in the cw_a_KW crosswalk:

  • "Programs to address workplace violence": 2017: A.2.5.a.iv, 2019: A.2.7.a.iv, 2021: A.2.7.a.iv
  • "Programs to address intimate partner violence": 2017: A.2.5.a.iii, 2019: A.2.7.a.iii, 2021: A.2.7.a.ii

The crosswalk has been updated.

crgibs commented 2 years ago

Eswatini 2019 Extract & 2019 SID

Eswatini 2017 Extract & 2017 SID

Eswatini 2021 Extract & 2021 SID

KateWilkinsMPH commented 2 years ago

Eswatini 2019 Extract & 2019 SID

  • Unsure if it matters: in the SID, numbers are rounded; in the extract, they are not
  • BUID_374: The quality management score is 7.09523809 in the extraction but 6.43 in the SID
  • When the fiscal year is filtered to 2019, Domain D disappears as an option
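Whether a mismatch like the first bullet is only display rounding can be checked by allowing half a unit in the last decimal place the SID shows. A minimal Python sketch, assuming the SID displays two decimals (as in the 6.43 example); the function name and the 6.4285714 value are hypothetical, for illustration only.

```python
def differs_only_by_rounding(extract_value: float, sid_value: float,
                             decimals: int = 2) -> bool:
    """Return True if the SID value could be the extract value rounded for display.

    Assumes the SID shows `decimals` decimal places (2 is an assumption).
    """
    # Half a unit in the last displayed place (0.005 for 2 decimals),
    # plus a tiny margin for floating-point error.
    tolerance = 0.5 * 10 ** -decimals + 1e-9
    return abs(extract_value - sid_value) <= tolerance


# Hypothetical unrounded value that displays as 6.43: only a rounding difference.
print(differs_only_by_rounding(6.4285714, 6.43))   # True
# BUID_374 values from this thread: a real difference, not rounding.
print(differs_only_by_rounding(7.09523809, 6.43))  # False
```

A tolerance check is used instead of comparing `round()` results, because Python's `round` rounds halves to even and may not match how a spreadsheet displays borderline values.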

Eswatini 2017 Extract & 2017 SID

  • When the fiscal year is filtered to 2017, Domain D disappears as an option

Eswatini 2021 Extract & 2021 SID

Botswana: Similar issue; Domain D disappears for both 2017 and 2019. Domains A, B, and C were not collected for 2021.

  • 14.1 is missing from the data extract

Same issue with Botswana:

  • In the data extract, when filtered for Domain D/fiscal year 2021, the SID short questions/answers don't match the SIDUID. For example, the 2021_SIDUID column says D.14.2.A, but the same row in SIDshortquestion_score says "13.1 Score:"

This is because 14.2 was 13.1 in 2017. Because both end up on the same line and in the same column, R can assign only one value to that cell.
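The collision can be illustrated with a simplified crosswalk row, in Python rather than the R used for extraction. Column names other than 2021_SIDUID and SIDshortquestion_score are hypothetical, as is the "14.2 Score:" label: when each year's label is written into one shared column, only one survives; one label column per year keeps both. This is a sketch of the idea, not the extraction code.

```python
# Hypothetical, simplified crosswalk row: the same question is 13.1 in 2017
# and 14.2 in 2021, so both years land on one row.
question_numbers = {"2017": "13.1", "2021": "14.2"}


def shared_column(numbers: dict) -> dict:
    """Write every year's label into one column; later writes overwrite earlier ones."""
    row = {}
    for number in numbers.values():
        row["SIDshortquestion_score"] = f"{number} Score:"
    return row


def per_year_columns(numbers: dict) -> dict:
    """Write each year's label into its own (hypothetical) column, so nothing is lost."""
    return {f"{year}_SIDshortquestion_score": f"{number} Score:"
            for year, number in numbers.items()}


print(shared_column(question_numbers))     # one label only
print(per_year_columns(question_numbers))  # both labels kept
```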

KateWilkinsMPH commented 2 years ago

Eswatini 2019 Extract & 2019 SID

  • BUID_374: The quality management score is 7.09523809 in the extraction but 6.43 in the SID

Reviewed the SID score for Eswatini for Element 9: Quality Management, and the data extract is correct. See screenshots below.

Screenshots: Quality Management Score (SID); Quality Management Score (Extract)

Looks like this is reporting correctly. Resolved.

crgibs commented 2 years ago

Kenya 2017 Extract & 2017 SID

LaChandraS commented 2 years ago


Issues I had with Domain D have been addressed in the latest extract. No other issues.

KateWilkinsMPH commented 2 years ago

Botswana: Domain D issue fixed. Two responses were transposed in Tab A; this has been resolved in the CW. Please use this version for data extraction. No other issues. https://pepfar.sharepoint.com/:x:/s/ICPI/EXMDQuH9JrtBo7BoleCaSyoBV9aFP4iqhKZwOn3mnsPVsg?e=ychWm6

KateWilkinsMPH commented 2 years ago

Kenya 2017 Extract & 2017 SID

  • When the fiscal year is filtered to 2017, Domain D disappears as an option

Kenya 2019 Extract & 2019 SID

  • When the fiscal year is filtered to 2017, Domain D disappears as an option

Kenya 2021 Extract & 2021 SID

  • 14.1 is missing from the extract


MadeleineBG commented 2 years ago

### DISREGARD: these UUIDs are for 2017 and are correct

(Lesotho SID 2021: it looks like some responses got flipped. Listed by UUID, then 2021_SIDUID:

  • AUID_225 (A.5.1.A) and AUID_228 (A.5.2.A): dataset says 3 instead of 2 in SID 2021
  • AUID_232 (A.5.3.A) and AUID_235 (A.5.4.A): dataset says 2 instead of 3 in SID 2021
  • Errors: AUID_245 (no 2021 SIDUID) says 6 in the dataset instead of 5.78 in SID 2021)

KateWilkinsMPH commented 2 years ago

Hi @MadeleineBG, it looks like you may have been looking at the wrong year. The UIDs are not the same year over year; therefore, the ones you listed above are for 2017 only. I reviewed the SID tool for 2017, and the data extract is reporting correctly for all the UIDs you listed above.
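Because the same SIDUID text can refer to different crosswalk rows in different years, a lookup should be tied to the year of the SID being checked. A minimal Python sketch, assuming crosswalk columns follow the 2021_SIDUID naming pattern seen in the extract; the rows, the 2017_SIDUID column, and the X.1.1.A identifier are placeholders, not real crosswalk entries.

```python
# Placeholder crosswalk rows; column names other than 2021_SIDUID are assumed.
crosswalk = [
    {"UUID": "UUID_1", "2017_SIDUID": "X.1.1.A", "2021_SIDUID": None},
    {"UUID": "UUID_2", "2017_SIDUID": None, "2021_SIDUID": "X.1.1.A"},
]


def rows_for(crosswalk: list, year: int, siduid: str) -> list:
    """Return the crosswalk rows whose SIDUID for `year` matches `siduid`."""
    column = f"{year}_SIDUID"
    return [row for row in crosswalk if row.get(column) == siduid]


# The same SIDUID text points to different rows depending on the year.
print([row["UUID"] for row in rows_for(crosswalk, 2017, "X.1.1.A")])  # ['UUID_1']
print([row["UUID"] for row in rows_for(crosswalk, 2021, "X.1.1.A")])  # ['UUID_2']
```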


MadeleineBG commented 2 years ago

Lesotho: Not major, but I noticed that for BUID_492 / 2021_SIDUID B.7.7.B, the SIDresponse in the dataset says "SUB", but the response in the SID is blank (cell I89 on worksheet B). It's the first line of the B response option. The B response IS worth points on its own (0.25 unweighted) and has additional sub-options.

KateWilkinsMPH commented 2 years ago

Hi Madeleine, we added the "SUB" you're seeing to the SIDs so that R can extract lines that would otherwise have no data. Resolved.
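The reason for the placeholder can be sketched in Python, assuming (for illustration only) an extraction step that keeps only rows with a non-blank response. A response-option header line, such as the first line of B.7.7.B, has no response of its own, so it would be dropped unless a placeholder like "SUB" is filled in. The row labels below are hypothetical.

```python
# Hypothetical worksheet rows: (row label, response). The header line of a
# response option that has sub-options carries no response of its own.
rows = [
    ("option B (header line)", ""),
    ("option B, sub-option i", "Yes"),
]


def extract(rows: list) -> list:
    """Assumed extraction rule: keep only rows whose response is not blank."""
    return [label for label, response in rows if response.strip()]


# Without a placeholder, the header line is silently dropped.
print(extract(rows))  # ['option B, sub-option i']

# Filling blank responses with "SUB" keeps the header line in the extract.
with_placeholder = [(label, response or "SUB") for label, response in rows]
print(extract(with_placeholder))  # ['option B (header line)', 'option B, sub-option i']
```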