emo-bon / governance-data

Holds the governance content for the emo-bon data management

source_mat_ids from Batch 1 & 2 run_information tables that do not match source_mat_ids in the observatory logsheets #25

Closed cymon closed 2 months ago

cymon commented 2 months ago

Missing source_mat_ids - or how the source_mat_ids in the Batch run information sheets do not match the source_mat_ids in the Google logsheets

These are the emo-bon source_mat_ids (the primary identifier) from the Batch 1 and 2 run_information sheets that are missing from the Google logsheets. The leftmost code in each row is the ID found in the Batch 1 and 2 run_information sheets; these are presumably the correct ones, as they are manually curated by Ioulia. The next three are the closest matches among the logsheet IDs, found with Python's difflib.get_close_matches() function.

Most have an obvious match in which the logsheet ID has 200um where the run_information ID has 0.2um, but some are less clear.

['EMOBON_BPNS_Wa_210701_0.2um_2', ['EMOBON_BPNS_Wa_210701_200um_2', 'EMOBON_BPNS_Wa_210701_3um_2', 'EMOBON_BPNS_Wa_210701_NAum_2']]
['EMOBON_BPNS_Wa_210701_0.2um_1', ['EMOBON_BPNS_Wa_210701_200um_1', 'EMOBON_BPNS_Wa_210701_3um_1', 'EMOBON_BPNS_Wa_210701_NAum_1']]
['EMOBON_BPNS_Wa_210825_0.2um_1', ['EMOBON_BPNS_Wa_210825_200um_1', 'EMOBON_BPNS_Wa_210825_3um_1', 'EMOBON_BPNS_Wa_210825_NAum_1']]
['EMOBON_BPNS_Wa_210825_0.2um_2', ['EMOBON_BPNS_Wa_210825_200um_2', 'EMOBON_BPNS_Wa_210825_3um_2', 'EMOBON_BPNS_Wa_210825_NAum_2']]
['EMOBON_ROSKOGO_Wa_210618_0.2um_1', ['EMOBON_ROSKOGO_Wa_210618_200um_1', 'EMOBON_ROSKOGO_Wa_210618_200um_1', 'EMOBON_ROSKOGO_Wa_210618_3um_1']]
['EMOBON_ROSKOGO_Wa_210618_0.2um_2', ['EMOBON_ROSKOGO_Wa_210618_200um_2', 'EMOBON_ROSKOGO_Wa_210618_3um_2', 'EMOBON_ROSKOGO_Wa_210618_200um_4']]
['EMOBON_ROSKOGO_Wa_210802_0.2um_1', ['EMOBON_ROSKOGO_Wa_210802_200um_1', 'EMOBON_ROSKOGO_Wa_210802_3um_1', 'EMOBON_ROSKOGO_Wa_210802_200um_4']]
['EMOBON_ROSKOGO_Wa_210802_0.2um_2', ['EMOBON_ROSKOGO_Wa_210802_200um_2', 'EMOBON_ROSKOGO_Wa_210802_3um_2', 'EMOBON_ROSKOGO_Wa_210802_200um_4']]
['EMOBON_VB_Wa_210621_0.2um_1', ['EMOBON_VB_Wa_210621_3um_1', 'EMOBON_VB_Wa_210621_NAum_1', 'EMOBON_VB_Wa_210621_NAum_1']]
['EMOBON_VB_Wa_210621_0.2um_2', ['EMOBON_VB_Wa_210621_3um_2', 'EMOBON_VB_Wa_210621_NAum_2', 'EMOBON_VB_Wa_210621_NAum_2']]
['EMOBON_VB_Wa_210823_0.2um_1', ['EMOBON_VB_Wa_210823_3um_1', 'EMOBON_VB_Wa_210823_NAum_1', 'EMOBON_VB_Wa_210823_NAum_1']]
['EMOBON_VB_Wa_210823_0.2um_2', ['EMOBON_VB_Wa_210823_3um_2', 'EMOBON_VB_Wa_210823_NAum_2', 'EMOBON_VB_Wa_210823_NAum_2']]
['EMOBON_EMT21_Wa_210825_0.2um_1', ['EMOBON_EMT21_Wa_210825_200um_1', 'EMOBON_EMT21_Wa_210825_3um_1', 'EMOBON_EMT21_Wa_210825_NAum_1']]
['EMOBON_EMT21_Wa_210825_0.2um_2', ['EMOBON_EMT21_Wa_210825_200um_2', 'EMOBON_EMT21_Wa_210825_3um_2', 'EMOBON_EMT21_Wa_210825_NAum_2']]
['EMOBON_PiEGetxo_Wa_210824_0.2um_1', ['EMOBON_PiEGetxo_Wa_210824_200um_1', 'EMOBON_PiEGetxo_Wa_210824_3um_1', 'EMOBON_PiEGetxo_Wa_221024_200um_1']]
['EMOBON_PiEGetxo_Wa_210824_0.2um_2', ['EMOBON_PiEGetxo_Wa_210824_200um_2', 'EMOBON_PiEGetxo_Wa_210824_3um_2', 'EMOBON_PiEGetxo_Wa_221024_200um_2']]
['EMOBON_PiEGetxo_Wa_210824_0.2um_blank', ['EMOBON_PiEGetxo_Wa_210824_200um_blank', 'EMOBON_PiEGetxo_Wa_210824_200um_blank', 'EMOBON_PiEGetxo_Wa_210824_3um_blank']]
['EMOBON_RFormosa_Wa_210805_0.2um_1', ['EMOBON_RFormosa_Wa_210805_200um_1', 'EMOBON_RFormosa_Wa_210805_3um_1', 'EMOBON_RFormosa_Wa_210805_200um_4']]
['EMOBON_RFormosa_Wa_210805_0.2um_2', ['EMOBON_RFormosa_Wa_210805_200um_2', 'EMOBON_RFormosa_Wa_210805_3um_2', 'EMOBON_RFormosa_Wa_210805_200um_4']]
['EMOBON_OSD74_Wa_210831_0.2um_1', ['EMOBON_OSD74_Wa_210831_200um_1', 'EMOBON_OSD74_Wa_210831_3um_1', 'EMOBON_OSD74_Wa_210831_200um_4']]
['EMOBON_OSD74_Wa_210831_0.2um_2', ['EMOBON_OSD74_Wa_210831_200um_2', 'EMOBON_OSD74_Wa_210831_3um_2', 'EMOBON_OSD74_Wa_210831_200um_4']]
['EMOBON_AAOT_Wa_210622_0.2um_1', ['EMOBON_AAOT_Wa_210622_200um_1', 'EMOBON_AAOT_Wa_210622_3um_1', 'EMOBON_AAOT_Wa_210622_2000um_1']]
['EMOBON_AAOT_Wa_210622_0.2um_2', ['EMOBON_AAOT_Wa_210622_200um_2', 'EMOBON_AAOT_Wa_210622_3um_2', 'EMOBON_AAOT_Wa_210622_2000um_2']]
['EMOBON_AAOT_Wa_210809_0.2um_1', ['EMOBON_AAOT_Wa_210809_200um_1', 'EMOBON_AAOT_Wa_210809_3um_1', 'EMOBON_AAOT_Wa_210809_2000um_1']]
['EMOBON_AAOT_Wa_210809_0.2um_2', ['EMOBON_AAOT_Wa_210809_200um_2', 'EMOBON_AAOT_Wa_210809_3um_2', 'EMOBON_AAOT_Wa_210809_2000um_2']]
['EMOBON_NRMCB_Wa_210621_0.2um_1', ['EMOBON_NRMCB_Wa_210621_200um_1', 'EMOBON_NRMCB_Wa_210621_3um_1', 'EMOBON_NRMCB_Wa_210621_NAum_1']]
['EMOBON_NRMCB_Wa_210621_0.2um_2', ['EMOBON_NRMCB_Wa_210621_200um_2', 'EMOBON_NRMCB_Wa_210621_3um_2', 'EMOBON_NRMCB_Wa_210621_NAum_2']]
['EMOBON_NRMCB_Wa_210831_0.2um_1', ['EMOBON_NRMCB_Wa_210831_3um_1', 'EMOBON_NRMCB_Wa_210831_NAum_1', 'EMOBON_NRMCB_Wa_210831_NAum_1']]
['EMOBON_NRMCB_Wa_210831_0.2um_2', ['EMOBON_NRMCB_Wa_210831_3um_2', 'EMOBON_NRMCB_Wa_210831_NAum_2', 'EMOBON_NRMCB_Wa_210831_NAum_2']]
['EMOBON_NRMCB_Wa_210831_0.2um_blank', ['EMOBON_NRMCB_Wa_210831_3um_blank', 'EMOBON_NRMCB_Wa_210831_NAum_blank', 'EMOBON_NRMCB_Wa_210831_NAum_blank']]
['EMOBON_HCMR-1_Wa_210628_0.2um_1', ['EMOBON_HCMR-1_Wa_210628_200um_1', 'EMOBON_HCMR-1_Wa_210628_3um_1', 'EMOBON_HCMR-1_Wa_210628_200um_4']]
['EMOBON_HCMR-1_Wa_210628_0.2um_2', ['EMOBON_HCMR-1_Wa_210628_200um_2', 'EMOBON_HCMR-1_Wa_210628_3um_2', 'EMOBON_HCMR-1_Wa_210628_200um_4']]
['EMOBON_HCMR-1_Wa_210917_3um_blank', ['EMOBON_HCMR-1_Wa_210917_3um_blank2', 'EMOBON_HCMR-1_Wa_210917_3um_blank1', 'EMOBON_HCMR-1_Wa_210917_200um_blank2']]
['EMOBON_HCMR-1_Wa_210917_0.2um_1', ['EMOBON_HCMR-1_Wa_210917_200um_1', 'EMOBON_HCMR-1_Wa_210917_3um_1', 'EMOBON_HCMR-1_Wa_210917_200um_4']]
['EMOBON_HCMR-1_Wa_210917_0.2um_2', ['EMOBON_HCMR-1_Wa_210917_200um_2', 'EMOBON_HCMR-1_Wa_210917_3um_2', 'EMOBON_HCMR-1_Wa_210917_200um_4']]
['EMOBON_HCMR-1_Wa_210917_0.2um_blank', ['EMOBON_HCMR-1_Wa_210917_200um_blank2', 'EMOBON_HCMR-1_Wa_210917_200um_blank1', 'EMOBON_HCMR-1_Wa_210917_3um_blank2']]
['EMOBON_IUIEilat1_Wa_210829_0.2um_1', ['EMOBON_IUIEilat1_Wa_210829_200um_1', 'EMOBON_IUIEilat1_Wa_210829_3um_1', 'EMOBON_IUIEilat1_Wa_210829_200um_4']]
['EMOBON_IUIEilat1_Wa_210829_0.2um_2', ['EMOBON_IUIEilat1_Wa_210829_200um_2', 'EMOBON_IUIEilat1_Wa_210829_3um_2', 'EMOBON_IUIEilat1_Wa_210829_200um_4']]
['EMOBON_IUIEilat1_Wa_211018_0.2um_1', ['EMOBON_IUIEilat1_Wa_211018_200um_1', 'EMOBON_IUIEilat1_Wa_211018_3um_1', 'EMOBON_IUIEilat1_Wa_211018_200um_4']]
['EMOBON_IUIEilat1_Wa_211018_0.2um_2', ['EMOBON_IUIEilat1_Wa_211018_200um_2', 'EMOBON_IUIEilat1_Wa_211018_3um_2', 'EMOBON_IUIEilat1_Wa_211018_200um_4']]
['EMOBON_IUIEilat1_Wa_211219_0.2um_1', ['EMOBON_IUIEilat1_Wa_211219_200um_1', 'EMOBON_IUIEilat1_Wa_211219_3um_1', 'EMOBON_IUIEilat1_Wa_211219_200um_4']]
['EMOBON_IUIEilat1_Wa_211219_0.2um_2', ['EMOBON_IUIEilat1_Wa_211219_200um_2', 'EMOBON_IUIEilat1_Wa_211219_3um_2', 'EMOBON_IUIEilat1_Wa_211219_200um_4']]
['EMOBON_IUIEilat1_Wa_211219_0.2um_blank', ['EMOBON_IUIEilat1_Wa_211219_200um_blank', 'EMOBON_IUIEilat1_Wa_211219_3um_blank', 'EMOBON_IUIEilat1_Wa_210829_200um_blank']]
['EMOBON_BPNS_Wa_211028_0.2um_1', ['EMOBON_BPNS_Wa_211028_200um_1', 'EMOBON_BPNS_Wa_211028_3um_1', 'EMOBON_BPNS_Wa_211028_NAum_1']]
['EMOBON_BPNS_Wa_211028_0.2um_2', ['EMOBON_BPNS_Wa_211028_200um_2', 'EMOBON_BPNS_Wa_211028_3um_2', 'EMOBON_BPNS_Wa_211028_NAum_2']]
['EMOBON_BPNS_Wa_211028_0.2um_blank', ['EMOBON_BPNS_Wa_211028_200um_blank', 'EMOBON_BPNS_Wa_211028_3um_blank', 'EMOBON_BPNS_Wa_211028_NAum_blank']]
['EMOBON_BPNS_Wa_211223_0.2um_1', ['EMOBON_BPNS_Wa_211223_200um_1', 'EMOBON_BPNS_Wa_211223_3um_1', 'EMOBON_BPNS_Wa_211223_NAum_1']]
['EMOBON_BPNS_Wa_211223_0.2um_2', ['EMOBON_BPNS_Wa_211223_200um_2', 'EMOBON_BPNS_Wa_211223_3um_2', 'EMOBON_BPNS_Wa_211223_NAum_2']]
['EMOBON_EMT21_Wa_211020_0.2um_1', ['EMOBON_EMT21_Wa_211020_200um_1', 'EMOBON_EMT21_Wa_211020_3um_1', 'EMOBON_EMT21_Wa_211020_NAum_1']]
['EMOBON_EMT21_Wa_211020_0.2um_2', ['EMOBON_EMT21_Wa_211020_200um_2', 'EMOBON_EMT21_Wa_211020_3um_2', 'EMOBON_EMT21_Wa_211020_NAum_2']]
['EMOBON_EMT21_Wa_211216_0.2um_1', ['EMOBON_EMT21_Wa_211216_200um_1', 'EMOBON_EMT21_Wa_211216_3um_1', 'EMOBON_EMT21_Wa_211216_NAum_1']]
['EMOBON_EMT21_Wa_211216_0.2um_2', ['EMOBON_EMT21_Wa_211216_200um_2', 'EMOBON_EMT21_Wa_211216_3um_2', 'EMOBON_EMT21_Wa_211216_NAum_2']]
['EMOBON_EMT21_Wa_211216_0.2um_blank', ['EMOBON_EMT21_Wa_211216_200um_blank', 'EMOBON_EMT21_Wa_211216_3um_blank', 'EMOBON_EMT21_Wa_211216_NAum_blank']]
['EMOBON_MBAL4_Wa_211103_0.2um_1', ['EMOBON_MBAL4_Wa_211103_200um_1', 'EMOBON_MBAL4_Wa_211103_3um_1', 'EMOBON_MBAL4_Wa_211103_200um_4']]
['EMOBON_MBAL4_Wa_211103_0.2um_2', ['EMOBON_MBAL4_Wa_211103_200um_2', 'EMOBON_MBAL4_Wa_211103_3um_2', 'EMOBON_MBAL4_Wa_211103_200um_4']]
['EMOBON_MBAL4_Wa_211216_0.2um_1', ['EMOBON_MBAL4_Wa_211216_200um_1', 'EMOBON_MBAL4_Wa_211216_3um_1', 'EMOBON_MBAL4_Wa_211216_200um_4']]
['EMOBON_MBAL4_Wa_211216_0.2um_2', ['EMOBON_MBAL4_Wa_211216_200um_2', 'EMOBON_MBAL4_Wa_211216_3um_2', 'EMOBON_MBAL4_Wa_211216_200um_4']]
['EMOBON_AAOT_Wa_211015_0.2um_1', ['EMOBON_AAOT_Wa_211015_200um_1', 'EMOBON_AAOT_Wa_211015_3um_1', 'EMOBON_AAOT_Wa_211015_2000um_1']]
['EMOBON_AAOT_Wa_211015_0.2um_2', ['EMOBON_AAOT_Wa_211015_200um_2', 'EMOBON_AAOT_Wa_211015_3um_2', 'EMOBON_AAOT_Wa_211015_2000um_2']]
['EMOBON_AAOT_Wa_211214_0.2um_1', ['EMOBON_AAOT_Wa_211214_200um_1', 'EMOBON_AAOT_Wa_211214_3um_1', 'EMOBON_AAOT_Wa_211214_2000um_1']]
['EMOBON_AAOT_Wa_211214_0.2um_2', ['EMOBON_AAOT_Wa_211214_200um_2', 'EMOBON_AAOT_Wa_211214_3um_2', 'EMOBON_AAOT_Wa_211214_2000um_2']]
['EMOBON_AAOT_Wa_211214_0.2um_blank', ['EMOBON_AAOT_Wa_211214_200um_blank', 'EMOBON_AAOT_Wa_211214_3um_blank', 'EMOBON_AAOT_Wa_211214_2000um_blank']]
['EMOBON_VB_Wa_211018_0.2um_1', ['EMOBON_VB_Wa_211018_3um_1', 'EMOBON_VB_Wa_211018_NAum_1', 'EMOBON_VB_Wa_211018_NAum_1']]
['EMOBON_VB_Wa_211018_0.2um_2', ['EMOBON_VB_Wa_211018_3um_2', 'EMOBON_VB_Wa_211018_NAum_2', 'EMOBON_VB_Wa_211018_NAum_2']]
['EMOBON_VB_Wa_211018_0.2um_blank', ['EMOBON_VB_Wa_211018_3um_blank', 'EMOBON_VB_Wa_211018_NAum_blank', 'EMOBON_VB_Wa_211018_NAum_blank']]
['EMOBON_VB_Wa_211217_0.2um_1', ['EMOBON_VB_Wa_211217_3um_1', 'EMOBON_VB_Wa_211217_NAum_1', 'EMOBON_VB_Wa_211217_NAum_1']]
['EMOBON_VB_Wa_211217_0.2um_2', ['EMOBON_VB_Wa_211217_3um_2', 'EMOBON_VB_Wa_211217_NAum_2', 'EMOBON_VB_Wa_211217_NAum_2']]
['EMOBON_ROSKOGO_Wa_211014_0.2um_1', ['EMOBON_ROSKOGO_Wa_211014_200um_1', 'EMOBON_ROSKOGO_Wa_211014_3um_1', 'EMOBON_ROSKOGO_Wa_211014_200um_4']]
['EMOBON_ROSKOGO_Wa_211014_0.2um_2', ['EMOBON_ROSKOGO_Wa_211014_200um_2', 'EMOBON_ROSKOGO_Wa_211014_3um_2', 'EMOBON_ROSKOGO_Wa_211014_200um_4']]
['EMOBON_ROSKOGO_Wa_211014_0.2um_blank', ['EMOBON_ROSKOGO_Wa_211014_um_blank', 'EMOBON_ROSKOGO_Wa_211014_3um_blank', 'EMOBON_ROSKOGO_Wa_230414_um_blank']]
['EMOBON_ROSKOGO_Wa_211213_0.2um_1', ['EMOBON_ROSKOGO_Wa_211213_200um_1', 'EMOBON_ROSKOGO_Wa_211213_3um_1', 'EMOBON_ROSKOGO_Wa_211213_200um_4']]
['EMOBON_ROSKOGO_Wa_211213_0.2um_2', ['EMOBON_ROSKOGO_Wa_211213_200um_2', 'EMOBON_ROSKOGO_Wa_211213_3um_2', 'EMOBON_ROSKOGO_Wa_211213_200um_4']]
['EMOBON_PiEGetxo_Wa_211027_0.2um_1', ['EMOBON_PiEGetxo_Wa_211027_um_1', 'EMOBON_PiEGetxo_Wa_211027_um_1', 'EMOBON_PiEGetxo_Wa_211027_um_1']]
['EMOBON_PiEGetxo_Wa_211027_0.2um_2', ['EMOBON_PiEGetxo_Wa_211027_um_2', 'EMOBON_PiEGetxo_Wa_211027_um_2', 'EMOBON_PiEGetxo_Wa_211027_um_2']]
['EMOBON_PiEGetxo_Wa_211222_0.2um_1', ['EMOBON_PiEGetxo_Wa_211222_um_1', 'EMOBON_PiEGetxo_Wa_211222_um_1', 'EMOBON_PiEGetxo_Wa_211222_200um_1']]
['EMOBON_PiEGetxo_Wa_211222_0.2um_2', ['EMOBON_PiEGetxo_Wa_211222_um_2', 'EMOBON_PiEGetxo_Wa_211222_um_2', 'EMOBON_PiEGetxo_Wa_211222_200um_2']]
['EMOBON_RFormosa_Wa_211022_0.2um_1', ['EMOBON_RFormosa_Wa_211022_200um_1', 'EMOBON_RFormosa_Wa_211022_3um_1', 'EMOBON_RFormosa_Wa_211220_200um_1']]
['EMOBON_RFormosa_Wa_211022_0.2um_2', ['EMOBON_RFormosa_Wa_211022_200um_2', 'EMOBON_RFormosa_Wa_211022_3um_2', 'EMOBON_RFormosa_Wa_211220_200um_2']]
['EMOBON_RFormosa_Wa_211220_0.2um_1', ['EMOBON_RFormosa_Wa_211220_200um_1', 'EMOBON_RFormosa_Wa_211220_3um_1', 'EMOBON_RFormosa_Wa_211220_200um_4']]
['EMOBON_RFormosa_Wa_211220_0.2um_2', ['EMOBON_RFormosa_Wa_211220_200um_2', 'EMOBON_RFormosa_Wa_211220_3um_2', 'EMOBON_RFormosa_Wa_211220_200um_4']]
['EMOBON_RFormosa_Wa_211220_0.2um_blank', ['EMOBON_RFormosa_Wa_211220_200um_blank', 'EMOBON_RFormosa_Wa_211220_3um_blank', 'EMOBON_RFormosa_Wa_211022_200um_blank']]
['EMOBON_NRMCB_Wa_211026_0.2um_1', ['EMOBON_NRMCB_Wa_211026_200um_1', 'EMOBON_NRMCB_Wa_211026_3um_1', 'EMOBON_NRMCB_Wa_211026_NAum_1']]
['EMOBON_NRMCB_Wa_211026_0.2um_2', ['EMOBON_NRMCB_Wa_211026_200um_2', 'EMOBON_NRMCB_Wa_211026_3um_2', 'EMOBON_NRMCB_Wa_211026_NAum_2']]
['EMOBON_NRMCB_Wa_211221_0.2um_1', ['EMOBON_NRMCB_Wa_211221_200um_1', 'EMOBON_NRMCB_Wa_211221_3um_1', 'EMOBON_NRMCB_Wa_211221_NAum_1']]
['EMOBON_NRMCB_Wa_211221_0.2um_2', ['EMOBON_NRMCB_Wa_211221_200um_2', 'EMOBON_NRMCB_Wa_211221_3um_2', 'EMOBON_NRMCB_Wa_211221_NAum_2']]
['EMOBON_OSD74_Wa_211028_0.2um_1', ['EMOBON_OSD74_Wa_211028_200um_1', 'EMOBON_OSD74_Wa_211028_3um_1', 'EMOBON_OSD74_Wa_211218_200um_1']]
['EMOBON_OSD74_Wa_211028_0.2um_2', ['EMOBON_OSD74_Wa_211028_200um_2', 'EMOBON_OSD74_Wa_211028_3um_2', 'EMOBON_OSD74_Wa_211218_200um_2']]
['EMOBON_OSD74_Wa_211218_0.2um_1', ['EMOBON_OSD74_Wa_211218_200um_1', 'EMOBON_OSD74_Wa_211218_3um_1', 'EMOBON_OSD74_Wa_211218_200um_4']]
['EMOBON_OSD74_Wa_211218_0.2um_2', ['EMOBON_OSD74_Wa_211218_200um_2', 'EMOBON_OSD74_Wa_211218_3um_2', 'EMOBON_OSD74_Wa_211218_200um_4']]
['EMOBON_OSD74_Wa_211218_0.2um_blank', ['EMOBON_OSD74_Wa_211218_200um_blank', 'EMOBON_OSD74_Wa_211218_3um_blank', 'EMOBON_OSD74_Wa_211028_200um_blank']]
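
To separate the obvious cases from the less clear ones, one option is to compare each pair of IDs field by field. This is a minimal sketch, assuming the IDs are underscore-separated with the size fraction in the fifth field (inferred from the rows above, not from a documented spec). `differing_fields` is a hypothetical helper, not something used in this issue.

```python
def differing_fields(id_a, id_b, sep="_"):
    """Return the positions of the sep-delimited fields that differ,
    or None when the two IDs do not have the same number of fields."""
    fields_a, fields_b = id_a.split(sep), id_b.split(sep)
    if len(fields_a) != len(fields_b):
        return None
    return [i for i, (x, y) in enumerate(zip(fields_a, fields_b)) if x != y]


# Pairs taken from the rows above: (run_information ID, one suggested logsheet ID)
pairs = [
    ("EMOBON_BPNS_Wa_210701_0.2um_2", "EMOBON_BPNS_Wa_210701_200um_2"),
    ("EMOBON_HCMR-1_Wa_210917_3um_blank", "EMOBON_HCMR-1_Wa_210917_3um_blank2"),
    ("EMOBON_PiEGetxo_Wa_210824_0.2um_1", "EMOBON_PiEGetxo_Wa_221024_200um_1"),
]

report = {pair: differing_fields(*pair) for pair in pairs}
for (batch_id, logsheet_id), diff in report.items():
    print(batch_id, "vs", logsheet_id, "-> differing field positions:", diff)
```

A pair that differs only at position 4 is a size-fraction mismatch (0.2um against 200um); a pair that also differs at position 3 has a different date and probably should not be treated as the same sample.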

kmexter commented 2 months ago

@cymon, can you add the link here to where you got the IDs for the first column, and explain what is in the following columns and which files they come from? That makes it easier to diagnose the problem.

cymon commented 2 months ago

The IDs in the first column are the emo-bon source_mat_ids (the primary identifier) from the Batch 1 and 2 run_information sheets: batch 1, batch 2.

The next three are close matches using the Python difflib.get_close_matches() function on the 4047 valid source_mat_ids from the logsheets.
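
For reference, a row of the list can be produced roughly as follows. This is a sketch rather than the script actually used: `closest_logsheet_ids` is a hypothetical wrapper, the candidate list is a small sample of IDs from the rows above rather than the full set of 4047, and `n=3` and `cutoff=0.6` are simply the difflib defaults (the thread does not say which values were used).

```python
import difflib


def closest_logsheet_ids(batch_id, logsheet_ids, n=3, cutoff=0.6):
    """Return up to n logsheet IDs most similar to batch_id, best first."""
    return difflib.get_close_matches(batch_id, logsheet_ids, n=n, cutoff=cutoff)


# Small sample of logsheet IDs copied from the rows in this issue
logsheet_ids = [
    "EMOBON_BPNS_Wa_210701_200um_2",
    "EMOBON_BPNS_Wa_210701_3um_2",
    "EMOBON_BPNS_Wa_210701_NAum_2",
    "EMOBON_VB_Wa_211217_3um_1",
]

batch_id = "EMOBON_BPNS_Wa_210701_0.2um_2"
row = [batch_id, closest_logsheet_ids(batch_id, logsheet_ids)]
print(row)
```

The matches are ranked purely on string similarity, so a close match is only a hint at which logsheet ID was intended, not evidence that the two IDs refer to the same sample.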

'Total number of all_source_mat_ids_from_sheets: 4047'. These are all the source_mat_ids from all sheets ("sampling" and "measured") from all observatories that match the correct source_mat_id format, specifically len(value.split("_")) < 6.

A total of 90 source_mat_ids in the batch 1 & 2 run information sheets are missing from the observatory sampling sheets (i.e. they are not among the 4047 valid source_mat_ids in the sheets).

Total combined_events is 138 + 90 = 228, which should be equal to the total number of refcodes assigned in the run information sheets, 227 (no idea why it is not, yet).
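
The bookkeeping behind these counts can be sketched with sets. The lists below are toy data built from IDs in this issue (which IDs actually match is not stated in the thread), and `reconcile` is a hypothetical helper. The duplicate check is only one thing worth looking at when the totals are off by one; nothing here establishes that it is the cause.

```python
from collections import Counter


def reconcile(batch_ids, logsheet_ids):
    """Split the batch IDs into those found in the logsheets and those missing,
    and report any batch ID that occurs more than once."""
    valid = set(logsheet_ids)
    unique_batch = set(batch_ids)
    matched = unique_batch & valid
    missing = unique_batch - valid
    duplicates = {k: c for k, c in Counter(batch_ids).items() if c > 1}
    return matched, missing, duplicates


# Toy data for illustration only
logsheet_ids = [
    "EMOBON_BPNS_Wa_210701_200um_1",
    "EMOBON_BPNS_Wa_210701_3um_1",
]
batch_ids = [
    "EMOBON_BPNS_Wa_210701_0.2um_1",  # not in the logsheets
    "EMOBON_BPNS_Wa_210701_3um_1",    # in the logsheets
    "EMOBON_BPNS_Wa_210701_3um_1",    # repeated row in the batch sheet
]

matched, missing, duplicates = reconcile(batch_ids, logsheet_ids)
print(len(matched), "matched +", len(missing), "missing =",
      len(matched) + len(missing), "unique IDs from", len(batch_ids), "rows")
print("repeated IDs:", duplicates)
```

With unique IDs, matched plus missing equals the number of rows; when the two differ, the `duplicates` dictionary shows which IDs were counted more than once.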

kmexter commented 2 months ago

From the logsheets in GH in the observatories (which are old), your newer ones in GH, or from the googledrive itself?

cymon commented 2 months ago

All of the "sampling" and "measured" data were harvested directly from the observatories' Google Sheets; I used the links provided in

emo-bon/governance/logsheets.csv

to identify the sheets. So the "newer" ones I put on GH are derived directly from the observatory Google Sheets.

Can the emo-bon/observatory-{observatory_id}-crate/main/logsheets be updated? They would be a better starting point than the raw Google Sheets.

kmexter commented 2 months ago

That is for @marc-portier to solve - the pipeline that gets those does the QC and other stuff also and if it fails anywhere along the way, nothing gets harvested. Marc is looking to bypass this code until Bram can get back to fix it, so you have to ask him.

cymon commented 2 months ago

> That is for @marc-portier to solve - the pipeline that gets those does the QC and other stuff also and if it fails anywhere along the way, nothing gets harvested. Marc is looking to bypass this code until Bram can get back to fix it, so you have to ask him.

Only "Bergen" appears to be missing a "transformed" data sheet, but those that are transformed may not be very up to date, as you say.

kmexter commented 2 months ago

indeed, all are old - very pre-summer-QC work

cymon commented 2 months ago

The source_mat_ids have changed in the logsheets, so these errors are no longer relevant.