HakaiInstitute / hakai-datasets

Hakai Datasets that are going into https://catalogue.hakai.org/erddap/

Add Chlorophyll Datasets to Production #58

Closed JessyBarrette closed 1 year ago

JessyBarrette commented 2 years ago

Final Submission

This is the final approval of the Chlorophyll Sample Datasets. You can review the resulting datasets here:

Provisional

This is related to issue #47

JessyBarrette commented 2 years ago

@jdelbel please review the complete datasets and their respective metadata records to confirm everything is ready to be published.

:)

JessyBarrette commented 2 years ago

Metadata from CKAN and ERDDAP was approved by @jdelbel. A few issues were, however, identified in the transformation from the Hakai Database table to ERDDAP. Everything should be sorted once the view is rebuilt.

Hopefully, we should be able to make a final submission next week!

jdelbel commented 2 years ago

Sounds great. Thanks for the help this morning. I will review the data early next week.

-- Justin Del Bel Belluz, MSc. Research Technician - Bio-Optical Oceanography Hakai Institute 100 - 1002 Wharf Street Victoria, BC Canada V8W 1T4 www.hakai.org

jdelbel commented 2 years ago

For now, data included in the research dataset should be limited to the below stations.

jdelbel commented 2 years ago

I found one instance on the portal and ERDDAP (though it could happen again) where values were duplicated for the same station/depth/timestamp: 2018-11-28T09:58:39Z, QU39, 10 m.

This duplication occurs due to an issue with solo transducer depth matches: the only difference between the records is the transducer depth.

This error is a QC issue that needs to go back to the field crew to fix in the forms. However, should the erddap query also have something built in to remove these?

Jessy, I remember you saying that you were averaging replicate values for erddap. Is this correct and, if so, why weren't these just averaged?

On Wed, 3 Nov 2021 at 11:53, Justin Belluz wrote:

For now, data included in the research dataset should be limited to the stations below:

  • BU4
  • DFO2
  • DFO5
  • FZH01
  • KC10
  • KN3
  • KWY01
  • PRUTH
  • QCS01
  • QU29
  • QU38
  • QU39
  • QU43
  • TO2
  • TO5

jdelbel commented 2 years ago

The ERDDAP time column in the research dataset shows a 7-hour offset from the portal's collected column. Is that a conversion error, or does ERDDAP require a different time zone for submission?

JessyBarrette commented 2 years ago

ERDDAP outputs times in UTC. The portal outputs in PST, I think. Thanks, I'll have a look at the other issue you mentioned.
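To illustrate the offset discussed here, a minimal sketch follows. It assumes the portal shows local Pacific time while ERDDAP reports UTC; the fixed offsets, helper names, and example timestamps are illustrative and not taken from the datasets. Pacific Daylight Time (summer) is 7 hours behind UTC and Pacific Standard Time (winter) is 8 hours behind, so a 7-hour difference would be consistent with samples collected during daylight time.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for Pacific time (assumption: the portal stores local Pacific time).
PST = timezone(timedelta(hours=-8), "PST")  # standard time (winter)
PDT = timezone(timedelta(hours=-7), "PDT")  # daylight time (summer)


def local_to_utc(naive_local: datetime, tz: timezone) -> datetime:
    """Attach a fixed-offset zone to a naive local time and convert it to UTC."""
    return naive_local.replace(tzinfo=tz).astimezone(timezone.utc)


def offset_hours(naive_local: datetime, tz: timezone) -> float:
    """Hours that must be added to the local clock time to obtain UTC."""
    utc = local_to_utc(naive_local, tz)
    return (utc.replace(tzinfo=None) - naive_local) / timedelta(hours=1)


# Hypothetical collection times, one in summer and one in winter.
summer_sample = datetime(2021, 7, 1, 9, 0)
winter_sample = datetime(2021, 12, 1, 9, 0)

print(local_to_utc(summer_sample, PDT).isoformat())
print(offset_hours(summer_sample, PDT), offset_hours(winter_sample, PST))
```

In practice, `zoneinfo.ZoneInfo("America/Vancouver")` from the standard library (Python 3.9+) can select the right offset for each date instead of hard-coding it.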

-- Jessy Barrette M.Sc. Marine Instrumentation Specialist Hakai Institute https://www.hakai.org/ | @.*** | (C) (250) 208-7806

JessyBarrette commented 2 years ago

Issue

I found that the duplicated records you found, @jdelbel, come from duplicated records within the database that have the same hakai_id but a different pressure_transducer_depth. See below all the records associated with this specific drop (line_out_depth = 10 m, site_id = QU39, collected = 2018-11-28 09:58:39.000):

SELECT *
FROM eims.output_chlorophyll
WHERE collected > '2018-11-28'
  AND collected < '2018-11-29'
  AND site_id IN ('QU39')
  AND line_out_depth IN (10)

The query returns eight rows. All of them share the following values:

event_pk: 448,975
rn: 1
date: 2018-11-28
work_area: QUADRA
organization: HAKAI
project: OCEANOGRAPHY
survey: QOMA1
sampling_bout: 2
site_id: QU39
lat: 50.0307
long: -125.0992
gather_lat: 50.029917
gather_long: -125.09928802
line_out_depth: 10
volume: 250
collected: 2018-11-28 09:58:39.000
preserved: 2018-11-28 14:59:00.000
lab_technician: bryn.fedje
acetone_volume_ml: 10
flurometer_serial_no: 720000982
calibration: 2018-05-04 00:00:00.000
acid_ratio_correction_factor: 2.2029156766
acid_coefficient: 1.8313134656
calibration_slope: 0.000641704
dilution_factor: 1
chla_flag: AV
phaeo_flag: AV
analyzing_lab: HAKAI
row_flag: Results
quality_level: Technicianmr
quality_log: 1: CHL metadata QCd by BF 2: Event metadata QC'd by EM

The columns action, collection_method, project_specific_id, is_blank, is_solid_standard, filter_size_mm, acid_flag and comments are empty in every row.

The rows differ only in these columns:

hakai_id | filter_type | pressure_transducer_depth | analyzed | before_acid | after_acid | chla | chla_final | phaeo | phaeo_final
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
QCHL9023 | 20um | 10.7489083 | 2018-12-04 10:22:00.000 | 4,434.56 | 3,242.87 | 0.0673838681 | 0.0673838681 | 0.0850519398 | 0.0850519398
QCHL9023 | 20um | 11.1801437 | 2018-12-04 10:22:00.000 | 4,434.56 | 3,242.87 | 0.0673838681 | 0.0673838681 | 0.0850519398 | 0.0850519398
QCHL9024 | 3um | 10.7489083 | 2018-12-04 10:46:00.000 | 3,077.11 | 2,314.89 | 0.0430995745 | 0.0430995745 | 0.0657151875 | 0.0657151875
QCHL9024 | 3um | 11.1801437 | 2018-12-04 10:46:00.000 | 3,077.11 | 2,314.89 | 0.0430995745 | 0.0430995745 | 0.0657151875 | 0.0657151875
QCHL9025 | GF/F | 11.1801437 | 2018-12-04 10:45:00.000 | 6,302.76 | 4,855.31 | 0.0818457651 | 0.0818457651 | 0.1463851236 | 0.1463851236
QCHL9025 | GF/F | 10.7489083 | 2018-12-04 10:45:00.000 | 6,302.76 | 4,855.31 | 0.0818457651 | 0.0818457651 | 0.1463851236 | 0.1463851236
QCHL9026 | Bulk GF/F | 10.7489083 | 2018-12-04 10:23:00.000 | 15,251.21 | 11,505.66 | 0.211791361 | 0.211791361 | 0.3290488751 | 0.3290488751
QCHL9026 | Bulk GF/F | 11.1801437 | 2018-12-04 10:23:00.000 | 15,251.21 | 11,505.66 | 0.211791361 | 0.211791361 | 0.3290488751 | 0.3290488751

I'm not sure why this record has two transducer depths.

How many records are like that?

The good news is that there aren't many records with this issue in the database. Here's the full list of hakai_ids with duplicated entries. You'll probably want to have a look at them.

SELECT project, site_id, collected, hakai_id, COUNT(*)
FROM eims.output_chlorophyll
GROUP BY project, site_id, collected, hakai_id
HAVING COUNT(*) > 1

project | site_id | collected | hakai_id | count
--- | --- | --- | --- | ---
OCEANOGRAPHY | KC10 | 2017-08-14 13:21:39.000 | CHL8372 | 2
OCEANOGRAPHY | SEA9 | 2017-07-20 12:03:35.000 | CHL7921 | 2
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CM1151 | 3
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CHL8255 | 3
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CHL8248 | 3
OCEANOGRAPHY | QU39 | 2018-11-28 09:58:39.000 | QCHL9024 | 2
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CHL8251 | 3
OCEANOGRAPHY | NBCH01 | 2017-07-20 15:24:01.000 | CHL7907 | 2
OCEANOGRAPHY,OCEANOGRAPHY | PRUTH | 2014-05-10 09:00:37.000 | CM1100 | 2
OCEANOGRAPHY | QU45 | 2018-11-28 12:04:10.000 | QCHL9037 | 2
OCEANOGRAPHY | KC10 | 2017-08-14 13:21:39.000 | CHL8373 | 2
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CHL8249 | 3
OCEANOGRAPHY | QU39 | 2018-11-28 09:58:39.000 | QCHL9025 | 2
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CHL8250 | 3
OCEANOGRAPHY | NBCH01 | 2017-07-20 15:24:01.000 | CHL7908 | 2
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CM1150 | 3
OCEANOGRAPHY | QU39 | 2018-11-28 09:58:39.000 | QCHL9023 | 2
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CM1149 | 3
OCEANOGRAPHY | KC10 | 2017-08-14 13:21:39.000 | CHL8370 | 2
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CHL8254 | 3
OCEANOGRAPHY | KC10 | 2017-08-14 13:21:39.000 | CHL8371 | 2
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CHL8252 | 3
OCEANOGRAPHY,OCEANOGRAPHY | PRUTH | 2014-05-10 09:00:37.000 | CM1101 | 2
OCEANOGRAPHY | NBCH01 | 2017-07-20 15:24:01.000 | CHL7909 | 2
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CHL8253 | 3
OCEANOGRAPHY | QCS01 | 2017-07-28 16:12:37.000 | CM1148 | 3
OCEANOGRAPHY | MANLEYPT | 2018-07-18 10:17:32.000 | CHL10599 | 2
OCEANOGRAPHY | QU39 | 2018-11-28 09:58:39.000 | QCHL9026 | 2
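The duplicate check above can be tried locally. This is a minimal sketch, assuming an in-memory SQLite database stands in for the real Hakai database; the attached schema, the reduced column set, the `find_duplicates` helper, and the `EXAMPLE1` row are all illustrative assumptions. Only the two QCHL9023 rows mirror values quoted in this thread.

```python
import sqlite3

# Same GROUP BY / HAVING pattern as the query above, with a stable ordering added.
DUPLICATE_QUERY = """
SELECT project, site_id, collected, hakai_id, COUNT(*) AS n
FROM eims.output_chlorophyll
GROUP BY project, site_id, collected, hakai_id
HAVING COUNT(*) > 1
ORDER BY hakai_id
"""

SAMPLE_ROWS = [
    # Two rows sharing a hakai_id but differing in transducer depth (from the thread).
    ("OCEANOGRAPHY", "QU39", "2018-11-28 09:58:39.000", "QCHL9023", 10.7489083),
    ("OCEANOGRAPHY", "QU39", "2018-11-28 09:58:39.000", "QCHL9023", 11.1801437),
    # Hypothetical non-duplicated row, included only for contrast.
    ("OCEANOGRAPHY", "QU39", "2018-11-28 09:58:39.000", "EXAMPLE1", 10.7489083),
]


def find_duplicates(rows):
    """Load rows into a throwaway SQLite table and return hakai_ids seen more than once."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the "eims." prefix resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS eims")
    conn.execute(
        "CREATE TABLE eims.output_chlorophyll ("
        "project TEXT, site_id TEXT, collected TEXT, "
        "hakai_id TEXT, pressure_transducer_depth REAL)"
    )
    conn.executemany(
        "INSERT INTO eims.output_chlorophyll VALUES (?, ?, ?, ?, ?)", rows
    )
    result = conn.execute(DUPLICATE_QUERY).fetchall()
    conn.close()
    return result


print(find_duplicates(SAMPLE_ROWS))
```

Running a check like this after each rebuild of the view would show whether new duplicated hakai_ids have appeared.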

Solution

You can certainly look at them one by one. We can also regroup those records. As of now, we're grouping the data by the following distinct values:

GROUP BY
(
        "work_area",
        "organization",
        "survey",
        "site_id",
        "lat",
        "long",
        "gather_lat",
        "gather_long",
        "line_out_depth",
        "pressure_transducer_depth",
        "collected"
    ) 

If multiple values are found within a group, we select the first one found. We can certainly change this grouping to eliminate potentially duplicated hakai_ids, with something like this:

GROUP BY
(
        "work_area",
        "organization",
        "survey",
        "site_id",
        "lat",
        "long",
        "gather_lat",
        "gather_long",
        "line_out_depth",
        "collected",
        "hakai_id"
    ) 
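The effect of the two grouping keys can be sketched in plain Python. This is a minimal illustration under assumptions: the `groups_per_hakai_id` helper and the reduced column set are hypothetical, and only the depth values and hakai_ids come from the rows shown earlier. It counts how many output groups each hakai_id ends up in under each key.

```python
from collections import defaultdict

# Reduced versions of the two grouping keys (only the columns that matter here).
OLD_KEY = ("site_id", "line_out_depth", "pressure_transducer_depth", "collected")
NEW_KEY = ("site_id", "line_out_depth", "collected", "hakai_id")

ROWS = [
    {"site_id": "QU39", "line_out_depth": 10, "pressure_transducer_depth": 10.7489083,
     "collected": "2018-11-28 09:58:39.000", "hakai_id": "QCHL9023"},
    {"site_id": "QU39", "line_out_depth": 10, "pressure_transducer_depth": 11.1801437,
     "collected": "2018-11-28 09:58:39.000", "hakai_id": "QCHL9023"},
    {"site_id": "QU39", "line_out_depth": 10, "pressure_transducer_depth": 10.7489083,
     "collected": "2018-11-28 09:58:39.000", "hakai_id": "QCHL9024"},
    {"site_id": "QU39", "line_out_depth": 10, "pressure_transducer_depth": 11.1801437,
     "collected": "2018-11-28 09:58:39.000", "hakai_id": "QCHL9024"},
]


def groups_per_hakai_id(rows, key_columns):
    """Return, for each hakai_id, the number of distinct groups it falls into."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        groups[row["hakai_id"]].add(key)
    return {hakai_id: len(keys) for hakai_id, keys in groups.items()}


print(groups_per_hakai_id(ROWS, OLD_KEY))  # each hakai_id is split across two groups
print(groups_per_hakai_id(ROWS, NEW_KEY))  # each hakai_id lands in a single group
```

With the depth in the key, every duplicated hakai_id shows up once per depth value; with hakai_id in the key instead, the duplicates collapse into one group, and whichever row is selected first supplies the depth.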

jdelbel commented 2 years ago

Yeah, that lines up. I think there was only one date included in the "research" dataset - the one you highlighted, with all the others in provisional.

The overarching question is why are there two pressure transducer depths and which one is correct? I need to reach out to the crew about this - maybe Eva, as Bryn is not working very much. I am on Quadra next week and could discuss it with both of them and will send an email now.

I think it should be properly fixed on the portal rather than by changing the query. That being said, it could take a while and, since only the transducer depth is different, we could just make the query change for the time being. When the issue is fixed, the new grouping will still work, as there won't be any duplicated values anyway, right?

So with that grouping, if you're pulling the distinct values, it's just going to pick one of the duplicated records because, without the "pressure_transducer_depth", everything will be the same?

JessyBarrette commented 2 years ago

Looks like the different pressure_transducer_depth is only an issue with the hakai_ids :

QCHL9023
QCHL9023
QCHL9024
QCHL9024
QCHL9025
QCHL9025
QCHL9026
QCHL9026

I can't quickly see any other differences in the others. I can certainly group by hakai_id, which will resolve all the issues. I think Matt Foster could help figure out why a hakai_id can appear more than once in those tables.

jdelbel commented 2 years ago

Your original list includes replicates with either one or two records without a pressure_transducer_depth and one with a pressure_transducer_depth. Your second list just has replicates where there are two slightly different pressure_transducer_depths. I suppose the second issue is more problematic (i.e. which depth is correct); however, with the first situation, it would be important to pick the record that has a depth included (not the one or two where it is blank).
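The selection rule described here can be sketched as follows. This is a minimal illustration, not the query used for ERDDAP: the `pick_records` helper and the `HYPO1` records are hypothetical, and only the two QCHL9023 depths come from the thread. It keeps, for each hakai_id, a record that has a depth when one exists, and reports hakai_ids whose records carry two different depths so they can be sent back for QC.

```python
def pick_records(records):
    """Pick one record per hakai_id, preferring records with a transducer depth.

    Returns (chosen, conflicts): chosen maps hakai_id to the kept record, and
    conflicts is the set of hakai_ids with more than one distinct non-blank depth.
    """
    chosen = {}
    conflicts = set()
    for record in records:
        hakai_id = record["hakai_id"]
        depth = record["pressure_transducer_depth"]
        kept = chosen.get(hakai_id)
        if kept is None:
            chosen[hakai_id] = record
            continue
        kept_depth = kept["pressure_transducer_depth"]
        if kept_depth is None and depth is not None:
            # A record with a depth replaces one where the depth is blank.
            chosen[hakai_id] = record
        elif kept_depth is not None and depth is not None and depth != kept_depth:
            # Two different depths: keep the first, but flag it for review.
            conflicts.add(hakai_id)
    return chosen, conflicts


RECORDS = [
    # Hypothetical replicate set: two blank depths and one measured depth.
    {"hakai_id": "HYPO1", "pressure_transducer_depth": None},
    {"hakai_id": "HYPO1", "pressure_transducer_depth": 5.2},
    {"hakai_id": "HYPO1", "pressure_transducer_depth": None},
    # Depths quoted earlier in the thread for QCHL9023.
    {"hakai_id": "QCHL9023", "pressure_transducer_depth": 10.7489083},
    {"hakai_id": "QCHL9023", "pressure_transducer_depth": 11.1801437},
]

chosen, conflicts = pick_records(RECORDS)
print(chosen["HYPO1"]["pressure_transducer_depth"], sorted(conflicts))
```

Which of two conflicting depths is correct cannot be decided in code; the sketch only surfaces those cases.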

I emailed both Matt and Bryn about it - hopefully we can just get it fixed.

JessyBarrette commented 2 years ago

OK, it must be something related to the query used to generate the view eims.output_chlorophyll. Yeah, Matt would be able to help with that.

jdelbel commented 2 years ago

I am waiting to hear a response about replicates from Bryn.

For the research dataset, I think I should flag data that does not include all three filter sizes (even though there might be 2 of 3 filters with AV flags). In querying samples that do not have all three filters, I found a discrepancy: 16 filters that have concentrations and AV flags on the portal are not showing up in the ERDDAP download. These shouldn't matter once I flag data without complete filter sets, but it is strange that the ERDDAP query is not finding them. Any ideas? Here's the list:

QCHL8287 QCHL8364 QCHL8481 CHL10699 CHL10700 CHL10788 CHL10789 CHL10943 CHL10932 QCHL9069 QCHL9070 QCHL9805 QCHL10430 QCHL10431 QCHL10434 QCHL10435
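Both checks mentioned here can be sketched in a few lines. This is a minimal illustration under assumptions: the three required filter types are taken to be 20um, 3um and GF/F (the size-fractionated types seen in the rows earlier in this thread), a sample is identified by site, collection time and depth, and the helper names and example records are hypothetical.

```python
from collections import defaultdict

# Assumption: a complete set has one filter of each of these types.
REQUIRED_FILTERS = {"20um", "3um", "GF/F"}


def incomplete_sets(records):
    """Map each sample (site, time, depth) to the filter types it is missing."""
    found = defaultdict(set)
    for record in records:
        key = (record["site_id"], record["collected"], record["line_out_depth"])
        found[key].add(record["filter_type"])
    return {
        key: REQUIRED_FILTERS - types
        for key, types in found.items()
        if not REQUIRED_FILTERS <= types
    }


def missing_from_erddap(portal_ids, erddap_ids):
    """Return hakai_ids present on the portal but absent from the ERDDAP download."""
    return sorted(set(portal_ids) - set(erddap_ids))


# Hypothetical records: the 5 m sample is complete, the 10 m sample lacks 3um.
RECORDS = [
    {"site_id": "QU39", "collected": "2019-01-01T10:00:00Z", "line_out_depth": 5, "filter_type": "20um"},
    {"site_id": "QU39", "collected": "2019-01-01T10:00:00Z", "line_out_depth": 5, "filter_type": "3um"},
    {"site_id": "QU39", "collected": "2019-01-01T10:00:00Z", "line_out_depth": 5, "filter_type": "GF/F"},
    {"site_id": "QU39", "collected": "2019-01-01T10:00:00Z", "line_out_depth": 10, "filter_type": "20um"},
    {"site_id": "QU39", "collected": "2019-01-01T10:00:00Z", "line_out_depth": 10, "filter_type": "GF/F"},
]

print(incomplete_sets(RECORDS))
print(missing_from_erddap(["QCHL8287", "QCHL8364", "QCHL9023"], ["QCHL9023"]))
```

Comparing the full portal export against the ERDDAP download with a set difference like this gives a repeatable way to regenerate the list of missing filters after each rebuild.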

JessyBarrette commented 1 year ago

This PR is too old; we will recreate it from the new development branch.