Closed bpasquer closed 1 year ago
@atcooper1 we can see the details of these observation traits with this code (change 'Notes' to the other attributes to see details): select distinct(observation_attribute->'Notes') from nrmn.observation where observation_attribute->'Notes' is not null
Most attributes relate to fine scale sizes, whether the animal (eg. lobster) was estimated for size, and its sex (also used for wrasse historically). However these seem to be duplicated, eg. 'SizeRaw' and 'SizeClassRaw' where I cannot tell the difference, perhaps the same thing from the 2 historical databases? 'Notes' mostly refer to RLS species ID changes - I'm unsure the value of these. 'Biomass' may be redundant - as this is now calculated on data-out based on the stored a's and b's??? 'DescriptiveName' is a mystery to me and I'll have to look into it further.
Moving forward the metrics related to lobster/abalone raw sizes, sex and estimates could be moved to a legacy endpoint, eg incorporated into the ep_lobster_haliotis and no longer attached to observation_ids (but still attached to survey_ids etc.).
There has been confirmation that loss of these historical data is acceptable in the database and legacy data of these can be generated and available externally. Record of this disscussion: RE_ Database corrections mechanism - potential loss of observation-level metadata.pdf
File generated to demonstrate and discuss different examples of the metadata at risk: Observation attributes json.xlsx
A version1 extract of
were shared, and Lizzie's and Toni (in italic) feedbacks were as follows:
Thank you for the extracts. For the basic format I think that we should include survey_date, depth, site_code, site_date and diver as columns.
For the legacy_observation_attributes, I think they require some cleaning.
Biomass should potentially be excluded as it is incomplete – what do you think Toni? This would allow the extract to be much shorter and focussed on notes, animal sex, and high res size categories. Agreed, get rid of biomass The various columns for sizes should be merged into a meaningful variable. Currently they are very convoluted and confusing and in some years will mean nothing to whomever is looking at the data without proper metadata descriptions.
- Size estimated and size class estimated should be merged as a singular yes/no field
- Sizeraw is a value in mm relating to invertebrate observations and should be kept as is.
- Size class estimated is a weird mix of measurements in inches for fish and cephalopods, but measurements in mm for invertebrates. The fish ones should be deleted and disregarded. The invertebrate measurements should be kept and merged with “Sizeraw” “simulated absence” is a no species found record. These should be deleted as is this not preserved/re-ingested upon corrections??
- Notes should stay as is, but may need some review / discussion with Toni as to whether to keep or separate the automated RLS ones referring to species and size corrections. Looking at the RLS notes, we think it’s probably best to remove these from the legacy data all together. They are pretty meaningless now that Species ID has changed in the NRMN DB, and we really don’t need to know that diver initials were updated with accents! Suggest removing all Notes %Added # from SpeciesID%, %SpeciesID Changed%, %Diver reference data%, and %Size data removed%.
For the survey_method attributes, I feel like it is better to convert these to survey_notes stored at the survey level as this is where they are relevant (and seem to be repeated for every element of the survey-method in the survey_id’s anyway. This way the extract is redundant, and we probably don’t need it at all. There are only 5 types of non-standard data attributes, all applicable at the survey level:
- Site sampled due to oil spill
- Method error
- Poor visibility
- Additional data
- Carried out on seagrass bed
“Method error” are for surveys from one day and seem to be explained in the survey notes anyway - but those notes can still be appended with “method error”. NB. All of these should not override existing survey_notes but be appended to them. Poor visibility seems to be subjective and probably inconsistent so I would probably exclude these unless the visibility value is missing for that survey.
For the legacy method the two attribute values are 8 and 9. These stand for the different legacy methods described below. These data were converted to the standardised fish blocks when other ATRC data were merged and block abundances were simulated. They could be added as survey note descriptions as well (eg. “Legacy method 8: Parallel fish survey 50 x 10 m blocks over 4 surveys”) - this may be useful if ever checking the original survey sheets, but the downside is that the survey notes might confuse people. Do you have an opinion, Toni? I tend to think just delete them.
| 8 | Parallel fish survey 50 x 10 m blocks | e.g. Jervis Bay 2007 4 transects: 1,2,3,4 | Total: each fish species Size: all species; estimated in inch size categories Sex: when available | | 9 | Parallel fish survey 100 x 5 m blocks | e.g. Batemans Bay 2005, 2006 4 transects: 1,2,3,4 | Total: each fish species Size: all species; estimated in inch size categories Sex: when available |
Thanks team,
The following query was used to clean and extract the observation attributes( I will share the resulting attribute list in an email)
The query :
SELECT sm.survey_id,
survey_date,
depth,
site_code,
site_name,
full_name AS diver,
block_num,
method_id,
obs.observable_item_id,
observable_item_name,
obs.measure_id,
measure_name,
obs.measure_value,
observation_attribute::JSONB->>'DescriptiveName' AS "DescriptiveName",
observation_attribute::JSONB->>'LegalSize' AS "LegalSize",
observation_attribute::JSONB->>'SpeciesSex' AS "SpeciesSex",
concat(observation_attribute::JSONB->>'SizeRaw', observation_attribute::JSONB->>'SizeClassRaw') AS size,
CASE
WHEN concat(observation_attribute::JSONB->>'SizeEstimated', observation_attribute::JSONB->>'SizeClassEstimated')='NoNo' THEN 'No'
WHEN concat(observation_attribute::JSONB->>'SizeEstimated', observation_attribute::JSONB->>'SizeClassEstimated')='YesYes' THEN 'Yes'
ELSE concat(observation_attribute::JSONB->>'SizeEstimated', observation_attribute::JSONB->>'SizeClassEstimated')
END AS estimated,
observation_attribute::JSONB->>'Notes' AS "Notes"
FROM observation obs
LEFT JOIN observable_item_ref oir ON oir.observable_item_id = obs.observable_item_id
LEFT JOIN measure_ref mr ON mr.measure_id = obs.measure_id
LEFT JOIN survey_method sm ON obs.survey_method_id = sm.survey_method_id
LEFT JOIN survey s ON sm.survey_id = s.survey_id
LEFT JOIN diver_ref dr ON obs.diver_id = dr.diver_id
LEFT JOIN site_ref sr ON s.site_id = sr.site_id
WHERE (observation_attribute::JSONB->>'SizeRaw' IS NOT NULL
OR observation_attribute::JSONB->>'SizeClassRaw' IS NOT NULL
OR observation_attribute::JSONB->>'SizeEstimated' IS NOT NULL
OR observation_attribute::JSONB->>'SizeClassEstimated' IS NOT NULL
OR observation_attribute::JSONB->>'DescriptiveName' IS NOT NULL
OR observation_attribute::JSONB->>'LegalSize'IS NOT NULL
OR observation_attribute::JSONB->>'SpeciesSex' IS NOT NULL
OR observation_attribute::JSONB->>'Notes' IS NOT NULL
)
EXCEPT
SELECT sm.survey_id,
survey_date,
depth,
site_code,
site_name,
full_name,
block_num,
method_id,
obs.observable_item_id,
observable_item_name,
obs.measure_id,
measure_name,
obs.measure_value,
observation_attribute::JSONB->>'DescriptiveName' AS "DescriptiveName",
observation_attribute::JSONB->>'LegalSize' AS "LegalSize",
observation_attribute::JSONB->>'SpeciesSex' AS "SpeciesSex",
concat(observation_attribute::JSONB->>'SizeRaw', observation_attribute::JSONB->>'SizeClassRaw') AS "Size",
CASE
WHEN concat(observation_attribute::JSONB->>'SizeEstimated', observation_attribute::JSONB->>'SizeClassEstimated')='NoNo' THEN 'No'
WHEN concat(observation_attribute::JSONB->>'SizeEstimated', observation_attribute::JSONB->>'SizeClassEstimated')='YesYes' THEN 'Yes'
ELSE concat(observation_attribute::JSONB->>'SizeEstimated', observation_attribute::JSONB->>'SizeClassEstimated')
END AS "Estimated",
observation_attribute::JSONB->>'Notes' AS "Notes"
FROM observation obs
LEFT JOIN observable_item_ref oir ON oir.observable_item_id = obs.observable_item_id
LEFT JOIN measure_ref mr ON mr.measure_id = obs.measure_id
LEFT JOIN survey_method sm ON obs.survey_method_id = sm.survey_method_id
LEFT JOIN survey s ON sm.survey_id = s.survey_id
LEFT JOIN diver_ref dr ON obs.diver_id = dr.diver_id
LEFT JOIN site_ref sr ON s.site_id = sr.site_id
WHERE (phylum='Chordata' and (concat(observation_attribute::JSONB->>'SizeRaw', observation_attribute::JSONB->>'SizeClassRaw') is not null))
OR (CLASS='Cephalopoda'and (concat(observation_attribute::JSONB->>'SizeRaw', observation_attribute::JSONB->>'SizeClassRaw') is not null))
OR observation_attribute::JSONB->>'Notes' LIKE '%Added % from SpeciesID%'
OR observation_attribute::JSONB->>'Notes' LIKE '%SpeciesID Changed%'
OR observation_attribute::JSONB->>'Notes' LIKE '%Diver reference %'
OR observation_attribute::JSONB->>'Notes' LIKE '%Size data removed%'
Copy email from Lizzi 6 Apr 2023 for the record:
Thanks for sending this through. Initially I was a little confused at the 4 different columns of size outputs looking at the smallest “sizes” of “2” mapping to a measure name of “0.5cm” but now see that the size is in millimetres and measure name is the closest size bin in centimetres (and some of these smallest ones are clearly errors). I think it would be good to put that in the column headers and change the column names and order to be more user friendly – since we won’t be accessing them very often so there’s a high chance of forgetting how they are put together. Could the size columns be: size_raw (mm) and size_class (cm) (instead of measure name)? I also think for anyone using the file, then the measure_id and measure_value columns just add confusion. Unless you can see a reason to leave them in that I’m missing I think we should remove them. So, for the last columns in the file (columns 10 – 17), could they be: observable_item_name, description, size_class (cm), size_raw (mm), legal_size, size_estimated, species_sex, notes?
observable_item_name | measure_id | measure_name | measure_value | DescriptiveName | LegalSize | SpeciesSex | size | estimated |
---|---|---|---|---|---|---|---|---|
Petricia vernicina | 53 | 0.5cm | 1 | 2 | No | |||
Plagusia chabrus | 53 | 0.5cm | 1 | 2 | No | |||
Paguridae spp. | 53 | 0.5cm | 1 | 2 | No | |||
Carcinus maenas | 53 | 0.5cm | 2 | 2 | Yes | |||
Jasus edwardsii | 53 | 0.5cm | 2 | 2 | Yes | |||
Noumea closei | 53 | 0.5cm | 2 | 2 | No | |||
Noumea closei | 53 | 0.5cm | 2 | 2 | No | |||
Haliotis brazieri | 53 | 0.5cm | 95 | 2 | Yes | |||
Tosia australis | 53 | 0.5cm | 1 | 3 | No | |||
Dicathais orbita | 53 | 0.5cm | 1 | 3 | No | |||
Paguristes frontalis | 53 | 0.5cm | 1 | 3 | No | |||
Tosia australis | 53 | 0.5cm | 1 | 4 | No | |||
Phasianella australis | 53 | 0.5cm | 4 | 4 | No | |||
Aplysia gigantea | 53 | 0.5cm | 1 | 4 | No | |||
Haliotis roei | 53 | 0.5cm | 1 | 4 | No | |||
Pleuroploca australasia | 53 | 0.5cm | 4 | 4 | No | |||
Haliotis scalaris | 53 | 0.5cm | 1 | 5 | No | |||
Meridiastra calcar | 53 | 0.5cm | 1 | 5 | No | |||
Tosia australis | 53 | 0.5cm | 1 | 6 | No |
I also think it would be a good idea to have 2 files, one for the original size measurements as described above and then the rest of the “notes” and “descriptive names” (that do not have size info) in another file. Thanks again for all your work on this! Where will they be archived that we can access them?
For the record, this is the last feedback:
These files look great to me, thank you for getting that done 😊! My only suggestion would be to rename the ‘full_name’ column as ‘diver’ for clarity/consistency with our other endpoints.
The following files have been archived under archive/IMOS/NRMN:
Historical observation have sometimes attributes attached. These attributes are stored in the jsonb fields observation_attribute:
Similarly, some metadata are stored in the survey_method table:
There is a risk of metadata loss when historical data are corrected. How can we minimize the risk?