aodn / nrmn-application

A web application for collation, validation, and storage of all data obtained during surveys conducted by the NRMN
GNU General Public License v3.0

Biomass and rarity values may be affected by the rounding/double precision issue #1358

Open atcooper1 opened 5 months ago

atcooper1 commented 5 months ago

a and b values, plus trait values, may be affected by the same rounding/double precision issue that is currently affecting lat/long, e.g. Zoramia leptacanthus. Some a and b values also don't seem to reflect what is on Fishbase. It might be worth checking the FB file that was ingested in 2022?

utas-raymondng commented 4 months ago

https://github.com/aodn/backlog/issues/5583

bpasquer commented 3 months ago
Biomass coefficients: The a and b values displayed in the database are the result of numbers being handled as doubles in the script that generates the SQL code for the update. So again, because the values were handled as doubles in the script, what you see is not exactly what you expected:

coefficient   script                expected
a             0.00954992976039648   0.00955
b             3.049999952316284     3.05
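For illustration only, here is a minimal Python sketch of one way a value like the b above can arise. It assumes, without confirmation from the update script itself, that the number passed through a single-precision (32-bit) float before being handled as a double; `widen_float32` is a hypothetical helper, not part of the project.

```python
import struct


def widen_float32(value: float) -> float:
    """Round-trip a Python float (a double) through IEEE 754 single precision.

    Hypothetical helper: it mimics storing a value as a 32-bit float and
    then reading it back as a 64-bit double.
    """
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Assumption: the expected value 3.05 was stored in single precision somewhere.
b_widened = widen_float32(3.05)

print(repr(b_widened))      # 3.049999952316284
print(round(b_widened, 2))  # 3.05
```

The widened value is the closest single-precision number to 3.05, so rounding it to 2 decimals recovers the expected figure.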

Would you like the values of 'a' and 'b' to be rounded? You might also recall that the values we ingested in 2022 were not the latest available on the Fishbase pages, as the most recent version was not publicly accessible on the website.

Rarity: Unfortunately, since rarity statistics are computed metrics, it is difficult to determine whether they have been affected by the rounding issue, as we have no reference for comparison.

atcooper1 commented 3 months ago

I think rounding would be helpful, especially when copying a's and b's for superseded species.

bpasquer commented 3 months ago

a's rounded to 5 decimals? Is that consistently the case, though? b's rounded to 2 decimals?

atcooper1 commented 3 months ago

Yes, a's to 5 decimals, b's to 2. Thanks, Bene
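As a rough sketch of the agreed rounding (a to 5 decimals, b to 2), the snippet below uses Python's `decimal` module so that the rounded values are exact decimal numbers before they reach any SQL. The table name `hypothetical_species_table`, the column names, the `%s` placeholder style and the function names are all assumptions for illustration; the actual update script and schema may differ.

```python
from decimal import Decimal, ROUND_HALF_UP

A_PLACES = Decimal("0.00001")  # a: 5 decimals, as agreed in the thread
B_PLACES = Decimal("0.01")     # b: 2 decimals, as agreed in the thread


def round_coefficients(a: float, b: float) -> tuple[Decimal, Decimal]:
    """Convert the doubles to Decimal via their repr, then round."""
    a_dec = Decimal(repr(a)).quantize(A_PLACES, rounding=ROUND_HALF_UP)
    b_dec = Decimal(repr(b)).quantize(B_PLACES, rounding=ROUND_HALF_UP)
    return a_dec, b_dec


def build_update(species_name: str, a: float, b: float):
    """Return a parameterised UPDATE statement and its parameters.

    Table and column names are hypothetical placeholders.
    """
    a_dec, b_dec = round_coefficients(a, b)
    sql = (
        "UPDATE hypothetical_species_table "
        "SET a = %s, b = %s WHERE species_name = %s"
    )
    return sql, (a_dec, b_dec, species_name)


# Values quoted earlier in the thread; the species name is a placeholder.
sql, params = build_update("Example species", 0.00954992976039648, 3.049999952316284)
print(params)  # (Decimal('0.00955'), Decimal('3.05'), 'Example species')
```

Passing `Decimal` parameters rather than formatting floats into the SQL string is one way to keep the double representation out of the stored values.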

bpasquer commented 3 months ago

From conversation 06/06/2024:

Bene: After examining the Rfishbase package more closely, it appears that an updated version of the database from May 2023 is available. If this update is indeed available (I need to look at the data), and considering that I've planned to re-ingest rounded biomass coefficients in the DB, I assume you would prefer the latest version to be ingested.

Toni: Yes, that would be great if possible, please.

Decision: update the biomass coefficients to the latest Fishbase release and apply the rounding as agreed.

bpasquer commented 1 month ago

The SQL update was applied (ref https://github.com/aodn/nrmn-application/pull/1374). During testing, Toni identified discrepancies between the updated values and the Fishbase website. The values in the update came from the fb_parquet_2023-05 release in this repo, https://github.com/cboettig/rfishbase_board/, the same repo as for the last update. This repo was thought to be the source of the Fishbase dataset. However, after a web search, another data source for Fishbase was found, with a more recent release (release 24.07) and values in agreement with the FB website. More specifically: https://huggingface.co/api/datasets/cboettig/fishbase/tree/main/data/fb/v24.07/parquet

The update script will be regenerated.
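A check along the following lines could help spot such discrepancies before the regenerated script is applied. This is only a sketch: `find_discrepancies` is a hypothetical helper, the dictionaries stand in for values read from the database and from the reference release, and the species names and the mismatching numbers below are made up for illustration.

```python
def find_discrepancies(updated, reference, a_places=5, b_places=2):
    """Compare (a, b) pairs per species after the agreed rounding.

    Both arguments map a species name to an (a, b) tuple. Returns a list
    of (species, reason) tuples for every mismatch found.
    """
    mismatches = []
    for species, (ref_a, ref_b) in reference.items():
        if species not in updated:
            mismatches.append((species, "missing from update"))
            continue
        upd_a, upd_b = updated[species]
        if round(upd_a, a_places) != round(ref_a, a_places):
            mismatches.append((species, "a differs"))
        if round(upd_b, b_places) != round(ref_b, b_places):
            mismatches.append((species, "b differs"))
    return mismatches


# Illustrative data only; not taken from Fishbase or the NRMN database.
updated = {
    "species_one": (0.00954992976039648, 3.049999952316284),
    "species_two": (0.01200, 2.90),
}
reference = {
    "species_one": (0.00955, 3.05),
    "species_two": (0.01200, 3.00),
    "species_three": (0.02000, 3.10),
}

for species, reason in find_discrepancies(updated, reference):
    print(species, reason)
```

Comparing after rounding means that values differing only by the precision artefact are treated as equal, while genuine differences between releases are still reported.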