MathMarEcol / pdyer_aus_bio

GNU General Public License v3.0

Apparent duplication of rows in zoo and phy data #6

Closed PhDyellow closed 3 years ago

PhDyellow commented 3 years ago

Following on from #5, I am now keeping track of samples from datasets. I am no longer aggregating by time and then by space at a later point; instead I aggregate once into grid cells, using all samples in the cell.
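A minimal sketch of single-pass aggregation into grid cells, for illustration only. The field names (`lon`, `lat`, `time`, `abundance`), the 1-degree resolution, and the use of a mean as the per-cell summary are all assumptions; the issue does not state how cells are defined or summarised.

```python
import math
from collections import defaultdict


def cell_of(lon, lat, res=1.0):
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(lon / res), math.floor(lat / res))


def aggregate_into_cells(samples, res=1.0):
    """Assign every sample to its grid cell in one pass, then summarise.

    `samples` is an iterable of dicts with the hypothetical keys
    'lon', 'lat', 'time' and 'abundance'.
    """
    cells = defaultdict(list)
    for sample in samples:
        cells[cell_of(sample["lon"], sample["lat"], res)].append(sample)
    # Mean abundance per cell; the choice of mean is an assumption.
    return {
        cell: sum(s["abundance"] for s in rows) / len(rows)
        for cell, rows in cells.items()
    }


# Made-up samples, taken at different times but pooled by cell.
samples = [
    {"lon": 150.2, "lat": -35.7, "time": "2010-01-05", "abundance": 4.0},
    {"lon": 150.9, "lat": -35.1, "time": "2012-06-20", "abundance": 8.0},
    {"lon": 151.3, "lat": -35.4, "time": "2010-01-05", "abundance": 1.0},
]
cell_means = aggregate_into_cells(samples)
```

Because time is not used as an intermediate grouping step, every sample in a cell contributes directly to that cell's summary.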

For the zooplankton and phytoplankton datasets, samples should be uniquely defined by lon, lat and time. I also expect a species to be counted once in a sample, so for a given lon, lat, time and species, there should be exactly one abundance recorded. However, I am finding rows where more than one abundance is recorded for a species at a given lon, lat and time.
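The uniqueness check described above could be sketched as follows. This is an illustration under assumptions: the column names and the example rows are invented, and the real data are presumably held in a different structure.

```python
from collections import defaultdict

KEY_FIELDS = ("lon", "lat", "time", "species")


def find_duplicate_keys(rows, key_fields=KEY_FIELDS):
    """Return {key: [abundances]} for every key that has more than one row."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        groups[key].append(row["abundance"])
    return {key: values for key, values in groups.items() if len(values) > 1}


# Made-up rows: the first two share lon, lat, time and species.
rows = [
    {"lon": 153.5, "lat": -27.3, "time": "2015-03-01T10:00", "species": "sp_a", "abundance": 12.0},
    {"lon": 153.5, "lat": -27.3, "time": "2015-03-01T10:00", "species": "sp_a", "abundance": 7.0},
    {"lon": 153.5, "lat": -27.3, "time": "2015-03-01T10:00", "species": "sp_b", "abundance": 3.0},
]
duplicates = find_duplicate_keys(rows)
```

An empty result would mean every (lon, lat, time, species) combination has exactly one abundance, which is the expected state.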

I am currently discussing this with Claire Davies. This GitHub issue is to document the problem and track the resolution.

ric325 commented 3 years ago

Hi Phil

You are correct: I don’t see why you should have the same species more than once for the same Lat, Lon, Date, Time. Maybe something has been cut off and not combined. For example, the same species will have males, females and juveniles, and if the life stage information has been dropped without the rows being lumped to species, then you would have 3 separate abundances for the same species at the same time and place. Worth checking.

Cheers Ant

(Sent by email in reply to Phil Dyer's comment of Wednesday, 4 August 2021 at 3:28 pm.)

PhDyellow commented 3 years ago

It is probably something like that, where male, female and juvenile are part of the data collection, and given separate rows, but the information hasn't reached me. I can't confirm if that is the exact cause, because sometimes I have 2 rows and sometimes 3 rows for a species.

I have given a sample of data that has this problem to Claire, so she can look it up easily.

ric325 commented 3 years ago

Phil, that probably does make it more likely that it is males, females and juveniles, because for some copepod species we can only ID males and females, and for others males, females and juveniles. If you had 4, 5 or 6 rows it would be a different issue. At this stage, I would just sum them.
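One way to check the "2 or 3 rows, never 4 or more" pattern is to tally how many rows each key has. The sketch below uses invented column names and data and is only an illustration of the idea.

```python
from collections import Counter

KEY_FIELDS = ("lon", "lat", "time", "species")


def group_size_tally(rows, key_fields=KEY_FIELDS):
    """Return a Counter mapping rows-per-key to the number of keys of that size."""
    rows_per_key = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return Counter(rows_per_key.values())


def make_row(species, abundance):
    """Build a made-up row at a fixed place and time."""
    return {"lon": 153.5, "lat": -27.3, "time": "2015-03-01T10:00",
            "species": species, "abundance": abundance}


rows = [
    make_row("sp_a", 12.0), make_row("sp_a", 7.0),                         # 2 rows
    make_row("sp_b", 3.0), make_row("sp_b", 1.0), make_row("sp_b", 5.0),   # 3 rows
    make_row("sp_c", 9.0),                                                 # 1 row
]
tally = group_size_tally(rows)
unexpected = {size: n for size, n in tally.items() if size > 3}
```

If `unexpected` is non-empty, some keys have more rows than life stages alone would explain, which would point to a different issue.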

Cheers Ant

(Sent by email in reply to Phil Dyer's comment of Wednesday, 4 August 2021 at 3:58 pm.)

PhDyellow commented 3 years ago

Claire has confirmed that the duplicates are male and female rows. I will sum them.
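A minimal sketch of the summing step, assuming rows are keyed by hypothetical `lon`, `lat`, `time` and `species` columns with an `abundance` value. It is not the code used in the repository.

```python
from collections import defaultdict

KEY_FIELDS = ("lon", "lat", "time", "species")


def sum_duplicates(rows, key_fields=KEY_FIELDS):
    """Collapse rows sharing a key into one row with the summed abundance."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] += row["abundance"]
    # Rebuild one dict per key, keeping first-seen order.
    return [dict(zip(key_fields, key), abundance=total)
            for key, total in totals.items()]


# Made-up rows: two rows for sp_a (e.g. male and female) and one for sp_b.
rows = [
    {"lon": 153.5, "lat": -27.3, "time": "2015-03-01T10:00", "species": "sp_a", "abundance": 12.0},
    {"lon": 153.5, "lat": -27.3, "time": "2015-03-01T10:00", "species": "sp_a", "abundance": 7.0},
    {"lon": 153.5, "lat": -27.3, "time": "2015-03-01T10:00", "species": "sp_b", "abundance": 3.0},
]
summed = sum_duplicates(rows)
```

After this step each (lon, lat, time, species) combination appears exactly once, so the uniqueness expectation from the original post holds.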