PhDyellow closed this issue 3 years ago
Hi Phil
You are correct: I don't see why you would have the same species recorded more than once for the same Lat, Lon, Date and Time. Maybe something has been cut off and not combined. For example, the same species can have males, females and juveniles counted separately. If those counts haven't been lumped to species and the life-stage information hasn't come through with the data, you would see 3 separate abundances for the same species at the same time and place. Worth checking.
Cheers Ant
From: Phil Dyer
Reply to: MathMarEcol/pdyer_aus_bio
Date: Wednesday, 4 August 2021 at 3:28 pm
To: MathMarEcol/pdyer_aus_bio
Subject: [MathMarEcol/pdyer_aus_bio] Apparent duplication of rows in zoo and phy data (#6)
Following on from #5 (https://github.com/MathMarEcol/pdyer_aus_bio/issues/5), I am now keeping track of samples from datasets. I am no longer aggregating by time and then by space at a later point; instead, I aggregate once into grid cells from all samples in the cell.
For the zooplankton and phytoplankton datasets, samples should be uniquely defined by lon, lat and time. I also expect a species to be counted once in a sample, so for a given lon, lat, time and species, there should be exactly one abundance recorded. However, I am finding rows where more than one abundance is recorded for a species at a given lon, lat and time.
I am currently discussing this with Claire Davies. This GitHub issue is to document the problem and track the resolution.
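For readers following along, the uniqueness check described above could be sketched roughly as follows. This is only an illustration in plain Python: the column names (`lon`, `lat`, `time`, `species`, `abundance`), the helper `find_duplicate_keys` and the sample rows are all assumptions, not the project's actual code or data.

```python
from collections import defaultdict

# Hypothetical rows: column names and values are illustrative only.
rows = [
    {"lon": 153.5, "lat": -27.3, "time": "2010-01-05T10:00", "species": "Species A", "abundance": 12.0},
    {"lon": 153.5, "lat": -27.3, "time": "2010-01-05T10:00", "species": "Species A", "abundance": 7.0},
    {"lon": 153.5, "lat": -27.3, "time": "2010-01-05T10:00", "species": "Species B", "abundance": 3.0},
]

# A sample is assumed to be identified by lon, lat and time; adding species
# should then identify exactly one abundance.
KEY = ("lon", "lat", "time", "species")


def find_duplicate_keys(rows, key=KEY):
    """Return {key_tuple: [abundances]} for keys found on more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in key)].append(row["abundance"])
    return {k: v for k, v in groups.items() if len(v) > 1}


duplicates = find_duplicate_keys(rows)
for key_tuple, abundances in duplicates.items():
    print(key_tuple, abundances)
```

An empty result would mean every (lon, lat, time, species) combination has exactly one abundance, which is the expectation stated above.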
It is probably something like that, where male, female and juvenile are part of the data collection, and given separate rows, but the information hasn't reached me. I can't confirm if that is the exact cause, because sometimes I have 2 rows and sometimes 3 rows for a species.
I have given a sample of data that has this problem to Claire, so she can look it up easily.
That probably does make it more likely, Phil, that it is males, females and juveniles, because for some copepod species we can only ID males and females, and for others males, females and juveniles. If you had 4, 5 or 6 rows it would be a different issue. At this stage, I would just sum them…
Cheers Ant
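The row-count reasoning in the reply above (2 or 3 rows per species fits a male/female/juvenile split, 4 or more would point to something else) could be checked with a sketch like the one below. The function name `classify_group_sizes`, the `max_expected` threshold and the sample keys are hypothetical and chosen only for illustration.

```python
from collections import Counter


def classify_group_sizes(keys, max_expected=3):
    """Split repeated keys into plausible (2..max_expected rows) and unexpected (more)."""
    counts = Counter(keys)
    plausible = {k: n for k, n in counts.items() if 2 <= n <= max_expected}
    unexpected = {k: n for k, n in counts.items() if n > max_expected}
    return plausible, unexpected


# Hypothetical (sample, species) keys, one entry per data row.
keys = (
    [("sample1", "Species A")] * 2    # e.g. male and female rows
    + [("sample1", "Species B")] * 3  # e.g. male, female and juvenile rows
    + [("sample2", "Species A")]      # a single row, nothing to explain
    + [("sample2", "Species C")] * 5  # more rows than sex/life stage would explain
)

plausible, unexpected = classify_group_sizes(keys)
print("plausible:", plausible)
print("unexpected:", unexpected)
```

Any key landing in `unexpected` would need a separate explanation before summing.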
In reply to: Phil Dyer, Wednesday, 4 August 2021 at 3:58 pm (Cc: Anthony Richardson), Re: [MathMarEcol/pdyer_aus_bio] Apparent duplication of rows in zoo and phy data (#6)
Claire has confirmed that the extra rows are separate male and female counts. I will sum them.
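A minimal sketch of that summing step, assuming the same hypothetical column names (`lon`, `lat`, `time`, `species`, `abundance`) and made-up sample rows; the project's real pipeline may do this differently:

```python
from collections import defaultdict

KEY = ("lon", "lat", "time", "species")


def sum_abundance(rows, key=KEY):
    """Collapse rows sharing the same key into one row with summed abundance."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in key)] += row["abundance"]
    # Rebuild one dict per key, in first-seen order.
    return [dict(zip(key, k), abundance=total) for k, total in totals.items()]


# Hypothetical rows: the first two stand in for male and female counts.
rows = [
    {"lon": 153.5, "lat": -27.3, "time": "2010-01-05T10:00", "species": "Species A", "abundance": 12.0},
    {"lon": 153.5, "lat": -27.3, "time": "2010-01-05T10:00", "species": "Species A", "abundance": 7.0},
    {"lon": 153.5, "lat": -27.3, "time": "2010-01-05T10:00", "species": "Species B", "abundance": 3.0},
]

summed = sum_abundance(rows)
for row in summed:
    print(row)
```

After this step each (lon, lat, time, species) combination appears once, which restores the one-abundance-per-species-per-sample expectation.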