Add page showing data by grid cell

Louis-Backstrom commented 6 years ago

I was thinking it might be a worthwhile thing to have a page that shows a map of all the grid squares (or more realistically just the land ones) shaded by how "complete" the data is for that square as of the latest update - a bit more detail than the binary yes/no shading that currently exists for the species maps and hopefully useful for showing people where and what data is still needed. Squares could be graded as "no data", "limited data" and "sufficient data" or similar.

I'm not sure what data we necessarily want, but I came up with a sort of weighted criteria set that we could apply to each cell to determine whether the data is "acceptable" or not - this is sort of what I was thinking:

Criteria: Complete Records - 25% Weighting

Square must have >20 complete records total

Criteria: Seasonal Records - 10% Weighting

Square must have >5 complete records for each season - 2.5% for each complete season

Criteria: Sampling Events - 15% Weighting

Square must have >25 sampling events (i.e. checklists of any kind) total

Criteria: Total Time - 15% Weighting

Square must have >5 hours total effort across sampling events

Criteria: Seasonal Time - 5% Weighting

Square must have >1 hour total effort per season across sampling events - 1.25% for each complete season

Criteria: Time of Day - 20% Weighting

Square must have >10 sampling events for each time of day - nocturnal and diurnal - 10% for each complete time

Criteria: Number of Observers - 10% Weighting

Square must have had >5 observers across all sampling events

Obviously the weighting, required values and criteria themselves are all subject to change to come up with a complete set of criteria that is fair and reflects the kind of data we want in the atlas. Theoretically, all these values should be relatively straightforward to automatically calculate as all the requisite data should be in the eBird database. The only other criteria ideas I had were number of sites (say >5, so there's not just one big hotspot that gobbles everyone up) and number of species recorded (but I think this implies that there's a set number of species that must be recorded, which isn't necessarily valid).
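As a rough illustration of how the criteria above could be scored per cell, here is a minimal Python sketch. It is not the atlas code: the `CellStats` fields, the season names and the function names are all hypothetical, and the cutoff of 75 used to call a cell "sufficient" is purely an assumed placeholder. The seasonal weightings are split evenly across four seasons (10% / 4 = 2.5% and 5% / 4 = 1.25%) so that the total comes to 100%.

```python
from dataclasses import dataclass, field

# Assumed labels; the proposal only says "each season" and "nocturnal and diurnal".
SEASONS = ("summer", "autumn", "winter", "spring")
TIMES_OF_DAY = ("diurnal", "nocturnal")


@dataclass
class CellStats:
    """Hypothetical per-cell summary, e.g. pre-computed from the eBird data."""
    complete_records: int = 0
    sampling_events: int = 0
    total_hours: float = 0.0
    observers: int = 0
    seasonal_records: dict = field(default_factory=dict)       # season -> complete records
    seasonal_hours: dict = field(default_factory=dict)         # season -> hours of effort
    events_by_time_of_day: dict = field(default_factory=dict)  # time of day -> events


def completeness_score(cell: CellStats) -> float:
    """Return a 0-100 score using the weightings proposed in this issue."""
    score = 0.0
    if cell.complete_records > 20:       # Complete Records, 25%
        score += 25
    for season in SEASONS:               # Seasonal Records, 10% (2.5% per season)
        if cell.seasonal_records.get(season, 0) > 5:
            score += 2.5
    if cell.sampling_events > 25:        # Sampling Events, 15%
        score += 15
    if cell.total_hours > 5:             # Total Time, 15%
        score += 15
    for season in SEASONS:               # Seasonal Time, 5% (1.25% per season)
        if cell.seasonal_hours.get(season, 0) > 1:
            score += 1.25
    for time_of_day in TIMES_OF_DAY:     # Time of Day, 20% (10% per time)
        if cell.events_by_time_of_day.get(time_of_day, 0) > 10:
            score += 10
    if cell.observers > 5:               # Number of Observers, 10%
        score += 10
    return score


def grade(cell: CellStats, sufficient_cutoff: float = 75.0) -> str:
    """Map a cell to one of the three suggested shading classes.

    The cutoff is an assumption for illustration, not part of the proposal.
    """
    if cell.sampling_events == 0:
        return "no data"
    if completeness_score(cell) >= sufficient_cutoff:
        return "sufficient data"
    return "limited data"
```

A cell that passes every criterion scores 100, and a cell with no sampling events is graded "no data" regardless of the cutoff. Changing a weighting or threshold only means editing the corresponding constant.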

Hope this makes sense - doesn't necessarily need to be implemented but I thought something like this would be useful for measuring the "health" of the dataset.

jeffreyhanson commented 6 years ago

Yeah, I really like this idea of highlighting poorly sampled areas in Brisbane. What do you think about just applying the same rules we already use for identifying "grid cells that are poorly sampled by checklists" (i.e. grey cells in the "All year" map) and "grid cells that are poorly sampled by records" (i.e. grey cells in the "Detections" map) instead of showing these new metrics? I worry that having multiple schemes/metrics for saying which areas are poorly sampled might confuse some people. In terms of fitting this into the atlas, what if we created a new chapter called "Brisbane city", then put the "Brisbane's environment" chapter into it as a section, and also put this new "Brisbane's checklists" (or better name?) section under the "Brisbane city" chapter too?

Also, I like this idea of monitoring the "health"/"quality" of the data. Maybe we could add some time-series graphs to display changes in grid cell sampling coverage over time?
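For the time-series idea, something along these lines could produce the numbers to plot. This is only a sketch under assumptions: the input is taken to be a list of hypothetical `(cell_id, year)` pairs, one per checklist, and `min_checklists` is a placeholder threshold rather than the rule the atlas actually uses to grey out cells.

```python
from collections import Counter


def coverage_by_year(checklists, n_cells, min_checklists=1):
    """Cumulative fraction of grid cells with at least `min_checklists` checklists.

    checklists: iterable of (cell_id, year) pairs, one per checklist (assumed format).
    n_cells: total number of grid cells being tracked (e.g. land cells only).
    Returns a dict mapping each year present in the data to a fraction in [0, 1].
    """
    cells_by_year = {}
    for cell_id, year in checklists:
        cells_by_year.setdefault(year, []).append(cell_id)

    running_counts = Counter()  # cell_id -> checklists accumulated so far
    coverage = {}
    for year in sorted(cells_by_year):
        running_counts.update(cells_by_year[year])
        sampled = sum(1 for n in running_counts.values() if n >= min_checklists)
        coverage[year] = sampled / n_cells
    return coverage


# Example with made-up data: 4 cells, 3 checklists.
example = [("A", 2017), ("B", 2018), ("A", 2018)]
print(coverage_by_year(example, n_cells=4))
```

The counts are cumulative, so the curve can only go up over time; a per-year (non-cumulative) version would reset `running_counts` inside the loop.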

Louis-Backstrom commented 6 years ago

I think we should be wary of overcomplicating things for sure, but I reckon having a bit more detail than just checklists/records could be helpful too. Perhaps we could have that on the "front end" and then something more in-depth (like my set of criteria or similar) on the back-end, which then gets displayed as the health / time graph you suggested?

dbl3raf commented 6 years ago

This is a brilliant idea, and it actually complements the survey sheet functionality, e.g. the survey sheets could be downloadable via a pop-up link when clicking on each grid square on these maps. I agree the metrics need to be simple, and beyond that they should closely reflect what we are using in the main species accounts, as there needs to be a strong logical connection between how we actually calculate and display data in the species accounts and how we measure the health of sampling in each grid square. I'll have a think about these.

dbl3raf commented 5 years ago

With everything else that needs to be done prior to launch (writing descriptions for surveyor sheets, drafting accounts, adding photos, etc.) I think we should hibernate this for a bit.

jeffreyhanson commented 5 years ago

Sounds good. Also - I really like the "hibernated" tag - it conveys that idea of "we can work on this later", and it has biological connotations too!

bird-team / brisbane-bird-atlas

Add page showing data by grid cell #92