CalCOFI / workflows

helper scripts in R for common workflows
https://calcofi.io/workflows
MIT License

merge CalCOFI fish larvae and bottle data #21

Open bbest opened 1 year ago

bbest commented 1 year ago

From Erica Mason:

Hello Ben,

It was great meeting you at the CalCOFI conference this week. Thank you for offering to help with finding a solution to my CalCOFI data merging issues. I really appreciate it. I was curious to hear your thoughts on how best to move forward... virtually meet to discuss specific needs and next steps? Send you an email with what I'm trying to accomplish with example data frames? Virtually meet to go straight into troubleshooting? I've cc'd Erin here too in case she had something else in mind (bringing other folks in? not sure).

Please let me know what works best. This may be a long shot for you, but I'll be around next week through Thursday (12/15) and the following week, Mon-Wed (12/19 - 12/21).

Thanks again, Erica

bbest commented 1 year ago

See new function get_nearest_bottle() here:

ISSUE: How do we properly match larval data to a location and time since 1999?
Here are some of the larvae_count data without expected matches:

edweber commented 1 year ago

The problem was at least one bad merge (I didn't check further after that). I have not used RStudio online before, but I stuck a chunk in to demonstrate, beginning at line 176. I gave Marina some restructured data just before CalCOFI, which she hasn't had a chance to convert to yet. With those changes, the SQL will become much less painful and more intuitive (surrogate keys instead of composite primary keys with, e.g., six fields for a net).
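For anyone following along, here is a minimal sketch of how a bad merge on a composite key could be diagnosed in base R, and how a surrogate key simplifies later joins. The tables, column names (`cruise`, `station`, `tow`, `net_id`) and values are hypothetical toy data, not the actual CalCOFI schema, and this is not the chunk at line 176.

```r
# Hypothetical toy tables; column names are illustrative, not the CalCOFI schema.
nets <- data.frame(
  cruise  = c("2001-01", "2001-01", "2001-04"),
  station = c("80.0 60.0", "80.0 70.0", "80.0 60.0"),
  tow     = c(1L, 1L, 1L),
  stringsAsFactors = FALSE
)
larvae <- data.frame(
  cruise  = c("2001-01", "2001-01", "2001-07"),
  station = c("80.0 60.0", "80.0 70.0", "80.0 60.0"),
  tow     = c(1L, 1L, 1L),
  count   = c(12L, 3L, 5L),
  stringsAsFactors = FALSE
)
key_cols <- c("cruise", "station", "tow")

# 1. The composite key must be unique in the parent table; otherwise the
#    merge silently duplicates larvae rows.
stopifnot(!anyDuplicated(nets[key_cols]))

# 2. Assign a single-column surrogate key once, so later joins need one
#    field instead of the whole composite key.
nets$net_id <- seq_len(nrow(nets))

# 3. Left join: keep every larvae row; rows with no matching net get NA net_id.
joined <- merge(larvae, nets, by = key_cols, all.x = TRUE)

# 4. A left join on a unique key must not change the number of larvae rows.
stopifnot(nrow(joined) == nrow(larvae))

# 5. List the larvae rows that found no net (the "unexpected non-matches").
unmatched <- joined[is.na(joined$net_id), ]
print(unmatched)
```

Checking key uniqueness and the row count before and after the merge catches most silent duplication, and the `unmatched` rows show exactly which key values failed to join.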

I really think we should also take a look at the Scripps CalCOFI db structure before we spend a lot of time on these queries. For example, having date and time as separate fields is really error-prone and also painful to program with here. I know Access doesn't have a datetime offset data type but Access shouldn't be the repository of these data anyway.
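To illustrate why separate date and time fields are error-prone, here is a minimal base R sketch with made-up values. The column names and the helper `to_datetime()` are hypothetical, and treating the times as UTC is an assumption; the real time zone of the source fields would need to be confirmed.

```r
# Hypothetical tow and bottle records with date and time stored separately.
tow <- data.frame(date = "2001-01-15", time = "23:50:00",
                  stringsAsFactors = FALSE)
bottles <- data.frame(
  date = c("2001-01-15", "2001-01-16"),
  time = c("00:10:00", "00:10:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: combine the two fields into one POSIXct value.
# ASSUMPTION: times are UTC; adjust tz if the source uses local time.
to_datetime <- function(date, time, tz = "UTC") {
  as.POSIXct(paste(date, time), format = "%Y-%m-%d %H:%M:%S", tz = tz)
}

tow$datetime     <- to_datetime(tow$date, tow$time)
bottles$datetime <- to_datetime(bottles$date, bottles$time)

# Unparseable dates or times become NA, so check before going further.
stopifnot(!anyNA(tow$datetime), !anyNA(bottles$datetime))

# Matching on the date field alone would pick the first bottle (same date),
# which is actually almost a full day away from the tow.
same_date <- which(bottles$date == tow$date)

# With a single datetime, the absolute gap in minutes is straightforward.
gap_min <- abs(as.numeric(difftime(bottles$datetime, tow$datetime,
                                   units = "mins")))
nearest <- which.min(gap_min)

print(data.frame(bottles, gap_min))
```

In this toy case the tow just before midnight is closest to the bottle sampled just after midnight on the next calendar day, which a join on the date field alone would miss.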

Should we do a quick final check to make sure we're happy with the underlying data structure before we try to build too much more on top of it?

bbest commented 8 months ago

Space for Time Substitutions

Look into this paper:
