AquaAuma / FishGlob_data

Database and methods related to the manuscript "An integrated database of fish biodiversity sampled with scientific bottom trawl surveys"
Creative Commons Attribution 4.0 International

Possible issues with `timestamp` column #18

Closed: edwardlavender closed this issue 1 year ago

edwardlavender commented 1 year ago

This repo is a fantastic resource!

I noticed a mismatch between the time stamps and the recorded years/months/days:

# Load the compiled public dataset; load() restores an object named `data`
con <- "https://github.com/AquaAuma/FishGlob_data/blob/main/outputs/Compiled_data/FishGlob_public_clean.RData?raw=true"
load(url(con))
# Compare time stamps with the recorded year/month/day for the first 10 rows
data[1:10, c("timestamp", "year", "month", "day")]

# A tibble: 10 × 4
   timestamp   year month   day
   <chr>      <int> <int> <int>
 1 2021-03-01  1983     8    18
 2 2021-03-01  1983     8    18
 3 2021-03-01  1983     8    18
 4 2021-03-01  1983     8    18
 5 2021-03-01  1983     8    18
 6 2021-03-01  1983     8    18
 7 2021-03-01  1983     8    18
 8 2021-03-01  1983     8    19
 9 2021-03-01  1983     8    19
10 2021-03-01  1983     8    19

I think the year/month/day columns are probably correct and the `timestamp` column has been incorrectly defined. The entries in that column also vary in format (e.g., yyyy-mm-dd, yyyy-mm and mm/yyyy).
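If you need to work with `timestamp` values regardless of their meaning, a minimal base-R sketch that normalises the three formats listed above to `Date` might look like this. It assumes those three formats are the only ones present, that `mm/yyyy` means month/year, and that month-only values can be pinned to the first day of the month. `parse_timestamp()` is a hypothetical helper, not part of FishGlob.

```r
# Hypothetical helper: normalise the mixed `timestamp` formats
# (yyyy-mm-dd, yyyy-mm, mm/yyyy) to Date. Month-only values are
# assumed to mean the first day of that month; anything else becomes NA.
parse_timestamp <- function(x) {
  x <- trimws(as.character(x))
  out <- as.Date(rep(NA_character_, length(x)))

  full <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)  # yyyy-mm-dd
  ym   <- grepl("^[0-9]{4}-[0-9]{2}$", x)           # yyyy-mm
  my   <- grepl("^[0-9]{2}/[0-9]{4}$", x)           # mm/yyyy (assumed month/year)

  out[full] <- as.Date(x[full], format = "%Y-%m-%d")
  out[ym]   <- as.Date(paste0(x[ym], "-01"), format = "%Y-%m-%d")
  out[my]   <- as.Date(paste0("01/", x[my]), format = "%d/%m/%Y")
  out
}
```

For example, `parse_timestamp(data$timestamp)` would return one `Date` per row, with `NA` for any entry that does not match the assumed formats.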

I think at least some surveys record the time of day of each trawl deployment and/or retrieval as well. It might be useful to retain that information.

AquaAuma commented 1 year ago

The `timestamp` column records when the data were retrieved to build the dataset; it is not related to the sampling time. We'll try to reformat it if we have enough time, but it is really just there to indicate when the integration of each dataset was done. This has implications, for instance, for when we matched the surveys with the taxonomy.

edwardlavender commented 1 year ago

Thanks for this clarification! That makes sense. If possible, you might want to consider renaming the timestamp column to make its meaning clearer and prevent accidental misuse (e.g., timestamp_data_retrieval). A separate column with the sampling time of each survey haul (year/month/day, plus time of day where available) could also be useful.
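Until such a column exists, a sampling date can be derived from the `year`, `month` and `day` columns shown in the output above. A minimal sketch using base R's `ISOdate()`, which returns `NA` for missing or impossible dates instead of raising an error; `make_sampling_date()` and the `sampling_date` column name are hypothetical, and time of day is not handled because no such column appears in this thread.

```r
# Hypothetical helper: build a sampling Date from separate year/month/day
# columns. ISOdate() yields NA for missing or invalid combinations
# (e.g. month 13), so bad rows do not stop the whole conversion.
make_sampling_date <- function(year, month, day) {
  # ISOdate() returns POSIXct at 12:00 GMT; as.Date() drops the time part
  as.Date(ISOdate(year, month, day))
}

# Usage (assumes `data` was loaded as in the original post):
# data$sampling_date <- make_sampling_date(data$year, data$month, data$day)
```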

AquaAuma commented 1 year ago

Since it's detailed in Table 2 of the manuscript, I won't change the name.
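Because the published column keeps its name, users who prefer the clearer name suggested earlier in this thread can rename it in their own copy. A minimal sketch; `rename_timestamp()` is a hypothetical helper and `timestamp_data_retrieval` is only the name proposed above, not an official FishGlob field.

```r
# Hypothetical helper: rename `timestamp` in a local copy only.
# The published FishGlob data are not affected.
rename_timestamp <- function(df, new_name = "timestamp_data_retrieval") {
  names(df)[names(df) == "timestamp"] <- new_name
  df
}

# Usage (assumes `data` was loaded as in the original post):
# data <- rename_timestamp(data)
```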