Lysbethk / nomilo-fishpond-analysis

https://lysbethk.github.io/nomilo-fishpond-analysis/
Creative Commons Zero v1.0 Universal

Merge data sets #3

Closed Lysbethk closed 4 months ago

Lysbethk commented 6 months ago

Create lists of datasets we want to merge, and specify how to merge them and by what columns.

List of 'overlapping' variables:

Groupings:

date, weight growth_pct: ksf_clam_growth_data_tidied, ksf_oyster_cylinder_growth_data_tidied

date: ksf_compiled_data_tidied, tidal_data_tidied, weather_data_tidied

date, location, depth: water_samples_data_tidied, profiles_data_tidied
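As a rough illustration of the third grouping, here is a minimal base R sketch of a full outer merge on `date`, `location`, and `depth`. The two small data frames are stand-ins with invented values, not the real `water_samples_data_tidied` or `profiles_data_tidied`, and the choice of a full join (`all = TRUE`) is an assumption rather than a decision made in this thread.

```r
# Toy stand-ins for the two tidied datasets; all values are invented for illustration.
water_samples <- data.frame(
  date     = as.Date(c("2023-11-28", "2023-12-21")),
  location = c("A", "A"),
  depth    = c(1, 1),
  salinity = c(30.1, 29.8)
)

profiles <- data.frame(
  date              = as.Date(c("2023-11-28", "2024-01-09")),
  location          = c("A", "A"),
  depth             = c(1, 1),
  water_temperature = c(25.2, 24.7)
)

# Full outer merge on the shared key columns: rows that appear in only one
# dataset are kept, and the columns from the other dataset are filled with NA.
merged <- merge(
  water_samples, profiles,
  by  = c("date", "location", "depth"),
  all = TRUE
)

print(merged)
```

Keeping unmatched rows makes it easy to see how many dates line up before deciding whether an inner join would throw away too much data.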

alemarieceria commented 6 months ago

We want to run a correlational analysis across time, so we'll use conditional imputation based on the data we have: linear interpolation for continuous variables and mode imputation for categorical variables. To do this, we first need to revert our earlier replacement of NA values with zeros, because linear interpolation assumes every data point reflects an actual measurement, and the placeholder zeros could skew the results.
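A minimal base R sketch of those three steps, on a toy data frame with invented values. It assumes that a zero in the continuous column is always a placeholder and never a real reading, which would need checking per variable in the real data. `mode_value` is a hypothetical helper, not a function from the repository.

```r
# Toy data: the zeros in salinity stand in for values that were originally NA.
df <- data.frame(
  date           = as.Date("2024-01-01") + 0:4,
  salinity       = c(30, 0, 0, 33, 34),
  wind_direction = c("NE", NA, "NE", "E", NA),
  stringsAsFactors = FALSE
)

# Step 1: revert placeholder zeros to NA.
# Assumption: zero is never a genuine measurement for this column.
df$salinity[df$salinity == 0] <- NA

# Step 2: linear interpolation over time for the continuous variable.
# approx() needs at least two observed points; rule = 1 leaves values
# outside the observed date range as NA instead of extrapolating.
observed <- !is.na(df$salinity)
df$salinity <- approx(
  x    = as.numeric(df$date[observed]),
  y    = df$salinity[observed],
  xout = as.numeric(df$date),
  rule = 1
)$y

# Step 3: mode imputation for the categorical variable.
# Hypothetical helper: returns the most frequent non-NA value.
mode_value <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  names(counts)[which.max(counts)]
}
df$wind_direction[is.na(df$wind_direction)] <- mode_value(df$wind_direction)

print(df)
```

With the placeholder zeros left in, the interpolation step would have nothing to fill, and the zeros would enter the correlation as if they were measurements.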

Although the dates in each dataset fall within a specific date range, the actual collection dates do not match across datasets, which makes the datasets hard to merge.

> unique(ksf_clam_growth_data_tidied$date)
[1] "2023-10-17" "2023-12-06"
[3] "2023-12-12" "2024-01-02"
[5] "2024-01-10" "2024-01-24"
[7] "2024-01-31" "2024-02-08"
[9] "2024-02-13"

> unique(ksf_oyster_cylinder_growth_data_tidied$date)
 [1] "2023-11-20" "2023-11-27"
 [3] "2023-12-08" "2023-12-11"
 [5] "2023-12-18" "2023-12-28"
 [7] "2024-01-01" "2024-01-05"
 [9] "2024-01-08" "2024-01-12"
[11] "2024-01-17" "2024-01-23"
[13] "2024-02-14"
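One quick way to quantify the mismatch between the two growth datasets is to intersect their date vectors. The sketch below copies the dates from the output above into plain character vectors (`clam_dates` and `oyster_dates` are illustrative names, not objects from the repository).

```r
# Dates copied from the unique() output above.
clam_dates <- c(
  "2023-10-17", "2023-12-06", "2023-12-12", "2024-01-02", "2024-01-10",
  "2024-01-24", "2024-01-31", "2024-02-08", "2024-02-13"
)
oyster_dates <- c(
  "2023-11-20", "2023-11-27", "2023-12-08", "2023-12-11", "2023-12-18",
  "2023-12-28", "2024-01-01", "2024-01-05", "2024-01-08", "2024-01-12",
  "2024-01-17", "2024-01-23", "2024-02-14"
)

# Dates present in both datasets; an exact join on date keeps only these.
shared_dates <- intersect(clam_dates, oyster_dates)
length(shared_dates)  # 0: the two datasets share no sampling dates

# For each clam date, the gap in days to the nearest oyster date.
gap_days <- sapply(
  as.Date(clam_dates),
  function(d) min(abs(as.numeric(as.Date(oyster_dates) - d)))
)
print(data.frame(clam_date = clam_dates, gap_days = gap_days))
```

Because no dates are shared, an exact join on `date` alone would pair none of the clam rows with oyster rows, so any merge of these two needs either a full join or some form of nearest-date matching.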

> unique(second_merge$date)
 [1] "2023-11-28" "2023-12-21"
 [3] "2024-01-09" "2024-01-30"
 [5] "2024-02-20" "2023-11-20"
 [7] "2023-11-21" "2023-11-22"
 [9] "2023-11-23" "2023-11-24"
[11] "2023-11-25" "2023-11-26"
[13] "2023-11-27" "2023-11-29"
[15] "2023-11-30" "2023-12-01"
[17] "2023-12-02" "2023-12-03"
[19] "2023-12-04" "2023-12-05"
[21] "2023-12-06" "2023-12-07"
[23] "2023-12-08" "2023-12-09"
[25] "2023-12-10" "2023-12-11"
[27] "2023-12-12" "2023-12-13"
[29] "2023-12-14" "2023-12-15"
[31] "2023-12-16" "2023-12-17"
[33] "2023-12-18" "2023-12-19"
[35] "2023-12-20" "2023-12-22"
[37] "2023-12-23" "2023-12-24"
[39] "2023-12-25" "2023-12-26"
[41] "2023-12-27" "2023-12-28"
[43] "2023-12-29" "2023-12-30"
[45] "2023-12-31" "2024-01-01"
[47] "2024-01-02" "2024-01-03"
[49] "2024-01-04" "2024-01-05"
[51] "2024-01-06" "2024-01-07"
[53] "2024-01-08" "2024-01-10"
[55] "2024-01-11" "2024-01-12"
[57] "2024-01-13" "2024-01-14"
[59] "2024-01-15" "2024-01-16"
[61] "2024-01-17" "2024-01-18"
[63] "2024-01-19" "2024-01-20"
[65] "2024-01-21" "2024-01-22"
[67] "2024-01-23" "2024-01-24"
[69] "2024-01-25" "2024-01-26"
[71] "2024-01-27" "2024-01-28"
[73] "2024-01-29" "2024-01-31"
[75] "2024-02-01" "2024-02-02"
[77] "2024-02-03" "2024-02-04"
[79] "2024-02-05" "2024-02-06"
[81] "2024-02-07" "2024-02-08"
[83] "2024-02-09" "2024-02-10"
[85] "2024-02-11" "2024-02-12"
[87] "2024-02-13" "2024-02-14"
[89] "2024-02-15" "2024-02-16"
[91] "2024-02-17" "2024-02-18"
[93] "2024-02-19"

> names(second_merge)
 [1] "date"                          
 [2] "round"                         
 [3] "location"                      
 [4] "depth"                         
 [5] "water_temperature"             
 [6] "dissolved_oxygen"              
 [7] "salinity"                      
 [8] "ksf_rdo_concentration"         
 [9] "ksf_rdo_saturation"            
[10] "ksf_oxygen_partial_pressure"   
[11] "ksf_actual_conductivity"       
[12] "ksf_specific_conductivity"     
[13] "ksf_salinity"                  
[14] "ksf_density"                   
[15] "ksf_total_dissolved_solids"    
[16] "ksf_chlorophyll_a_fluorescence"
[17] "ksf_ammonium"                  
[18] "ksf_ammonium_m_v"              
[19] "ksf_barometric_pressure"       
[20] "outdoor_temperature"           
[21] "wind_speed_mph"                
[22] "hourly_rain_inch_hr"           
[23] "wind_direction"                
[24] "time"                          
[25] "pred"                          
[26] "high_low" 

Removed the tidal data because it has few observations and no dates close to those in the profiles and water samples datasets.
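A minimal sketch of dropping the tidal columns from a merged data frame in base R. It assumes the tidal columns are `time`, `pred`, and `high_low` (the last three names in the `names(second_merge)` output above), which this thread does not confirm. `second_merge_demo` is a toy stand-in with invented values.

```r
# Toy stand-in for second_merge with a handful of its columns; values are invented.
second_merge_demo <- data.frame(
  date     = as.Date(c("2023-11-28", "2023-12-21")),
  salinity = c(30.1, 29.8),
  time     = c("06:12", NA),
  pred     = c(1.8, NA),
  high_low = c("H", NA),
  stringsAsFactors = FALSE
)

# Assumption: these three columns came from the tidal dataset.
tidal_cols <- c("time", "pred", "high_low")

# Keep every column whose name is not in tidal_cols.
without_tidal <- second_merge_demo[, !(names(second_merge_demo) %in% tidal_cols), drop = FALSE]

names(without_tidal)
```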