NEFSC / READ-SSB-CHAJI-Effort-Displacement---Scallop


trip location formatting issue #55

Closed BryceMcManus-NOAA closed 2 years ago

BryceMcManus-NOAA commented 2 years ago

@mchaji It looks like the trip location columns ("DDLON" and "DDLAT") are missing, or a formatting issue pushed their values into the "geometry" and "MN30SQID" columns as strings.

Library

library(tidyverse)

Read scallop data

final_product_lease <-
  read_csv(paste0("~/NE Scallops/data/updated/",
                  "NE NW Effort Displacement Collaboration Data/",
                  "final_product_lease.csv"))
final_product_lease %>% select(geometry, MN30SQID)
## # A tibble: 165,868 x 2
##    geometry            MN30SQID         
##    <chr>               <chr>            
##  1 c(-73.3833333333333 40.1)            
##  2 c(-73.5227777777778 40.0116666666667)
##  3 c(-72.6             39.7833333333333)
##  4 c(-74.1             38.5)            
##  5 c(-68.9666666666667 40.9166666666667)
##  6 c(-69               41.0166666666667)
##  7 c(-73.15            39.8666666666667)
##  8 c(-73.9388888888889 38.5630555555556)
##  9 c(-73.9891666666667 38.5202777777778)
## 10 c(-73.9522222222222 38.6769444444444)
## # ... with 165,858 more rows
names(final_product_lease)
##  [1] "...1"                      "TRIPID"                   
##  [3] "OPERATOR"                  "OPERNUM"                  
##  [5] "NSUBTRIP"                  "CREW"                     
##  [7] "VTR_PORTNUM"               "IMGID"                    
##  [9] "YEAR"                      "VTR_PORT"                 
## [11] "VTR_STATE"                 "TRIP_LENGTH"              
## [13] "PERMIT.y"                  "DEALNUM"                  
## [15] "DOLLAR"                    "POUNDS"                   
## [17] "LANDED"                    "GEARCODE"                 
## [19] "SECGEARFISH"               "SPPNAME"                  
## [21] "geoid"                     "namelsad"                 
## [23] "state_fips"                "port_lat"                 
## [25] "port_lon"                  "previous_geoid"           
## [27] "previous_namelsad"         "previous_state_fips"      
## [29] "previous_port_lat"         "previous_port_lon"        
## [31] "Date"                      "Time"                     
## [33] "TRIP_ID"                   "Plan Code"                
## [35] "Program Code"              "Area Identifier"          
## [37] "ftpt"                      "GC"                       
## [39] "LA"                        "hours"                    
## [41] "DB_LANDING_YEAR"           "TRIP_COST_2020_DOL"       
## [43] "TRIP_COST_WINSOR_2020_DOL" "OBSERVED_COST_DUMMY"      
## [45] "geometry"                  "MN30SQID"                 
## [47] "MN10SQID"                  "NAME"
mle2718 commented 2 years ago

DDLAT and DDLON

We have a spatial join in the processing code now, so that might be happening here:

point_geo <- st_as_sf(VTR_DMIS_AC,
                      coords = c(x = "DDLON", y = "DDLAT"), crs = crs)

or here:

point_geo_lease <- st_as_sf(final_product,
                            coords = c(x = "DDLON", y = "DDLAT"), crs = crs)

I can see two possible solutions:

  1. We can pass .Rds back and forth instead of csv.
  2. We can make sure the DDLAT and DDLON columns hang around (should be pretty easy).
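
For solution 2, one option is the `remove` argument of `st_as_sf()`, which defaults to `TRUE` and drops the coordinate columns once the geometry is built. This is a minimal sketch on a toy data frame; the values and the CRS (EPSG:4326) are assumptions, not taken from the processing code.

```r
library(sf)

# Toy stand-in for the trip-level data (values are illustrative only).
trips <- data.frame(
  TRIPID = c(1, 2),
  DDLON  = c(-73.3833, -73.5228),
  DDLAT  = c(40.1, 40.0117)
)

# remove = FALSE keeps DDLON and DDLAT alongside the new geometry column.
# crs = 4326 is an assumption; the real code passes its own `crs` object.
point_geo <- st_as_sf(trips,
                      coords = c("DDLON", "DDLAT"),
                      crs = 4326,
                      remove = FALSE)

names(point_geo)
```

With this setting the numeric columns survive the spatial join, so downstream users do not have to parse the geometry.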

MN30SQID

It should be easy to work out whether this is a .csv problem by running is.numeric(MN30SQID). If st_as_sf() turns out to be the cause instead, we can add an as.numeric() at the end.
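
A quick sketch of that check, using made-up values. The character vector mimics the strings shown in the tibble output above; the numeric vector is a hypothetical in-memory column. Note that `as.numeric()` would not rescue strings that already carry a stray ")": they become `NA`.

```r
# Hypothetical in-memory column, before anything is written to disk.
mn30_in_memory <- c(407336, 407325)
is.numeric(mn30_in_memory)

# Strings like the ones that came back from the CSV (illustrative values).
mn30_from_csv <- c("40.1)", "40.0116666666667)")
is.numeric(mn30_from_csv)

# Coercion cannot parse the trailing ")" and yields NA (with a warning).
coerced <- suppressWarnings(as.numeric(mn30_from_csv))
all(is.na(coerced))
```

If the in-memory column is numeric but the file is not, the problem is in how the file is written rather than in the spatial join.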

I also just noticed we have a PERMIT.y variable, which we should probably drop. My guess is that it's the permit number from the VTR_Query section.

BryceMcManus-NOAA commented 2 years ago

For the DDLON/DDLAT issue, I'd say solution 2. From a FishSET perspective, it would be easier if trip location exists as separate numeric columns rather than as a geometry list-column, although I can see the need for FishSET to handle cases like these.

This may be helpful for converting the sfc_POINT geometry to separate lon and lat columns: https://github.com/r-spatial/sf/issues/231

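# Append the point coordinates of an sf object as plain columns.
sfc_as_cols <- function(x, names = c("x","y")) {
  # Only sf objects with POINT geometry are supported.
  stopifnot(inherits(x,"sf") && inherits(sf::st_geometry(x),"sfc_POINT"))
  ret <- sf::st_coordinates(x)
  ret <- tibble::as_tibble(ret)
  stopifnot(length(names) == ncol(ret))
  # Drop any existing columns that would clash with the new names.
  x <- x[ , !names(x) %in% names]
  ret <- setNames(ret,names)
  dplyr::bind_cols(x,ret)
}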
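
If the goal is a flat table with no geometry column at all, a shorter variant is possible with `st_coordinates()` and `st_drop_geometry()`. This is only a sketch on toy data; the input column names `lon` and `lat`, the values, and the CRS are assumptions.

```r
library(sf)

# Toy sf object whose coordinate columns were consumed by st_as_sf().
pts <- st_as_sf(
  data.frame(TRIPID = c(1, 2),
             lon = c(-73.3833, -73.5228),
             lat = c(40.1, 40.0117)),
  coords = c("lon", "lat"),
  crs = 4326  # assumed CRS
)

# For POINT geometries, st_coordinates() returns a matrix with columns X and Y.
xy <- st_coordinates(pts)

# Drop the geometry list-column and add plain numeric columns instead.
flat <- st_drop_geometry(pts)
flat$DDLON <- xy[, "X"]
flat$DDLAT <- xy[, "Y"]

str(flat)
```

The result is an ordinary data frame, which is the shape described above as easier for FishSET to handle.
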
mle2718 commented 2 years ago

Okay, I have a fix coded for DDLAT/DDLON and I'm just making sure it works.

mle2718 commented 2 years ago

R is telling me that MN30SQID is numeric, so I'm laying the blame on writing as a csv.
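
A small base R sketch of the two ways around this, on toy data. The plain list-column here is only a stand-in for an sf geometry column, and the values are made up. An .Rds file round-trips the object unchanged, while for a CSV it is safer to drop the list-column first. Writing with `row.names = FALSE` is a guess at how to avoid the unnamed first column that shows up as "...1" above.

```r
# Toy data; the list-column imitates a geometry column (an assumption).
trips <- data.frame(
  TRIPID   = c(1, 2),
  DDLON    = c(-73.3833, -73.5228),
  DDLAT    = c(40.1, 40.0117),
  MN30SQID = c(407336, 407325)
)
trips$geometry <- list(c(-73.3833, 40.1), c(-73.5228, 40.0117))

# Option 1: .Rds keeps column types and the list-column as they are.
rds_path <- tempfile(fileext = ".Rds")
saveRDS(trips, rds_path)
from_rds <- readRDS(rds_path)
identical(trips, from_rds)

# Option 2: drop the list-column, then write a plain CSV without row names.
flat <- trips[, names(trips) != "geometry"]
csv_path <- tempfile(fileext = ".csv")
write.csv(flat, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)
is.numeric(from_csv$MN30SQID)
```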

mle2718 commented 2 years ago

@mchaji: There's a new dataset in /home2/mlee/Effort-Displacement--Scallop/data/main. Can you forward the final_product_lease .csv and .Rds to everyone via Accellion? Thx.

mchaji commented 2 years ago

Done. You should be able to find the most recent .Rds and .csv here: https://sfc.doc.gov/w/f-a36c8ff1-e571-4530-a9c7-cb6448a208fe

mle2718 commented 2 years ago

Closing this: the code is written and the data was sent over. @mchaji, can you also send the link to the data by email? Since we moved the repository into NEFSC, @BryceMcManus-NOAA can't see this.