afsc-gap-products / gapindex

Calculation of Design-Based Indices of Abundance and Composition for AFSC GAP Bottom Trawl Surveys
https://afsc-gap-products.github.io/gapindex/
Other
4 stars 3 forks source link

gapindex::get_data error in function : Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns #36

Closed kholsman closed 7 months ago

kholsman commented 11 months ago

gapindex_data <- gapindex::get_data(

MargaretSiple-NOAA commented 11 months ago

Hi Kirstin-- I think this is a bug in an earlier version of gapindex -- your code as written works for me. If reinstalling the most recent main branch version via github doesn't work, you can try installing from the dev branch:

devtools::install_github("afsc-gap-products/gapindex")
# and if you're still getting the same error,
devtools::install_github("afsc-gap-products/gapindex@development")
zoyafuso-NOAA commented 11 months ago

Hi @kholsman ,

Thanks for checking out the package. I agree with @MargaretSiple-NOAA that this is probably a bug from an earlier version of the package (still figuring out the whole compatibility thing with packages). Let us know if reinstalling works for you.

zoyafuso-NOAA commented 11 months ago

Hi @kholsman ,

Just checking in, did you get a chance to reinstall and check whether you still get this issue?

mkapur-noaa commented 8 months ago

I just did a fresh install of the package today (main branch) and encountered this same issue using the code in the vignette (pasted below). Can you confirm how to resolve?

production_data <- gapindex::get_data(
  year_set = 1982:2023,
  survey_set = "EBS",
  spp_codes = NULL,
  pull_lengths = TRUE, 
  haul_type = 3, 
  abundance_haul = "Y",
  sql_channel = sql_channel)

Pulling survey design table...
Pulling cruise table...
Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns
zoyafuso-NOAA commented 8 months ago

Hi @mkapur-noaa

Thanks for also highlighting this error. I also just did a fresh install and was able to run this piece of code. What version of R and gapindex are you using, i.e., what is the output you get when you do:

packageVersion("gapindex")
R.Version()

I'm running on gapindex version 2.1.3 and R version 4.0.2 (which is ancient...). So it could be an R version compatibility that I haven't resolved yet.

mkapur-noaa commented 8 months ago

Thanks for the quick reply - I am also on gapindex v 2.1.3 but R 4.3.2, so that'd be my suspect.

On Wed, Jan 17, 2024 at 4:52 PM zoyafuso-NOAA @.***> wrote:

Hi @mkapur-noaa https://github.com/mkapur-noaa

Thanks for also highlighting this error. I also just did a fresh install and was able to run this piece of code. What version of R and gapindex are you using, i.e., what is the output you get when you do:

packageVersion("gapindex") R.Version()

I'm running on gapindex version 2.1.3 and R version 4.0.2 (which is ancient...). So it could be an R version compatibility that I haven't resolved yet.

— Reply to this email directly, view it on GitHub https://github.com/afsc-gap-products/gapindex/issues/36#issuecomment-1897571948, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJMO6DTPZFUOOP76XMTKGNLYPBW4LAVCNFSM6AAAAAA5ZNQDY2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJXGU3TCOJUHA . You are receiving this because you were mentioned.Message ID: @.***>

zoyafuso-NOAA commented 8 months ago

Hmmm, yeah okay, I will look into that. Make sense.

On Wed, Jan 17, 2024 at 4:54 PM Maia Kapur @.***> wrote:

Thanks for the quick reply - I am also on gapindex v 2.1.3 but R 4.3.2, so that'd be my suspect.

On Wed, Jan 17, 2024 at 4:52 PM zoyafuso-NOAA @.***> wrote:

Hi @mkapur-noaa https://github.com/mkapur-noaa

Thanks for also highlighting this error. I also just did a fresh install and was able to run this piece of code. What version of R and gapindex are you using, i.e., what is the output you get when you do:

packageVersion("gapindex") R.Version()

I'm running on gapindex version 2.1.3 and R version 4.0.2 (which is ancient...). So it could be an R version compatibility that I haven't resolved yet.

— Reply to this email directly, view it on GitHub < https://github.com/afsc-gap-products/gapindex/issues/36#issuecomment-1897571948>,

or unsubscribe < https://github.com/notifications/unsubscribe-auth/AJMO6DTPZFUOOP76XMTKGNLYPBW4LAVCNFSM6AAAAAA5ZNQDY2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJXGU3TCOJUHA>

. You are receiving this because you were mentioned.Message ID: @.***>

— Reply to this email directly, view it on GitHub https://github.com/afsc-gap-products/gapindex/issues/36#issuecomment-1897574083, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOMCLOOG2WAJT7ZBWBRYEXTYPBXCLAVCNFSM6AAAAAA5ZNQDY2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJXGU3TIMBYGM . You are receiving this because you commented.Message ID: @.***>

-- Zack Suriya Oyafuso, Ph.D. (he/him/his) Research Fish Biologist GitHub: github.com/zoyafuso-NOAA

NOAA Fisheries, Alaska Fisheries Science Center Resource Assessment and Conservation Engineering Division Groundfish Assessment Program, AI-GOA

7600 Sand Point Way NE, Bldg 4 Seattle, Washington 98115 Google Voice: (206) 526-4026

MargaretSiple-NOAA commented 8 months ago

The plot thickens! I was able to run the vignette code myself when using R 4.3.2 and gapindex v2.1.3. Maia can you maybe post your sessionInfo()? I suspect this is an issue with merge() but I am mystified how we could have the same version of gapindex and not get the same error.

mkapur-noaa commented 8 months ago

Here is my sessionInfo(). Related to merge I'm on data.table v1.14.8. Does the package intend to use base merge()? If so, consider appending base:: before the relevant calls so it doesn't look to the wrong package?

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
  [1] remotes_2.4.2.1     rlang_1.1.1         magrittr_2.0.3      R2admb_0.7.16.3     furrr_0.3.1         tidytable_0.10.2   
  [7] e1071_1.7-13        compiler_4.3.2      systemfonts_1.0.5   callr_3.7.3         vctrs_0.6.3         reshape2_1.4.4     
 [13] RODBC_1.3-23        rvest_1.0.3         stringr_1.5.1       profvis_0.3.8       pkgconfig_2.0.3     crayon_1.5.2       
 [19] fastmap_1.1.1       ellipsis_0.3.2      utf8_1.2.3          promises_1.2.0.1    rmarkdown_2.25      sessioninfo_1.2.2  
 [25] tzdb_0.4.0          ps_1.7.5            purrr_1.0.2         bit_4.0.5           xfun_0.41           afscassess_0.1.0   
 [31] cachem_1.0.8        jsonlite_1.8.8      r4ss_1.49.1         later_1.3.1         scico_1.4.0.9000    parallel_4.3.2     
 [37] prettyunits_1.2.0   R6_2.5.1            stringi_1.7.12      parallelly_1.36.0   pkgload_1.3.3       Rcpp_1.0.10        
 [43] knitr_1.45          usethis_2.1.6       httpuv_1.6.11       tidyselect_1.2.0    rstudioapi_0.15.0   codetools_0.2-19   
 [49] miniUI_0.1.1.1      processx_3.8.2      listenv_0.9.0       pkgbuild_1.4.2      lattice_0.21-9      tibble_3.2.1       
 [55] plyr_1.8.8          shiny_1.8.0         coda_0.19-4         evaluate_0.23       future_1.33.0       proxy_0.4-27       
 [61] urlchecker_1.0.1    xml2_1.3.6          pillar_1.9.0        KernSmooth_2.23-22  generics_0.1.3      vroom_1.6.3        
 [67] rprojroot_2.0.4     ggplot2_3.4.4       munsell_0.5.0       scales_1.3.0        globals_0.16.2      xtable_1.8-4       
 [73] gapindex_2.1.3      class_7.3-22        glue_1.6.2          funcr_0.1.0         tools_4.3.2         data.table_1.14.8  
 [79] webshot_0.5.5       forcats_1.0.0       fs_1.6.2            grid_4.3.2          afscdata_0.0.0.9000 tidyr_1.3.0        
 [85] gh_1.4.0            devtools_2.4.5      colorspace_2.1-0    Rmisc_1.5.1         cli_3.6.1           kableExtra_1.3.4   
 [91] akgfmaps_3.0.0      fansi_1.0.4         viridisLite_0.4.2   svglite_2.1.3       dplyr_1.1.3         corpcor_1.6.10     
 [97] gtable_0.3.4        digest_0.6.31       classInt_0.4-10     htmlwidgets_1.6.2   memoise_2.0.1       htmltools_0.5.5    
[103] lifecycle_1.0.4     httr_1.4.7          here_1.0.1          mime_0.12           bit64_4.0.5     
mkapur-noaa commented 8 months ago

OK, I got into the guts of gapindex::get_data and found that the merge seems to fail because nothing was actually pulled during the cruise_data call; when I do head(cruise_data) I get an error 42S02 942 [Oracle][ODBC][Ora]ORA-00942: table or view does not exist\n.

The survey_design object is populated, so I know it's not an Oracle connection issue.

> head(survey_design)
  SURVEY_DEFINITION_ID SURVEY YEAR DESIGN_YEAR
1                   98    EBS 1982        2022
2                   98    EBS 1983        2022
3                   98    EBS 1984        2022
4                   98    EBS 1985        2022
5                   98    EBS 1986        2022

Here are the other relevant variables:

> survey_def_ids_vec
[1] "(98)"

> gapindex::stitch_entries(abundance_haul)
[1] "('Y')"

> year_vec
[1] "(1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023)"

Looking at the query, I'm not at all familiar with the RACE_DATA schema and am wondering if I simply don't have permissions for it. When I tried to ping that table alone I get nothing -- is it a special one?

> head(RODBC::sqlQuery(
+     channel = sql_channel, 
+     query ="SELECT * FROM RACE_DATA.CRUISES"))
[1] "42S02 942 [Oracle][ODBC][Ora]ORA-00942: table or view does not exist\n"  
[2] "[RODBC] ERROR: Could not SQLExecDirect 'SELECT * FROM RACE_DATA.CRUISES'"
MargaretSiple-NOAA commented 8 months ago

Oooh... Zack and I just chatted about this so he has probably already told you but RACE_DATA, apparently, is special and not everyone has access! We thought everyone did when developing those scripts. Sadly it is currently the only schema where that information is located, so for now I think Zack is working on just getting you access. But I think we'll work on a solution that either pulls the cruise data from somewhere more accessible (like GAP_PRODUCTS) or embeds that info in the package somewhere. Thank you for your patience!

mkapur-noaa commented 8 months ago

Thanks, glad we caught this. I'm going to send a blanket response to Matt C.'s recent email to flag that this is something many of us might encounter as we start to investigate these products. I know it seems early in the game but I'm needing to front-load most of my data bridging efforts this winter. Thanks so much for your help! I will do a final run once I confirm I have access to that schema and report back before we close this issue.

zoyafuso-NOAA commented 8 months ago

The long-term solution could also be to add a bit of code to the gapindex::get_connected() function to check that the user has access to all of the tables that are in queried in the gapindex::get_data() function and then return an warning/error if there are some access issues and to contact GAP for access. But yeah, we'll make sure you and Kirstin have access to that schema.

mkapur-noaa commented 8 months ago

Confirming that after obtaining SELECT access to RACE_DATA I was able to execute the function. Feel free to close.

MargaretSiple-NOAA commented 8 months ago

Awesome! Thanks Maia and Kirstin for finding this. I'm going to leave it open until we have a fix we're satisfied with (whether in get_connected() or by querying other tables for the cruise lookup).

zoyafuso-NOAA commented 7 months ago

This now merged pull request (#48) hopefully prevents this error from originally occurring.