SCBI-ForestGEO / SCBI-ForestGEO-Data

Public data repository of the SCBI ForestGEO plot
https://scbi-forestgeo.github.io/SCBI-ForestGEO-Data/
Creative Commons Attribution 4.0 International

Provide tidy .csv files #7

Closed by maurolepore 5 years ago

maurolepore commented 5 years ago

I'm trying to access the SCBI data for censuses 1 and 2, and the species list. I noticed these issues:

ValentineHerr commented 5 years ago

Thanks for spotting that @maurolepore, I uploaded cleaner .csv files.

maurolepore commented 5 years ago

Awesome! I can now access the files and the rownames are gone. Thanks! One last thing that might be improved is to fix the few ExactDate values that cause these warnings:

library(purrr)

github_path <- 
  "SCBI-ForestGEO/SCBI-ForestGEO-Data/tree_main_census/data/census-csv-files"

download_urls <- ghr::ghr_ls_download_url(github_path, regexp = "stem|full")
download_urls %>% 
  purrr::map(readr::read_csv) %>% 
  purrr::set_names(fs::path_file(download_urls))
#> Parsed with column specification:
#> cols(
#>   .default = col_double(),
#>   sp = col_character(),
#>   quadrat = col_character(),
#>   ExactDate = col_date(format = ""),
#>   DFstatus = col_character(),
#>   codes = col_character(),
#>   status = col_character()
#> )
#> See spec(...) for full column specifications.
#> Warning: 6 parsing failures.
#>   row       col   expected     actual                                                                                                                                file
#> 14729 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full1.csv'
#> 14730 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full1.csv'
#> 21048 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full1.csv'
#> 21766 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full1.csv'
#> 27455 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full1.csv'
#> ..... ......... .......... .......... ...................................................................................................................................
#> See problems(...) for more details.
#> Parsed with column specification:
#> cols(
#>   .default = col_double(),
#>   sp = col_character(),
#>   quadrat = col_character(),
#>   ExactDate = col_date(format = ""),
#>   DFstatus = col_character(),
#>   codes = col_character(),
#>   status = col_character()
#> )
#> See spec(...) for full column specifications.
#> Warning: 1 parsing failure.
#>  row       col   expected     actual                                                                                                                                file
#> 7940 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full2.csv'
#> Parsed with column specification:
#> cols(
#>   .default = col_double(),
#>   sp = col_character(),
#>   quadrat = col_character(),
#>   ExactDate = col_date(format = ""),
#>   DFstatus = col_character(),
#>   codes = col_character(),
#>   status = col_character()
#> )
#> See spec(...) for full column specifications.
#> Warning: 6 parsing failures.
#>   row       col   expected     actual                                                                                                                                file
#> 21974 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem1.csv'
#> 21975 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem1.csv'
#> 30367 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem1.csv'
#> 31374 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem1.csv'
#> 38787 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem1.csv'
#> ..... ......... .......... .......... ...................................................................................................................................
#> See problems(...) for more details.
#> Parsed with column specification:
#> cols(
#>   .default = col_double(),
#>   sp = col_character(),
#>   quadrat = col_character(),
#>   ExactDate = col_date(format = ""),
#>   DFstatus = col_character(),
#>   codes = col_character(),
#>   status = col_character()
#> )
#> See spec(...) for full column specifications.
#> Warning: 1 parsing failure.
#>   row       col   expected     actual                                                                                                                                file
#> 12930 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem2.csv'
#> $scbi.full1.csv
#> # A tibble: 40,283 x 20
#>    treeID stemID   tag StemTag sp    quadrat    gx    gy DBHID CensusID
#>     <dbl>  <dbl> <dbl>   <dbl> <chr> <chr>   <dbl> <dbl> <dbl>    <dbl>
#>  1      1      1 10079       1 libe  0104     3.70  73       1        1
#>  2      2      2 10168       1 libe  0103    17.3   58.9     3        1
#>  3      3      3 10567       1 libe  0110     9    197.      5        1
#>  4      4      4 12165       1 nysy  0122    14.2  428.      7        1
#>  5      5      5 12190       1 havi  0122     9.40 436.      9        1
#>  6      6      6 12192       1 havi  0122     1.30 434      13        1
#>  7      7      7 12212       1 unk   0123    17.8  447.     15        1
#>  8      8      8 12261       1 libe  0125    18    484.     17        1
#>  9      9      9 12456       1 vipr  0130    18    598.     19        1
#> 10     10     10 12551       1 astr  0132     5.60 628.     22        1
#> # ... with 40,273 more rows, and 10 more variables: dbh <dbl>, pom <dbl>,
#> #   hom <dbl>, ExactDate <date>, DFstatus <chr>, codes <chr>,
#> #   nostems <dbl>, date <dbl>, status <chr>, agb <dbl>
#> 
#> $scbi.full2.csv
#> # A tibble: 40,283 x 20
#>    treeID stemID   tag StemTag sp    quadrat    gx    gy DBHID CensusID
#>     <dbl>  <dbl> <dbl>   <dbl> <chr> <chr>   <dbl> <dbl> <dbl>    <dbl>
#>  1      1      1 10079       1 libe  0104     3.70  73   41801        2
#>  2      2      2 10168       1 libe  0103    17.3   58.9 41723        2
#>  3      3      3 10567       1 libe  0110     9    197.  42501        2
#>  4      4      4 12165       1 nysy  0122    14.2  428.  42869        2
#>  5      5      5 12190       1 havi  0122     9.40 436.  42904        2
#>  6      6      6 12192       1 havi  0122     1.30 434   42909        2
#>  7      7  31200 12212       2 unk   0123    17.8  447.  42946        2
#>  8      8  31201 12261       2 libe  0125    18    484.  43023        2
#>  9      9  31202 12456       2 vipr  0130    18    598.  43249        2
#> 10     10     10 12551       1 astr  0132     5.60 628.  43346        2
#> # ... with 40,273 more rows, and 10 more variables: dbh <dbl>, pom <dbl>,
#> #   hom <dbl>, ExactDate <date>, DFstatus <chr>, codes <chr>,
#> #   nostems <dbl>, date <dbl>, status <chr>, agb <dbl>
#> 
#> $scbi.stem1.csv
#> # A tibble: 55,295 x 20
#>    treeID stemID   tag StemTag sp    quadrat    gx    gy DBHID CensusID
#>     <dbl>  <dbl> <dbl>   <dbl> <chr> <chr>   <dbl> <dbl> <dbl>    <dbl>
#>  1      1      1 10079       1 libe  0104     3.70  73       1        1
#>  2      1  31194 10079       2 libe  0104     3.70  73       2        1
#>  3      2      2 10168       1 libe  0103    17.3   58.9     3        1
#>  4      2  31195 10168       2 libe  0103    17.3   58.9     4        1
#>  5      3      3 10567       1 libe  0110     9    197.      5        1
#>  6      3  31196 10567       2 libe  0110     9    197.      6        1
#>  7      3  40394 10567       3 libe  0110     9    197.     NA       NA
#>  8      4      4 12165       1 nysy  0122    14.2  428.      7        1
#>  9      4  31197 12165       2 nysy  0122    14.2  428.      8        1
#> 10      5      5 12190       1 havi  0122     9.40 436.      9        1
#> # ... with 55,285 more rows, and 10 more variables: dbh <dbl>, pom <dbl>,
#> #   hom <dbl>, ExactDate <date>, DFstatus <chr>, codes <chr>,
#> #   countPOM <dbl>, date <dbl>, status <chr>, agb <dbl>
#> 
#> $scbi.stem2.csv
#> # A tibble: 55,295 x 20
#>    treeID stemID   tag StemTag sp    quadrat    gx    gy DBHID CensusID
#>     <dbl>  <dbl> <dbl>   <dbl> <chr> <chr>   <dbl> <dbl> <dbl>    <dbl>
#>  1      1      1 10079       1 libe  0104     3.70  73   41801        2
#>  2      1  31194 10079       2 libe  0104     3.70  73   41802        2
#>  3      2      2 10168       1 libe  0103    17.3   58.9 41723        2
#>  4      2  31195 10168       2 libe  0103    17.3   58.9 41724        2
#>  5      3      3 10567       1 libe  0110     9    197.  42501        2
#>  6      3  31196 10567       2 libe  0110     9    197.  42502        2
#>  7      3  40394 10567       3 libe  0110     9    197.  80573        2
#>  8      4      4 12165       1 nysy  0122    14.2  428.  42869        2
#>  9      4  31197 12165       2 nysy  0122    14.2  428.  42870        2
#> 10      5      5 12190       1 havi  0122     9.40 436.  42904        2
#> # ... with 55,285 more rows, and 10 more variables: dbh <dbl>, pom <dbl>,
#> #   hom <dbl>, ExactDate <date>, DFstatus <chr>, codes <chr>,
#> #   countPOM <dbl>, date <dbl>, status <chr>, agb <dbl>

Created on 2019-02-21 by the reprex package (v0.2.1)

ValentineHerr commented 5 years ago

The data is "raw", exactly as Suzanne gave it to us. I believe we want to keep it this way and have people fix the problems in their own scripts. @gonzalezeb, maybe you can check with Suzanne whether she can fix the dates while she is fixing the problems found with the 3rd census?
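For anyone who wants to handle this in their own script in the meantime, here is a minimal sketch of one way to do it: treat the `0000-00-00` placeholder as missing when reading the file, then convert the column to a date. It uses base R only, and the three-row `csv_text` is a made-up stand-in for a census file (the dates in it are illustrative, not real records). If you read the files with `readr::read_csv()` as in the reprex above, passing the placeholder through its `na` argument should work along the same lines, though I haven't run that against these files.

```r
# Hypothetical miniature of a census file; the real files have 20 columns
# and the dates below are invented for illustration only.
csv_text <- "treeID,stemID,ExactDate
1,1,2008-06-02
2,2,0000-00-00
3,3,2008-06-05"

# Treat the placeholder date as missing at read time, so it never reaches
# the date parser.
census <- read.csv(
  text = csv_text,
  na.strings = c("NA", "0000-00-00"),
  stringsAsFactors = FALSE
)

# Convert the remaining strings to Date; missing values stay NA.
census$ExactDate <- as.Date(census$ExactDate, format = "%Y-%m-%d")

# Count how many records had the placeholder.
n_missing_dates <- sum(is.na(census$ExactDate))
print(n_missing_dates)
```

This leaves the raw files untouched and keeps the affected rows in the table with `NA` dates, so each script can decide whether to drop them or keep them.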

maurolepore commented 5 years ago

Yeah, that makes sense. Thanks Valentine, with that fix, the data is much easier to access. I'm now closing this issue -- if needed @gonzalezeb may open a new one and point to this one.

BTW, talking about access to data, here is the beginning of a package to access remote data from SCBI. If you have anything to contribute let me know.

Maybe some day we could build a shiny app on top of this package to let users explore cool datasets from the web browser.

gonzalezeb commented 5 years ago

@ValentineHerr I will give the corrections to Suzanne; luckily there are not many records to correct. @maurolepore Thanks for the fgeo.scbi package (although I couldn't run it on my laptop, we can check it next week).