Thanks for spotting that, @maurolepore; I uploaded cleaner .csv files.
Awesome! I can now access the files, and the row names are gone. Thanks! One last improvement would be to fix the few ExactDate values that cause these warnings:
library(purrr)
github_path <-
"SCBI-ForestGEO/SCBI-ForestGEO-Data/tree_main_census/data/census-csv-files"
download_urls <- ghr::ghr_ls_download_url(github_path, regexp = "stem|full")
download_urls %>%
purrr::map(readr::read_csv) %>%
purrr::set_names(fs::path_file(download_urls))
#> Parsed with column specification:
#> cols(
#> .default = col_double(),
#> sp = col_character(),
#> quadrat = col_character(),
#> ExactDate = col_date(format = ""),
#> DFstatus = col_character(),
#> codes = col_character(),
#> status = col_character()
#> )
#> See spec(...) for full column specifications.
#> Warning: 6 parsing failures.
#> row col expected actual file
#> 14729 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full1.csv'
#> 14730 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full1.csv'
#> 21048 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full1.csv'
#> 21766 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full1.csv'
#> 27455 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full1.csv'
#> ..... ......... .......... .......... ...................................................................................................................................
#> See problems(...) for more details.
#> Parsed with column specification:
#> cols(
#> .default = col_double(),
#> sp = col_character(),
#> quadrat = col_character(),
#> ExactDate = col_date(format = ""),
#> DFstatus = col_character(),
#> codes = col_character(),
#> status = col_character()
#> )
#> See spec(...) for full column specifications.
#> Warning: 1 parsing failure.
#> row col expected actual file
#> 7940 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.full2.csv'
#> Parsed with column specification:
#> cols(
#> .default = col_double(),
#> sp = col_character(),
#> quadrat = col_character(),
#> ExactDate = col_date(format = ""),
#> DFstatus = col_character(),
#> codes = col_character(),
#> status = col_character()
#> )
#> See spec(...) for full column specifications.
#> Warning: 6 parsing failures.
#> row col expected actual file
#> 21974 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem1.csv'
#> 21975 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem1.csv'
#> 30367 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem1.csv'
#> 31374 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem1.csv'
#> 38787 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem1.csv'
#> ..... ......... .......... .......... ...................................................................................................................................
#> See problems(...) for more details.
#> Parsed with column specification:
#> cols(
#> .default = col_double(),
#> sp = col_character(),
#> quadrat = col_character(),
#> ExactDate = col_date(format = ""),
#> DFstatus = col_character(),
#> codes = col_character(),
#> status = col_character()
#> )
#> See spec(...) for full column specifications.
#> Warning: 1 parsing failure.
#> row col expected actual file
#> 12930 ExactDate valid date 0000-00-00 'https://raw.githubusercontent.com/SCBI-ForestGEO/SCBI-ForestGEO-Data/master/tree_main_census/data/census-csv-files/scbi.stem2.csv'
#> $scbi.full1.csv
#> # A tibble: 40,283 x 20
#> treeID stemID tag StemTag sp quadrat gx gy DBHID CensusID
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 10079 1 libe 0104 3.70 73 1 1
#> 2 2 2 10168 1 libe 0103 17.3 58.9 3 1
#> 3 3 3 10567 1 libe 0110 9 197. 5 1
#> 4 4 4 12165 1 nysy 0122 14.2 428. 7 1
#> 5 5 5 12190 1 havi 0122 9.40 436. 9 1
#> 6 6 6 12192 1 havi 0122 1.30 434 13 1
#> 7 7 7 12212 1 unk 0123 17.8 447. 15 1
#> 8 8 8 12261 1 libe 0125 18 484. 17 1
#> 9 9 9 12456 1 vipr 0130 18 598. 19 1
#> 10 10 10 12551 1 astr 0132 5.60 628. 22 1
#> # ... with 40,273 more rows, and 10 more variables: dbh <dbl>, pom <dbl>,
#> # hom <dbl>, ExactDate <date>, DFstatus <chr>, codes <chr>,
#> # nostems <dbl>, date <dbl>, status <chr>, agb <dbl>
#>
#> $scbi.full2.csv
#> # A tibble: 40,283 x 20
#> treeID stemID tag StemTag sp quadrat gx gy DBHID CensusID
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 10079 1 libe 0104 3.70 73 41801 2
#> 2 2 2 10168 1 libe 0103 17.3 58.9 41723 2
#> 3 3 3 10567 1 libe 0110 9 197. 42501 2
#> 4 4 4 12165 1 nysy 0122 14.2 428. 42869 2
#> 5 5 5 12190 1 havi 0122 9.40 436. 42904 2
#> 6 6 6 12192 1 havi 0122 1.30 434 42909 2
#> 7 7 31200 12212 2 unk 0123 17.8 447. 42946 2
#> 8 8 31201 12261 2 libe 0125 18 484. 43023 2
#> 9 9 31202 12456 2 vipr 0130 18 598. 43249 2
#> 10 10 10 12551 1 astr 0132 5.60 628. 43346 2
#> # ... with 40,273 more rows, and 10 more variables: dbh <dbl>, pom <dbl>,
#> # hom <dbl>, ExactDate <date>, DFstatus <chr>, codes <chr>,
#> # nostems <dbl>, date <dbl>, status <chr>, agb <dbl>
#>
#> $scbi.stem1.csv
#> # A tibble: 55,295 x 20
#> treeID stemID tag StemTag sp quadrat gx gy DBHID CensusID
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 10079 1 libe 0104 3.70 73 1 1
#> 2 1 31194 10079 2 libe 0104 3.70 73 2 1
#> 3 2 2 10168 1 libe 0103 17.3 58.9 3 1
#> 4 2 31195 10168 2 libe 0103 17.3 58.9 4 1
#> 5 3 3 10567 1 libe 0110 9 197. 5 1
#> 6 3 31196 10567 2 libe 0110 9 197. 6 1
#> 7 3 40394 10567 3 libe 0110 9 197. NA NA
#> 8 4 4 12165 1 nysy 0122 14.2 428. 7 1
#> 9 4 31197 12165 2 nysy 0122 14.2 428. 8 1
#> 10 5 5 12190 1 havi 0122 9.40 436. 9 1
#> # ... with 55,285 more rows, and 10 more variables: dbh <dbl>, pom <dbl>,
#> # hom <dbl>, ExactDate <date>, DFstatus <chr>, codes <chr>,
#> # countPOM <dbl>, date <dbl>, status <chr>, agb <dbl>
#>
#> $scbi.stem2.csv
#> # A tibble: 55,295 x 20
#> treeID stemID tag StemTag sp quadrat gx gy DBHID CensusID
#> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 10079 1 libe 0104 3.70 73 41801 2
#> 2 1 31194 10079 2 libe 0104 3.70 73 41802 2
#> 3 2 2 10168 1 libe 0103 17.3 58.9 41723 2
#> 4 2 31195 10168 2 libe 0103 17.3 58.9 41724 2
#> 5 3 3 10567 1 libe 0110 9 197. 42501 2
#> 6 3 31196 10567 2 libe 0110 9 197. 42502 2
#> 7 3 40394 10567 3 libe 0110 9 197. 80573 2
#> 8 4 4 12165 1 nysy 0122 14.2 428. 42869 2
#> 9 4 31197 12165 2 nysy 0122 14.2 428. 42870 2
#> 10 5 5 12190 1 havi 0122 9.40 436. 42904 2
#> # ... with 55,285 more rows, and 10 more variables: dbh <dbl>, pom <dbl>,
#> # hom <dbl>, ExactDate <date>, DFstatus <chr>, codes <chr>,
#> # countPOM <dbl>, date <dbl>, status <chr>, agb <dbl>
Created on 2019-02-21 by the reprex package (v0.2.1)
The data is "raw", exactly as Suzanne gave it to us. I believe we want to keep it that way and have people fix the problems in their own scripts. @gonzalezeb, could you check with Suzanne whether she can fix the dates while she is fixing the problems found in the 3rd census?
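If the raw files stay as they are, a script-side fix could look like the minimal sketch below. It assumes the readr package is installed and that `0000-00-00` is only ever used as a placeholder for an unknown date; `read_census()` is a hypothetical helper name, not part of any package mentioned in this thread.

```r
# Requires the readr package.
# Hypothetical helper: read one census file, treating the placeholder date
# "0000-00-00" as missing instead of letting it cause a parsing failure.
read_census <- function(path) {
  readr::read_csv(
    path,
    # Values listed in `na` become NA before any column is parsed
    na = c("", "NA", "0000-00-00"),
    # Parse ExactDate explicitly as an ISO date; other columns are guessed
    col_types = readr::cols(ExactDate = readr::col_date(format = "%Y-%m-%d"))
  )
}
```

With the reprex above, `purrr::map(download_urls, read_census)` should then read all four files without the ExactDate warnings, provided no other malformed dates exist. Note that `na` applies to every column, which is harmless here because only ExactDate is expected to hold that placeholder.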
Yeah, that makes sense. Thanks, Valentine; with that fix, the data is much easier to access. I'm closing this issue now; if needed, @gonzalezeb can open a new one that points to this one.
BTW, speaking of data access, here is the beginning of a package to access remote data from SCBI. If you have anything to contribute, let me know.
Maybe someday we could build a Shiny app on top of this package to let users explore cool datasets from a web browser.
@ValentineHerr I will give the corrections to Suzanne; luckily, there are not many records to correct. @maurolepore Thanks for the fgeo.scbi package (although I couldn't run it on my laptop; we can check it next week).
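To collect the records to send for correction, one option is a small base-R sketch like the one below. `find_placeholder_dates()` is a hypothetical helper; it assumes the ID columns are named `treeID`, `stemID` and `tag` (as in the printed tibbles above) and that the bad values are literally the string `0000-00-00`.

```r
# Hypothetical helper: list the records whose ExactDate holds the placeholder
# "0000-00-00", keeping only ID columns so the list is easy to share.
find_placeholder_dates <- function(path, id_cols = c("treeID", "stemID", "tag")) {
  # Keep ExactDate as plain text so the placeholder survives reading;
  # all other column types are determined automatically
  census <- utils::read.csv(path, colClasses = c(ExactDate = "character"))
  # %in% returns FALSE (not NA) for missing values, so NA dates are skipped
  bad <- census[census$ExactDate %in% "0000-00-00", , drop = FALSE]
  # Return only the ID columns that actually exist, plus the offending value
  bad[, intersect(c(id_cols, "ExactDate"), names(bad)), drop = FALSE]
}
```

Since `read.csv()` also accepts URLs, something like `lapply(download_urls, find_placeholder_dates)` should produce one small table per census file.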
I'm trying to access the SCBI data from censuses 1 and 2, as well as the species list, and I noticed these issues:
The species list is not available as a .csv file. I could only find the file scbi.spptable.rdata, which isn't useful for non-R users, and even in R I sometimes failed to load it. Would you be okay with adding a .csv file?
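Producing that .csv from the existing .rdata file could be as simple as the sketch below. It makes no assumption about the name of the object stored in scbi.spptable.rdata; `rdata_to_csv()` is a hypothetical helper that writes every data frame it finds to its own .csv file.

```r
# Hypothetical helper: load all objects from an .rdata file into a private
# environment and write each data frame to "<object name>.csv".
rdata_to_csv <- function(rdata_path, out_dir = dirname(rdata_path)) {
  env <- new.env()
  # load() returns the names of the objects it created
  object_names <- load(rdata_path, envir = env)
  written <- character(0)
  for (name in object_names) {
    obj <- get(name, envir = env)
    if (is.data.frame(obj)) {
      csv_path <- file.path(out_dir, paste0(name, ".csv"))
      # row.names = FALSE avoids adding an extra unnamed first column
      utils::write.csv(obj, csv_path, row.names = FALSE)
      written <- c(written, csv_path)
    }
  }
  written  # paths of the .csv files that were created
}
```

Loading into a fresh environment, rather than the global one, avoids silently overwriting existing objects that happen to share a name with those in the file.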
Census data is available as .csv, but those files seem to have been written with row.names = TRUE (the default of write.csv()). That isn't very useful and makes analysis a little harder (notice that readr::write_csv() does not write row names).
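The difference is easy to see on a toy data frame. The sketch below uses made-up values (not real census data) to compare the header written by `write.csv()` with and without `row.names = FALSE`, and shows how `read.csv(..., row.names = 1)` strips the extra column from a file that already has it.

```r
# Illustrative data frame standing in for a census table (made-up values)
census <- data.frame(treeID = 1:3, sp = c("libe", "libe", "nysy"))

with_rownames <- tempfile(fileext = ".csv")
without_rownames <- tempfile(fileext = ".csv")

# Default row.names = TRUE: an unnamed first column holds the row names
utils::write.csv(census, with_rownames)
# row.names = FALSE: only the real columns are written
utils::write.csv(census, without_rownames, row.names = FALSE)

readLines(with_rownames, n = 1)     # "\"\",\"treeID\",\"sp\""
readLines(without_rownames, n = 1)  # "\"treeID\",\"sp\""

# For a file that already has the extra column, row.names = 1 moves it
# out of the data, leaving only treeID and sp as columns
restored <- utils::read.csv(with_rownames, row.names = 1)
```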