OpenHumans / ohjh-example-notebooks

Explore your personal data with https://notebooks.openhumans.org
http://openhumansfoundation.org/ohjh-example-notebooks/
MIT License
22 stars 10 forks source link

Ubiome R notebook #10

Open HadrienG opened 6 years ago

HadrienG commented 6 years ago

Ready for testing

Additional remarks

Some of the ubiome json files are not valid json, or are in other format (csv and tab delimited). Ideally there should be a data validation when accepting the upload files (I can file an issue in the repo if you want). Anyway the r code in the notebook should handle the different formats 🎉

madprime commented 6 years ago

On this repo, I think! https://github.com/OpenHumans/oh-ubiome-source

gedankenstuecke commented 6 years ago

I had some trouble getting it to run on my end:

  1. the data/ folder needs to exist already, otherwise it'll crash. But that's easy enough to fix.

Then I had a problem with getting the reference data loaded in.

There's a lot of

Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“90 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37852.json' file 2    11 <NA>  6 columns 7 columns 'data/37852.json' row 3    14 <NA>  6 columns 7 columns 'data/37852.json' col 4    15 <NA>  6 columns 7 columns 'data/37852.json' expected 5    16 <NA>  6 columns 7 columns 'data/37852.json'

And in the end the data_container isn't created. See the end of the error message here:

screen shot 2018-05-11 at 09 27 19

Just for fun I also tried loading my own data in the next steps but that also yielded an error:

screen shot 2018-05-11 at 09 25 15
HadrienG commented 6 years ago

Could you give me the whole traceback for the object taxon not found? Not sure if it happens during the json or the csv parsing.

gedankenstuecke commented 6 years ago

Sure, running

invisible(map2(jsons$download_url, paste0("data/", jsons$id, ".json"), download.file))
file.remove("data/37844.json")  # invalid json
data_path <- dir("data", pattern = '.json', full.names = TRUE)
data_container <- map(data_path, read_json_or_tbl_or_csv)

I get the following output and data_container isn't actually created. I was wondering: did you run your code directly on notebooks.openhumans.org?

Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“90 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37852.json' file 2    11 <NA>  6 columns 7 columns 'data/37852.json' row 3    14 <NA>  6 columns 7 columns 'data/37852.json' col 4    15 <NA>  6 columns 7 columns 'data/37852.json' expected 5    16 <NA>  6 columns 7 columns 'data/37852.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_integer(),
  parent = col_integer()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“7 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     8 <NA>  6 columns 7 columns 'data/37870.json' file 2     9 <NA>  6 columns 7 columns 'data/37870.json' row 3    12 <NA>  6 columns 7 columns 'data/37870.json' col 4    13 <NA>  6 columns 7 columns 'data/37870.json' expected 5    15 <NA>  6 columns 8 columns 'data/37870.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_integer(),
  parent = col_integer()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“9 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     3 <NA>  6 columns 7 columns 'data/37872.json' file 2     6 <NA>  6 columns 7 columns 'data/37872.json' row 3     9 <NA>  6 columns 7 columns 'data/37872.json' col 4    10 <NA>  6 columns 8 columns 'data/37872.json' expected 5    13 <NA>  6 columns 7 columns 'data/37872.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“16 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     3 <NA>  6 columns 8 columns 'data/37874.json' file 2     4 <NA>  6 columns 7 columns 'data/37874.json' row 3     7 <NA>  6 columns 8 columns 'data/37874.json' col 4    19 <NA>  6 columns 8 columns 'data/37874.json' expected 5    20 <NA>  6 columns 8 columns 'data/37874.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“103 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     8 <NA>  6 columns 7 columns 'data/37876.json' file 2     9 <NA>  6 columns 7 columns 'data/37876.json' row 3    10 <NA>  6 columns 7 columns 'data/37876.json' col 4    12 <NA>  6 columns 7 columns 'data/37876.json' expected 5    16 <NA>  6 columns 7 columns 'data/37876.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“78 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37878.json' file 2     6 <NA>  6 columns 7 columns 'data/37878.json' row 3    18 <NA>  6 columns 7 columns 'data/37878.json' col 4    20 <NA>  6 columns 7 columns 'data/37878.json' expected 5    23 <NA>  6 columns 7 columns 'data/37878.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“73 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37880.json' file 2     6 <NA>  6 columns 7 columns 'data/37880.json' row 3    19 <NA>  6 columns 7 columns 'data/37880.json' col 4    22 <NA>  6 columns 7 columns 'data/37880.json' expected 5    24 <NA>  6 columns 7 columns 'data/37880.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“87 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     7 <NA>  6 columns 7 columns 'data/37882.json' file 2     8 <NA>  6 columns 7 columns 'data/37882.json' row 3     9 <NA>  6 columns 7 columns 'data/37882.json' col 4    11 <NA>  6 columns 7 columns 'data/37882.json' expected 5    20 <NA>  6 columns 7 columns 'data/37882.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“24 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     5 <NA>  6 columns 7 columns 'data/37884.json' file 2     6 <NA>  6 columns 7 columns 'data/37884.json' row 3     9 <NA>  6 columns 7 columns 'data/37884.json' col 4    19 <NA>  6 columns 7 columns 'data/37884.json' expected 5    20 <NA>  6 columns 7 columns 'data/37884.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“70 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    11 <NA>  1 columns 2 columns 'data/37902.json' file 2    14 <NA>  1 columns 2 columns 'data/37902.json' row 3    15 <NA>  1 columns 2 columns 'data/37902.json' col 4    16 <NA>  1 columns 2 columns 'data/37902.json' expected 5    17 <NA>  1 columns 2 columns 'data/37902.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“124 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    13 <NA>  1 columns 2 columns 'data/37904.json' file 2    16 <NA>  1 columns 2 columns 'data/37904.json' row 3    18 <NA>  1 columns 2 columns 'data/37904.json' col 4    21 <NA>  1 columns 2 columns 'data/37904.json' expected 5    22 <NA>  1 columns 2 columns 'data/37904.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“79 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    10 <NA>  1 columns 3 columns 'data/37906.json' file 2    11 <NA>  1 columns 2 columns 'data/37906.json' row 3    16 <NA>  1 columns 2 columns 'data/37906.json' col 4    27 <NA>  1 columns 2 columns 'data/37906.json' expected 5    30 <NA>  1 columns 5 columns 'data/37906.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“117 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     6 <NA>  1 columns 2 columns 'data/37908.json' file 2    10 <NA>  1 columns 2 columns 'data/37908.json' row 3    13 <NA>  1 columns 2 columns 'data/37908.json' col 4    14 <NA>  1 columns 2 columns 'data/37908.json' expected 5    15 <NA>  1 columns 2 columns 'data/37908.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“92 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    11 <NA>  6 columns 7 columns 'data/37914.json' file 2    12 <NA>  6 columns 7 columns 'data/37914.json' row 3    13 <NA>  6 columns 7 columns 'data/37914.json' col 4    17 <NA>  6 columns 7 columns 'data/37914.json' expected 5    24 <NA>  6 columns 7 columns 'data/37914.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  `tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“96 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1    10 <NA>  1 columns 2 columns 'data/37922.json' file 2    11 <NA>  1 columns 2 columns 'data/37922.json' row 3    12 <NA>  1 columns 2 columns 'data/37922.json' col 4    13 <NA>  1 columns 2 columns 'data/37922.json' expected 5    17 <NA>  1 columns 2 columns 'data/37922.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_integer(),
  count_norm = col_integer(),
  taxon = col_integer(),
  parent = col_integer()
)
Parsed with column specification:
cols(
  tax_name = col_character(),
  tax_rank = col_character(),
  count = col_character(),
  count_norm = col_character(),
  taxon = col_character(),
  parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“76 parsing failures.
row # A tibble: 5 x 5 col     row col   expected  actual    file              expected   <int> <chr> <chr>     <chr>     <chr>             actual 1     9 <NA>  6 columns 7 columns 'data/37944.json' file 2    12 <NA>  6 columns 7 columns 'data/37944.json' row 3    13 <NA>  6 columns 7 columns 'data/37944.json' col 4    14 <NA>  6 columns 7 columns 'data/37944.json' expected 5    19 <NA>  6 columns 7 columns 'data/37944.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Warning message:
“Duplicated column names deduplicated: '{\\n' => '{\\n_1' [28], '\\"taxon\\":' => '\\"taxon\\":_1' [29], '\\"parent\\":' => '\\"parent\\":_1' [31], '\\"count\\":' => '\\"count\\":_1' [33], '4706,\\n' => '4706,\\n_1' [34], '\\"count_norm\\":' => '\\"count_norm\\":_1' [35], '1000000,\\n' => '1000000,\\n_1' [36], '\\"tax_name\\":' => '\\"tax_name\\":_1' [37], '\\"tax_rank\\":' => '\\"tax_rank\\":_1' [39], '},\\n' => '},\\n_1' [41], '{\\n' => '{\\n_2' [42], '\\"taxon\\":' => '\\"taxon\\":_2' [43], '\\"parent\\":' => '\\"parent\\":_2' [45], '\\"count\\":' => '\\"count\\":_2' [47], '\\"count_norm\\":' => '\\"count_norm\\":_2' [49], '\\"tax_name\\":' => '\\"tax_name\\":_2' [51], '\\"tax_rank\\":' => '\\"tax_rank\\":_2' [53], '},\\n' => '},\\n_2' [55], '{\\n' => '{\\n_3' [56], '\\"taxon\\":' => '\\"taxon\\":_3' [57], '\\"parent\\":' => '\\"parent\\":_3' [59], '\\"count\\":' => '\\"count\\":_3' [61], '\\"count_norm\\":' => '\\"count_norm\\":_3' [63], '\\"tax_name\\":' => '\\"tax_name\\":_3' [65], '\\"tax_rank\\":' => '\\"tax_rank\\":_3' [67], '},\\n' => '},\\n_3' [69], '{\\n' => '{\\n_4' [70], '\\"taxon\\":' => '\\"taxon\\":_4' [71], '\\"parent\\":' => '\\"parent\\":_4' [73], '481,\\n' => '481,\\n_1' [74], '\\"count\\":' => '\\"count\\":_4' [75], '\\"count_norm\\":' => '\\"count_norm\\":_4' [77], '\\"tax_name\\":' => '\\"tax_name\\":_4' [79], '\\"tax_rank\\":' => '\\"tax_rank\\":_4' [81], '\\"genus\\"\\n' => '\\"genus\\"\\n_1' [82], '},\\n' => '},\\n_4' [83], '{\\n' => '{\\n_5' [84], '\\"taxon\\":' => '\\"taxon\\":_5' [85], '\\"parent\\":' => '\\"parent\\":_5' [87], '482,\\n' => '482,\\n_1' [88], '\\"count\\":' => '\\"count\\":_5' [89], '\\"count_norm\\":' => '\\"count_norm\\":_5' [91], '\\"tax_name\\":' => '\\"tax_name\\":_5' [93], '\\"tax_rank\\":' => '\\"tax_rank\\":_5' [96], '},\\n' => '},\\n_5' [98], '{\\n' => '{\\n_6' [99], '\\"taxon\\":' => '\\"taxon\\":_6' [100], '\\"parent\\":' => '\\"parent\\":_6' [102], '482,\\n' => '482,\\n_2' [103], '\\"count\\":' => '\\"count\\":_6' [104], '\\"count_norm\\":' => '\\"count_norm\\":_6' [106], '\\"tax_name\\":' => '\\"tax_name\\":_6' [108], '\\"Neisseria' => '\\"Neisseria_1' [109], '\\"tax_rank\\":' => '\\"tax_rank\\":_6' [111], '\\"species\\"\\n' => '\\"species\\"\\n_1' [112], '},\\n' => '},\\n_6' [113], '{\\n' => '{\\n_7' [114], '\\"taxon\\":' => '\\"taxon\\":_7' [115], '\\"parent\\":' => '\\"parent\\":_7' [117], '\\"count\\":' => '\\"count\\":_7' [119], '\\"count_norm\\":' => '\\"count_norm\\":_7' [121], '\\"tax_name\\":' => '\\"tax_name\\":_7' [123], '\\"tax_rank\\":' => '\\"tax_rank\\":_7' [126], '\\"species\\"\\n' => '\\"species\\"\\n_2' [127], '},\\n' => '},\\n_7' [128], '{\\n' => '{\\n_8' [129], '\\"taxon\\":' => '\\"taxon\\":_8' [130], '\\"parent\\":' => '\\"parent\\":_8' [132], '\\"count\\":' => '\\"count\\":_8' [134], '\\"count_norm\\":' => '\\"count_norm\\":_8' [136], '\\"tax_name\\":' => '\\"tax_name\\":_8' [138], '\\"tax_rank\\":' => '\\"tax_rank\\":_8' [140], '\\"family\\"\\n' => '\\"family\\"\\n_1' [141], '},\\n' => '},\\n_8' [142], '{\\n' => '{\\n_9' [143], '\\"taxon\\":' => '\\"taxon\\":_9' [144], '\\"parent\\":' => '\\"parent\\":_9' [146], '543,\\n' => '543,\\n_1' [147], '\\"count\\":' => '\\"count\\":_9' [148], '5,\\n' => '5,\\n_1' [149], '\\"count_norm\\":' => '\\"count_norm\\":_9' [150], '1062,\\n' => '1062,\\n_1' [151], '\\"tax_name\\":' => '\\"tax_name\\":_9' [152], '\\"tax_rank\\":' => '\\"tax_rank\\":_9' [154], '\\"genus\\"\\n' => '\\"genus\\"\\n_2' [155], '},\\n' => '},\\n_9' [156], '{\\n' => '{\\n_10' [157], '\\"taxon\\":' => '\\"taxon\\":_10' [158], '\\"parent\\":' => '\\"parent\\":_10' [160], '\\"count\\":' => '\\"count\\":_10' [162], '\\"count_norm\\":' => '\\"count_norm\\":_10' [164], '\\"tax_name\\":' => '\\"tax_name\\":_10' [166], '\\"tax_rank\\":' => '\\"tax_rank\\":_10' [168], '\\"family\\"\\n' => '\\"family\\"\\n_2' [169], '},\\n' => '},\\n_10' [170], '{\\n' => '{\\n_11' [171], '\\"taxon\\":' => '\\"taxon\\":_11' [172], '\\"parent\\":' => '\\"parent\\":_11' [174], '712,\\n' => '712,\\n_1' [175], '\\"count\\":' => '\\"count\\":_11' [176], '\\"count_norm\\":' => '\\"count_norm\\":_11' [178], '\\"tax_name\\":' => '\\"tax_name\\":_11' [180], '\\"tax_rank\\":' => '\\"tax_rank\\":_11' [182], '\\"genus\\"\\n' => '\\"genus\\"\\n_3' [183], '},\\n' => '},\\n_11' [184], '{\\n' => '{\\n_12' [185], '\\"taxon\\":' => '\\"taxon\\":_12' [186], '\\"parent\\":' => '\\"parent\\":_12' [188], '724,\\n' => '724,\\n_1' [189], '\\"count\\":' => '\\"count\\":_12' [190], '\\"count_norm\\":' => '\\"count_norm\\":_12' [192], '\\"tax_name\\":' => '\\"tax_name\\":_12' [194], '\\"tax_rank\\":' => '\\"tax_rank\\":_12' [197], '\\"species\\"\\n' => '\\"species\\"\\n_3' [198], '},\\n' => '},\\n_12' [199], '{\\n' => '{\\n_13' [200], '\\"taxon\\":' => '\\"taxon\\":_13' [201], '\\"parent\\":' => '\\"parent\\":_13' [203], '724,\\n' => '724,\\n_2' [204], '\\"count\\":' => '\\"count\\":_13' [205], '\\"count_norm\\":' => '\\"count_norm\\":_13' [207], '\\"tax_name\\":' => '\\"tax_name\\":_13' [209], '\\"Haemophilus' => '\\"Haemophilus_1' [210], '\\"tax_rank\\":' => '\\"tax_rank\\":_13' [212], '\\"species\\"\\n' => '\\"species\\"\\n_4' [213], '},\\n' => '},\\n_13' [214], '{\\n' => '{\\n_14' [215], '\\"taxon\\":' => '\\"taxon\\":_14' [216], '\\"parent\\":' => '\\"parent\\":_14' [218], '\\"count\\":' => '\\"count\\":_14' [220], '4,\\n' => '4,\\n_1' [221], '\\"count_norm\\":' => '\\"count_norm\\":_14' [222], '849,\\n' => '849,\\n_1' [223], '\\"tax_name\\":' => '\\"tax_name\\":_14' [224], '\\"tax_rank\\":' => '\\"tax_rank\\":_14' [227], '\\"species\\"\\n' => '\\"species\\"\\n_5' [228], '},\\n' => '},\\n_14' [229], '{\\n' => '{\\n_15' [230], '\\"taxon\\":' => '\\"taxon\\":_15' [231], '\\"parent\\":' => '\\"parent\\":_15' [233], '416916,\\n' => '416916,\\n_1' [234], '\\"count\\":' => '\\"count\\":_15' [235], '\\"count_norm\\":' => '\\"count_norm\\":_15' [237], '\\"tax_name\\":' => '\\"tax_name\\":_15' [239], '\\"Aggregatibacter' => '\\"Aggregatibacter_1' [240], '\\"tax_rank\\":' => '\\"tax_rank\\":_15' [242], '\\"species\\"\\n' => '\\"species\\"\\n_6' [243], '},\\n' => '},\\n_15' [244], '{\\n' => '{\\n_16' [245], '\\"taxon\\":' => '\\"taxon\\":_16' [246], '\\"parent\\":' => '\\"parent\\":_16' [248], '\\"count\\":' => '\\"count\\":_16' [250], '\\"count_norm\\":' => '\\"count_norm\\":_16' [252], '\\"tax_name\\":' => '\\"tax_name\\":_16' [254], '\\"tax_rank\\":' => '\\"tax_rank\\":_16' [256], '\\"family\\"\\n' => '\\"family\\"\\n_3' [257], '},\\n' => '},\\n_16' [258], '{\\n' => '{\\n_17' [259], '\\"taxon\\":' => '\\"taxon\\":_17' [260], '\\"parent\\":' => '\\"parent\\":_17' [262], '815,\\n' => '815,\\n_1' [263], '\\"count\\":' => '\\"count\\":_17' [264], '33,\\n' => '33,\\n_1' [265], '\\"count_norm\\":' => '\\"count_norm\\":_17' [266], '7012,\\n' => '7012,\\n_1' [267], '\\"tax_name\\":' => '\\"tax_name\\":_17' [268], '\\"tax_rank\\":' => '\\"tax_rank\\":_17' [270], '\\"genus\\"\\n' => '\\"genus\\"\\n_4' [271], '},\\n' => '},\\n_17' [272], '{\\n' => '{\\n_18' [273], '\\"taxon\\":' => '\\"taxon\\":_18' [274], '\\"parent\\":' => '\\"parent\\":_18' [276], '816,\\n' => '816,\\n_1' [277], '\\"count\\":' => '\\"count\\":_18' [278], '\\"count_norm\\":' => '\\"count_norm\\":_18' [280], '\\"tax_name\\":' => '\\"tax_name\\":_18' [282], '\\"tax_rank\\":' => '\\"tax_rank\\":_18' [285], '\\"species\\"\\n' => '\\"species\\"\\n_7' [286], '},\\n' => '},\\n_18' [287], '{\\n' => '{\\n_19' [288], '\\"taxon\\":' => '\\"taxon\\":_19' [289], '\\"parent\\":' => '\\"parent\\":_19' [291], '\\"count\\":' => '\\"count\\":_19' [293], '21,\\n' => '21,\\n_1' [294], '\\"count_norm\\":' => '\\"count_norm\\":_19' [295], '4462,\\n' => '4462,\\n_1' [296], '\\"tax_name\\":' => '\\"tax_name\\":_19' [297], '\\"tax_rank\\":' => '\\"tax_rank\\":_19' [299], '\\"genus\\"\\n' => '\\"genus\\"\\n_5' [300], '},\\n' => '},\\n_19' [301], '{\\n' => '{\\n_20' [302], '\\"taxon\\":' => '\\"taxon\\":_20' [303], '\\"parent\\":' => '\\"parent\\":_20' [305], '\\"count\\":' => '\\"count\\":_20' [307], '\\"count_norm\\":' => '\\"count_norm\\":_20' [309], '\\"tax_name\\":' => '\\"tax_name\\":_20' [311], '\\"tax_rank\\":' => '\\"tax_rank\\":_20' [313], '\\"genus\\"\\n' => '\\”Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
Warning message:
“Duplicated column names deduplicated: '\\n      \\"count\\": 4706' => '\\n      \\"count\\": 4706_1' [14], '\\n      \\"count_norm\\": 1000000' => '\\n      \\"count_norm\\": 1000000_1' [15], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_1' [35], '\\n      \\"parent\\": 482' => '\\n      \\"parent\\": 482_1' [43], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_1' [47], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_2' [53], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_1' [59], '\\n      \\"count\\": 5' => '\\n      \\"count\\": 5_1' [62], '\\n      \\"count_norm\\": 1062' => '\\n      \\"count_norm\\": 1062_1' [63], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_2' [65], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_2' [71], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_3' [77], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_3' [83], '\\n      \\"parent\\": 724' => '\\n      \\"parent\\": 724_1' [85], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_4' [89], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_1' [92], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_1' [93], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_5' [95], '\\n      \\"parent\\": 416916' => '\\n      \\"parent\\": 416916_1' [97], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_6' [101], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_3' [107], '\\n      \\"count\\": 33' => '\\n      \\"count\\": 33_1' [110], '\\n      \\"count_norm\\": 7012' => '\\n      \\"count_norm\\": 7012_1' [111], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_4' [113], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_7' [119], '\\n      \\"count\\": 21' => '\\n      \\"count\\": 21_1' [122], '\\n      \\"count_norm\\": 4462' => '\\n      \\"count_norm\\": 4462_1' [123], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_5' [125], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_6' [131], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_7' [137], '\\n      \\"count\\": 11' => '\\n      \\"count\\": 11_1' [140], '\\n      \\"count_norm\\": 2337' => '\\n      \\"count_norm\\": 2337_1' [141], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_8' [143], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_8' [149], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_1' [152], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_1' [153], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_9' [155], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_9' [167], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_2' [170], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_2' [171], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_10' [173], '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }' => '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }_1' [179], '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }' => '\\n      \\"tax_rank\\": \\"phylum\\"\\n    }_2' [191], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_4' [197], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_5' [203], '\\n      \\"count\\": 1518' => '\\n      \\"count\\": 1518_1' [206], '\\n      \\"count_norm\\": 322566' => '\\n      \\"count_norm\\": 322566_1' [207], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_10' [209], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_11' [215], '\\n      \\"parent\\": 1301' => '\\n      \\"parent\\": 1301_1' [217], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_12' [221], '\\n      \\"parent\\": 1301' => '\\n      \\"parent\\": 1301_2' [223], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_1' [224], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_1' [225], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_13' [227], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_11' [233], '\\n      \\"count\\": 271' => '\\n      \\"count\\": 271_1' [236], '\\n      \\"count_norm\\": 57586' => '\\n      \\"count_norm\\": 57586_1' [237], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_2' [242], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_2' [243], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_6' [245], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_12' [251], '\\n      \\"count\\": 6' => '\\n      \\"count\\": 6_3' [254], '\\n      \\"count_norm\\": 1274' => '\\n      \\"count_norm\\": 1274_3' [255], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_13' [257], '\\n      \\"tax_rank\\": \\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_1' [263], '\\n      \\"count\\": 210' => '\\n      \\"count\\": 210_1' [266], '\\n      \\"count_norm\\": 44623' => '\\n      \\"count_norm\\": 44623_1' [267], '\\n      \\"tax_rank\\": \\"order\\"\\n    }' => '\\n      \\"tax_rank\\": \\"order\\"\\n    }_1' [269], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_14' [275], '\\n      \\"count\\": 38' => '\\n      \\"count\\": 38_1' [278], '\\n      \\"count_norm\\": 8074' => '\\n      \\"count_norm\\": 8074_1' [279], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_7' [281], '\\n      \\"count\\": 2' => '\\n      \\"count\\": 2_3' [284], '\\n      \\"count_norm\\": 424' => '\\n      \\"count_norm\\": 424_3' [285], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_15' [287], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_2' [290], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_2' [291], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_16' [293], '\\n      \\"parent\\": 838' => '\\n      \\"parent\\": 838_1' [295], '\\n      \\"count\\": 14' => '\\n      \\"count\\": 14_1' [296], '\\n      \\"count_norm\\": 2974' => '\\n      \\"count_norm\\": 2974_1' [297], '\\n      \\"tax_rank\\": \\"species\\"\\n    }' => '\\n      \\"tax_rank\\": \\"species\\"\\n    }_17' [299], '\\n      \\"parent\\": 1224' => '\\n      \\"parent\\": 1224_1' [301], '\\n      \\"tax_rank\\": \\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_2' [305], '\\n      \\"tax_rank\\": \\"genus\\"\\n    }' => '\\n      \\"tax_rank\\": \\"genus\\"\\n    }_14' [311], '\\n      \\"count\\": 4' => '\\n      \\"count\\": 4_3' [314], '\\n      \\"count_norm\\": 849' => '\\n      \\"count_norm\\": 849_3' [315], '\\n      \\"tax_rank\\": \\"class\\"\\n    }' => '\\n      \\"tax_rank\\": \\"class\\"\\n    }_3' [317], '\\n      \\"tax_rank\\": \\"family\\"\\n    }' => '\\n      \\"tax_rank\\": \\"family\\"\\n    }_8' [323], '\\n      \\"parent\\": 2' => ”Parsed with column specification:
cols(
  .default = col_character()
)
See spec(...) for full column specifications.
Error in FUN(X[[i]], ...): object 'taxon' not found
Traceback:

1. map(data_path, read_json_or_tbl_or_csv)
2. .f(.x[[i]], ...)
3. tryCatch({
 .     jsonlite::fromJSON(data_file)$ubiome_bacteriacounts %>% select(taxon, 
 .         parent, tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 . }, error = function(cond) {
 .     paste(cond)
 .     tryCatch({
 .         tab <- read_table2(data_file) %>% select(taxon, parent, 
 .             tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .             ".", fixed = TRUE)[[1]][1])
 .         assertthat::assert_that(ncol(tab) > 1)
 .         return(tab)
 .     }, error = function(bla) {
 .         paste(bla)
 .         csv <- read_csv(data_file) %>% select(taxon, parent, 
 .             tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .             ".", fixed = TRUE)[[1]][1])
 .         return(csv)
 .     })
 . })   # at line 3-25 of file <text>
4. tryCatchList(expr, classes, parentenv, handlers)
5. tryCatchOne(expr, names, parentenv, handlers[[1L]])
6. value[[3L]](cond)
7. tryCatch({
 .     tab <- read_table2(data_file) %>% select(taxon, parent, tax_name, 
 .         tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 .     assertthat::assert_that(ncol(tab) > 1)
 .     return(tab)
 . }, error = function(bla) {
 .     paste(bla)
 .     csv <- read_csv(data_file) %>% select(taxon, parent, tax_name, 
 .         tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
 .         ".", fixed = TRUE)[[1]][1])
 .     return(csv)
 . })   # at line 10-23 of file <text>
8. tryCatchList(expr, classes, parentenv, handlers)
9. tryCatchOne(expr, names, parentenv, handlers[[1L]])
10. value[[3L]](cond)
11. read_csv(data_file) %>% select(taxon, parent, tax_name, tax_rank, 
  .     count, count_norm) %>% mutate(id = strsplit(basename(data_file), 
  .     ".", fixed = TRUE)[[1]][1])   # at line 19-21 of file <text>
12. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
13. eval(quote(`_fseq`(`_lhs`)), env, env)
14. eval(quote(`_fseq`(`_lhs`)), env, env)
15. `_fseq`(`_lhs`)
16. freduce(value, `_function_list`)
17. function_list[[i]](value)
18. select(., taxon, parent, tax_name, tax_rank, count, count_norm)
19. select.data.frame(., taxon, parent, tax_name, tax_rank, count, 
  .     count_norm)
20. select_vars(names(.data), !(!(!quos(...))))
21. map_if(ind_list, !is_helper, eval_tidy, data = names_list)
22. map(.x[matches], .f, ...)
23. lapply(.x, .f, ...)
24. FUN(X[[i]], ...)
HadrienG commented 6 years ago

thanks for the traceback. Yes I did run it on openhumans.

I'll take a look, but I think it's because one of the json files has unquoted strings (which is not valid json)

The warnings can safely be ignored, maybe I should wrap the call in an invisible(). Or do you know a jupyter way to suppress the warnings for one cell? Like Rmarkdown {r cell_name warnings=FALSE}

gedankenstuecke commented 6 years ago

Great, thanks so much! I was just asking as otherwise different package versions might be to blame for some weird behavior.

I'm not sure how to best turn off the warnings (my own R skills are basically limited to making things look nice thanks to the ggplot2 universe). According to google options(warn=-1) might do the trick to turn the warnings off globally.