Open HadrienG opened 6 years ago
On this repo, I think! https://github.com/OpenHumans/oh-ubiome-source
I had some trouble getting it to run on my end:
data/
folder needs to exist already, otherwise it'll crash. But that's easy enough to fix. Then I had a problem with getting the reference data loaded in.
There's a lot of
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“90 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 5 <NA> 6 columns 7 columns 'data/37852.json' file 2 11 <NA> 6 columns 7 columns 'data/37852.json' row 3 14 <NA> 6 columns 7 columns 'data/37852.json' col 4 15 <NA> 6 columns 7 columns 'data/37852.json' expected 5 16 <NA> 6 columns 7 columns 'data/37852.json'
And in the end the data_container
isn't created. See the end of the error message here:
Just for fun I also tried loading my own data in the next steps but that also yielded an error:
Could you give me the whole traceback for the object taxon not found
? Not sure if it happens during the json or the csv parsing.
Sure, running
invisible(map2(jsons$download_url, paste0("data/", jsons$id, ".json"), download.file))
file.remove("data/37844.json") # invalid json
data_path <- dir("data", pattern = '.json', full.names = TRUE)
data_container <- map(data_path, read_json_or_tbl_or_csv)
I get the following output and data_container
isn't actually created.
I was wondering: did you run your code directly on notebooks.openhumans.org
?
Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_character(),
count_norm = col_character(),
taxon = col_character(),
parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“90 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 5 <NA> 6 columns 7 columns 'data/37852.json' file 2 11 <NA> 6 columns 7 columns 'data/37852.json' row 3 14 <NA> 6 columns 7 columns 'data/37852.json' col 4 15 <NA> 6 columns 7 columns 'data/37852.json' expected 5 16 <NA> 6 columns 7 columns 'data/37852.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_character(),
count_norm = col_character(),
taxon = col_integer(),
parent = col_integer()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“7 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 8 <NA> 6 columns 7 columns 'data/37870.json' file 2 9 <NA> 6 columns 7 columns 'data/37870.json' row 3 12 <NA> 6 columns 7 columns 'data/37870.json' col 4 13 <NA> 6 columns 7 columns 'data/37870.json' expected 5 15 <NA> 6 columns 8 columns 'data/37870.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_character(),
count_norm = col_character(),
taxon = col_integer(),
parent = col_integer()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“9 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 3 <NA> 6 columns 7 columns 'data/37872.json' file 2 6 <NA> 6 columns 7 columns 'data/37872.json' row 3 9 <NA> 6 columns 7 columns 'data/37872.json' col 4 10 <NA> 6 columns 8 columns 'data/37872.json' expected 5 13 <NA> 6 columns 7 columns 'data/37872.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_character(),
count_norm = col_character(),
taxon = col_character(),
parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“16 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 3 <NA> 6 columns 8 columns 'data/37874.json' file 2 4 <NA> 6 columns 7 columns 'data/37874.json' row 3 7 <NA> 6 columns 8 columns 'data/37874.json' col 4 19 <NA> 6 columns 8 columns 'data/37874.json' expected 5 20 <NA> 6 columns 8 columns 'data/37874.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_character(),
count_norm = col_character(),
taxon = col_character(),
parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“103 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 8 <NA> 6 columns 7 columns 'data/37876.json' file 2 9 <NA> 6 columns 7 columns 'data/37876.json' row 3 10 <NA> 6 columns 7 columns 'data/37876.json' col 4 12 <NA> 6 columns 7 columns 'data/37876.json' expected 5 16 <NA> 6 columns 7 columns 'data/37876.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_character(),
count_norm = col_character(),
taxon = col_character(),
parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“78 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 5 <NA> 6 columns 7 columns 'data/37878.json' file 2 6 <NA> 6 columns 7 columns 'data/37878.json' row 3 18 <NA> 6 columns 7 columns 'data/37878.json' col 4 20 <NA> 6 columns 7 columns 'data/37878.json' expected 5 23 <NA> 6 columns 7 columns 'data/37878.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_character(),
count_norm = col_character(),
taxon = col_character(),
parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“73 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 5 <NA> 6 columns 7 columns 'data/37880.json' file 2 6 <NA> 6 columns 7 columns 'data/37880.json' row 3 19 <NA> 6 columns 7 columns 'data/37880.json' col 4 22 <NA> 6 columns 7 columns 'data/37880.json' expected 5 24 <NA> 6 columns 7 columns 'data/37880.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_character(),
count_norm = col_character(),
taxon = col_character(),
parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“87 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 7 <NA> 6 columns 7 columns 'data/37882.json' file 2 8 <NA> 6 columns 7 columns 'data/37882.json' row 3 9 <NA> 6 columns 7 columns 'data/37882.json' col 4 11 <NA> 6 columns 7 columns 'data/37882.json' expected 5 20 <NA> 6 columns 7 columns 'data/37882.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_character(),
count_norm = col_character(),
taxon = col_character(),
parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“24 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 5 <NA> 6 columns 7 columns 'data/37884.json' file 2 6 <NA> 6 columns 7 columns 'data/37884.json' row 3 9 <NA> 6 columns 7 columns 'data/37884.json' col 4 19 <NA> 6 columns 7 columns 'data/37884.json' expected 5 20 <NA> 6 columns 7 columns 'data/37884.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
`tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“70 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 11 <NA> 1 columns 2 columns 'data/37902.json' file 2 14 <NA> 1 columns 2 columns 'data/37902.json' row 3 15 <NA> 1 columns 2 columns 'data/37902.json' col 4 16 <NA> 1 columns 2 columns 'data/37902.json' expected 5 17 <NA> 1 columns 2 columns 'data/37902.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_integer(),
count_norm = col_integer(),
taxon = col_integer(),
parent = col_integer()
)
Parsed with column specification:
cols(
`tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“124 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 13 <NA> 1 columns 2 columns 'data/37904.json' file 2 16 <NA> 1 columns 2 columns 'data/37904.json' row 3 18 <NA> 1 columns 2 columns 'data/37904.json' col 4 21 <NA> 1 columns 2 columns 'data/37904.json' expected 5 22 <NA> 1 columns 2 columns 'data/37904.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_integer(),
count_norm = col_integer(),
taxon = col_integer(),
parent = col_integer()
)
Parsed with column specification:
cols(
`tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“79 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 10 <NA> 1 columns 3 columns 'data/37906.json' file 2 11 <NA> 1 columns 2 columns 'data/37906.json' row 3 16 <NA> 1 columns 2 columns 'data/37906.json' col 4 27 <NA> 1 columns 2 columns 'data/37906.json' expected 5 30 <NA> 1 columns 5 columns 'data/37906.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_integer(),
count_norm = col_integer(),
taxon = col_integer(),
parent = col_integer()
)
Parsed with column specification:
cols(
`tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“117 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 6 <NA> 1 columns 2 columns 'data/37908.json' file 2 10 <NA> 1 columns 2 columns 'data/37908.json' row 3 13 <NA> 1 columns 2 columns 'data/37908.json' col 4 14 <NA> 1 columns 2 columns 'data/37908.json' expected 5 15 <NA> 1 columns 2 columns 'data/37908.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_integer(),
count_norm = col_integer(),
taxon = col_integer(),
parent = col_integer()
)
Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_character(),
count_norm = col_character(),
taxon = col_character(),
parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“92 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 11 <NA> 6 columns 7 columns 'data/37914.json' file 2 12 <NA> 6 columns 7 columns 'data/37914.json' row 3 13 <NA> 6 columns 7 columns 'data/37914.json' col 4 17 <NA> 6 columns 7 columns 'data/37914.json' expected 5 24 <NA> 6 columns 7 columns 'data/37914.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
`tax_name,tax_rank,count,count_norm,taxon,parent` = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“96 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 10 <NA> 1 columns 2 columns 'data/37922.json' file 2 11 <NA> 1 columns 2 columns 'data/37922.json' row 3 12 <NA> 1 columns 2 columns 'data/37922.json' col 4 13 <NA> 1 columns 2 columns 'data/37922.json' expected 5 17 <NA> 1 columns 2 columns 'data/37922.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_integer(),
count_norm = col_integer(),
taxon = col_integer(),
parent = col_integer()
)
Parsed with column specification:
cols(
tax_name = col_character(),
tax_rank = col_character(),
count = col_character(),
count_norm = col_character(),
taxon = col_character(),
parent = col_character()
)
Warning message in rbind(names(probs), probs_f):
“number of columns of result is not a multiple of vector length (arg 1)”Warning message:
“76 parsing failures.
row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 9 <NA> 6 columns 7 columns 'data/37944.json' file 2 12 <NA> 6 columns 7 columns 'data/37944.json' row 3 13 <NA> 6 columns 7 columns 'data/37944.json' col 4 14 <NA> 6 columns 7 columns 'data/37944.json' expected 5 19 <NA> 6 columns 7 columns 'data/37944.json'
... ................. ... ................................................... ........ ................................................... ...... ................................................... .... ................................................... ... ................................................... ... ................................................... ........ ...................................................
See problems(...) for more details.
”Warning message:
“Duplicated column names deduplicated: '{\\n' => '{\\n_1' [28], '\\"taxon\\":' => '\\"taxon\\":_1' [29], '\\"parent\\":' => '\\"parent\\":_1' [31], '\\"count\\":' => '\\"count\\":_1' [33], '4706,\\n' => '4706,\\n_1' [34], '\\"count_norm\\":' => '\\"count_norm\\":_1' [35], '1000000,\\n' => '1000000,\\n_1' [36], '\\"tax_name\\":' => '\\"tax_name\\":_1' [37], '\\"tax_rank\\":' => '\\"tax_rank\\":_1' [39], '},\\n' => '},\\n_1' [41], '{\\n' => '{\\n_2' [42], '\\"taxon\\":' => '\\"taxon\\":_2' [43], '\\"parent\\":' => '\\"parent\\":_2' [45], '\\"count\\":' => '\\"count\\":_2' [47], '\\"count_norm\\":' => '\\"count_norm\\":_2' [49], '\\"tax_name\\":' => '\\"tax_name\\":_2' [51], '\\"tax_rank\\":' => '\\"tax_rank\\":_2' [53], '},\\n' => '},\\n_2' [55], '{\\n' => '{\\n_3' [56], '\\"taxon\\":' => '\\"taxon\\":_3' [57], '\\"parent\\":' => '\\"parent\\":_3' [59], '\\"count\\":' => '\\"count\\":_3' [61], '\\"count_norm\\":' => '\\"count_norm\\":_3' [63], '\\"tax_name\\":' => '\\"tax_name\\":_3' [65], '\\"tax_rank\\":' => '\\"tax_rank\\":_3' [67], '},\\n' => '},\\n_3' [69], '{\\n' => '{\\n_4' [70], '\\"taxon\\":' => '\\"taxon\\":_4' [71], '\\"parent\\":' => '\\"parent\\":_4' [73], '481,\\n' => '481,\\n_1' [74], '\\"count\\":' => '\\"count\\":_4' [75], '\\"count_norm\\":' => '\\"count_norm\\":_4' [77], '\\"tax_name\\":' => '\\"tax_name\\":_4' [79], '\\"tax_rank\\":' => '\\"tax_rank\\":_4' [81], '\\"genus\\"\\n' => '\\"genus\\"\\n_1' [82], '},\\n' => '},\\n_4' [83], '{\\n' => '{\\n_5' [84], '\\"taxon\\":' => '\\"taxon\\":_5' [85], '\\"parent\\":' => '\\"parent\\":_5' [87], '482,\\n' => '482,\\n_1' [88], '\\"count\\":' => '\\"count\\":_5' [89], '\\"count_norm\\":' => '\\"count_norm\\":_5' [91], '\\"tax_name\\":' => '\\"tax_name\\":_5' [93], '\\"tax_rank\\":' => '\\"tax_rank\\":_5' [96], '},\\n' => '},\\n_5' [98], '{\\n' => '{\\n_6' [99], '\\"taxon\\":' => '\\"taxon\\":_6' [100], '\\"parent\\":' => '\\"parent\\":_6' [102], '482,\\n' => '482,\\n_2' [103], '\\"count\\":' => '\\"count\\":_6' [104], '\\"count_norm\\":' => '\\"count_norm\\":_6' [106], '\\"tax_name\\":' => '\\"tax_name\\":_6' [108], '\\"Neisseria' => '\\"Neisseria_1' [109], '\\"tax_rank\\":' => '\\"tax_rank\\":_6' [111], '\\"species\\"\\n' => '\\"species\\"\\n_1' [112], '},\\n' => '},\\n_6' [113], '{\\n' => '{\\n_7' [114], '\\"taxon\\":' => '\\"taxon\\":_7' [115], '\\"parent\\":' => '\\"parent\\":_7' [117], '\\"count\\":' => '\\"count\\":_7' [119], '\\"count_norm\\":' => '\\"count_norm\\":_7' [121], '\\"tax_name\\":' => '\\"tax_name\\":_7' [123], '\\"tax_rank\\":' => '\\"tax_rank\\":_7' [126], '\\"species\\"\\n' => '\\"species\\"\\n_2' [127], '},\\n' => '},\\n_7' [128], '{\\n' => '{\\n_8' [129], '\\"taxon\\":' => '\\"taxon\\":_8' [130], '\\"parent\\":' => '\\"parent\\":_8' [132], '\\"count\\":' => '\\"count\\":_8' [134], '\\"count_norm\\":' => '\\"count_norm\\":_8' [136], '\\"tax_name\\":' => '\\"tax_name\\":_8' [138], '\\"tax_rank\\":' => '\\"tax_rank\\":_8' [140], '\\"family\\"\\n' => '\\"family\\"\\n_1' [141], '},\\n' => '},\\n_8' [142], '{\\n' => '{\\n_9' [143], '\\"taxon\\":' => '\\"taxon\\":_9' [144], '\\"parent\\":' => '\\"parent\\":_9' [146], '543,\\n' => '543,\\n_1' [147], '\\"count\\":' => '\\"count\\":_9' [148], '5,\\n' => '5,\\n_1' [149], '\\"count_norm\\":' => '\\"count_norm\\":_9' [150], '1062,\\n' => '1062,\\n_1' [151], '\\"tax_name\\":' => '\\"tax_name\\":_9' [152], '\\"tax_rank\\":' => '\\"tax_rank\\":_9' [154], '\\"genus\\"\\n' => '\\"genus\\"\\n_2' [155], '},\\n' => '},\\n_9' [156], '{\\n' => '{\\n_10' [157], '\\"taxon\\":' => '\\"taxon\\":_10' [158], '\\"parent\\":' => '\\"parent\\":_10' [160], '\\"count\\":' => '\\"count\\":_10' [162], '\\"count_norm\\":' => '\\"count_norm\\":_10' [164], '\\"tax_name\\":' => '\\"tax_name\\":_10' [166], '\\"tax_rank\\":' => '\\"tax_rank\\":_10' [168], '\\"family\\"\\n' => '\\"family\\"\\n_2' [169], '},\\n' => '},\\n_10' [170], '{\\n' => '{\\n_11' [171], '\\"taxon\\":' => '\\"taxon\\":_11' [172], '\\"parent\\":' => '\\"parent\\":_11' [174], '712,\\n' => '712,\\n_1' [175], '\\"count\\":' => '\\"count\\":_11' [176], '\\"count_norm\\":' => '\\"count_norm\\":_11' [178], '\\"tax_name\\":' => '\\"tax_name\\":_11' [180], '\\"tax_rank\\":' => '\\"tax_rank\\":_11' [182], '\\"genus\\"\\n' => '\\"genus\\"\\n_3' [183], '},\\n' => '},\\n_11' [184], '{\\n' => '{\\n_12' [185], '\\"taxon\\":' => '\\"taxon\\":_12' [186], '\\"parent\\":' => '\\"parent\\":_12' [188], '724,\\n' => '724,\\n_1' [189], '\\"count\\":' => '\\"count\\":_12' [190], '\\"count_norm\\":' => '\\"count_norm\\":_12' [192], '\\"tax_name\\":' => '\\"tax_name\\":_12' [194], '\\"tax_rank\\":' => '\\"tax_rank\\":_12' [197], '\\"species\\"\\n' => '\\"species\\"\\n_3' [198], '},\\n' => '},\\n_12' [199], '{\\n' => '{\\n_13' [200], '\\"taxon\\":' => '\\"taxon\\":_13' [201], '\\"parent\\":' => '\\"parent\\":_13' [203], '724,\\n' => '724,\\n_2' [204], '\\"count\\":' => '\\"count\\":_13' [205], '\\"count_norm\\":' => '\\"count_norm\\":_13' [207], '\\"tax_name\\":' => '\\"tax_name\\":_13' [209], '\\"Haemophilus' => '\\"Haemophilus_1' [210], '\\"tax_rank\\":' => '\\"tax_rank\\":_13' [212], '\\"species\\"\\n' => '\\"species\\"\\n_4' [213], '},\\n' => '},\\n_13' [214], '{\\n' => '{\\n_14' [215], '\\"taxon\\":' => '\\"taxon\\":_14' [216], '\\"parent\\":' => '\\"parent\\":_14' [218], '\\"count\\":' => '\\"count\\":_14' [220], '4,\\n' => '4,\\n_1' [221], '\\"count_norm\\":' => '\\"count_norm\\":_14' [222], '849,\\n' => '849,\\n_1' [223], '\\"tax_name\\":' => '\\"tax_name\\":_14' [224], '\\"tax_rank\\":' => '\\"tax_rank\\":_14' [227], '\\"species\\"\\n' => '\\"species\\"\\n_5' [228], '},\\n' => '},\\n_14' [229], '{\\n' => '{\\n_15' [230], '\\"taxon\\":' => '\\"taxon\\":_15' [231], '\\"parent\\":' => '\\"parent\\":_15' [233], '416916,\\n' => '416916,\\n_1' [234], '\\"count\\":' => '\\"count\\":_15' [235], '\\"count_norm\\":' => '\\"count_norm\\":_15' [237], '\\"tax_name\\":' => '\\"tax_name\\":_15' [239], '\\"Aggregatibacter' => '\\"Aggregatibacter_1' [240], '\\"tax_rank\\":' => '\\"tax_rank\\":_15' [242], '\\"species\\"\\n' => '\\"species\\"\\n_6' [243], '},\\n' => '},\\n_15' [244], '{\\n' => '{\\n_16' [245], '\\"taxon\\":' => '\\"taxon\\":_16' [246], '\\"parent\\":' => '\\"parent\\":_16' [248], '\\"count\\":' => '\\"count\\":_16' [250], '\\"count_norm\\":' => '\\"count_norm\\":_16' [252], '\\"tax_name\\":' => '\\"tax_name\\":_16' [254], '\\"tax_rank\\":' => '\\"tax_rank\\":_16' [256], '\\"family\\"\\n' => '\\"family\\"\\n_3' [257], '},\\n' => '},\\n_16' [258], '{\\n' => '{\\n_17' [259], '\\"taxon\\":' => '\\"taxon\\":_17' [260], '\\"parent\\":' => '\\"parent\\":_17' [262], '815,\\n' => '815,\\n_1' [263], '\\"count\\":' => '\\"count\\":_17' [264], '33,\\n' => '33,\\n_1' [265], '\\"count_norm\\":' => '\\"count_norm\\":_17' [266], '7012,\\n' => '7012,\\n_1' [267], '\\"tax_name\\":' => '\\"tax_name\\":_17' [268], '\\"tax_rank\\":' => '\\"tax_rank\\":_17' [270], '\\"genus\\"\\n' => '\\"genus\\"\\n_4' [271], '},\\n' => '},\\n_17' [272], '{\\n' => '{\\n_18' [273], '\\"taxon\\":' => '\\"taxon\\":_18' [274], '\\"parent\\":' => '\\"parent\\":_18' [276], '816,\\n' => '816,\\n_1' [277], '\\"count\\":' => '\\"count\\":_18' [278], '\\"count_norm\\":' => '\\"count_norm\\":_18' [280], '\\"tax_name\\":' => '\\"tax_name\\":_18' [282], '\\"tax_rank\\":' => '\\"tax_rank\\":_18' [285], '\\"species\\"\\n' => '\\"species\\"\\n_7' [286], '},\\n' => '},\\n_18' [287], '{\\n' => '{\\n_19' [288], '\\"taxon\\":' => '\\"taxon\\":_19' [289], '\\"parent\\":' => '\\"parent\\":_19' [291], '\\"count\\":' => '\\"count\\":_19' [293], '21,\\n' => '21,\\n_1' [294], '\\"count_norm\\":' => '\\"count_norm\\":_19' [295], '4462,\\n' => '4462,\\n_1' [296], '\\"tax_name\\":' => '\\"tax_name\\":_19' [297], '\\"tax_rank\\":' => '\\"tax_rank\\":_19' [299], '\\"genus\\"\\n' => '\\"genus\\"\\n_5' [300], '},\\n' => '},\\n_19' [301], '{\\n' => '{\\n_20' [302], '\\"taxon\\":' => '\\"taxon\\":_20' [303], '\\"parent\\":' => '\\"parent\\":_20' [305], '\\"count\\":' => '\\"count\\":_20' [307], '\\"count_norm\\":' => '\\"count_norm\\":_20' [309], '\\"tax_name\\":' => '\\"tax_name\\":_20' [311], '\\"tax_rank\\":' => '\\"tax_rank\\":_20' [313], '\\"genus\\"\\n' => '\\”Parsed with column specification:
cols(
.default = col_character()
)
See spec(...) for full column specifications.
Warning message:
“Duplicated column names deduplicated: '\\n \\"count\\": 4706' => '\\n \\"count\\": 4706_1' [14], '\\n \\"count_norm\\": 1000000' => '\\n \\"count_norm\\": 1000000_1' [15], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_1' [35], '\\n \\"parent\\": 482' => '\\n \\"parent\\": 482_1' [43], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_1' [47], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_2' [53], '\\n \\"tax_rank\\": \\"family\\"\\n }' => '\\n \\"tax_rank\\": \\"family\\"\\n }_1' [59], '\\n \\"count\\": 5' => '\\n \\"count\\": 5_1' [62], '\\n \\"count_norm\\": 1062' => '\\n \\"count_norm\\": 1062_1' [63], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_2' [65], '\\n \\"tax_rank\\": \\"family\\"\\n }' => '\\n \\"tax_rank\\": \\"family\\"\\n }_2' [71], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_3' [77], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_3' [83], '\\n \\"parent\\": 724' => '\\n \\"parent\\": 724_1' [85], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_4' [89], '\\n \\"count\\": 4' => '\\n \\"count\\": 4_1' [92], '\\n \\"count_norm\\": 849' => '\\n \\"count_norm\\": 849_1' [93], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_5' [95], '\\n \\"parent\\": 416916' => '\\n \\"parent\\": 416916_1' [97], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_6' [101], '\\n \\"tax_rank\\": \\"family\\"\\n }' => '\\n \\"tax_rank\\": \\"family\\"\\n }_3' [107], '\\n \\"count\\": 33' => '\\n \\"count\\": 33_1' [110], '\\n \\"count_norm\\": 7012' => '\\n \\"count_norm\\": 7012_1' [111], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_4' [113], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_7' [119], '\\n \\"count\\": 21' => '\\n \\"count\\": 21_1' [122], '\\n \\"count_norm\\": 4462' => '\\n \\"count_norm\\": 4462_1' [123], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_5' [125], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_6' [131], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_7' [137], '\\n \\"count\\": 11' => '\\n \\"count\\": 11_1' [140], '\\n \\"count_norm\\": 2337' => '\\n \\"count_norm\\": 2337_1' [141], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_8' [143], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_8' [149], '\\n \\"count\\": 2' => '\\n \\"count\\": 2_1' [152], '\\n \\"count_norm\\": 424' => '\\n \\"count_norm\\": 424_1' [153], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_9' [155], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_9' [167], '\\n \\"count\\": 2' => '\\n \\"count\\": 2_2' [170], '\\n \\"count_norm\\": 424' => '\\n \\"count_norm\\": 424_2' [171], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_10' [173], '\\n \\"tax_rank\\": \\"phylum\\"\\n }' => '\\n \\"tax_rank\\": \\"phylum\\"\\n }_1' [179], '\\n \\"tax_rank\\": \\"phylum\\"\\n }' => '\\n \\"tax_rank\\": \\"phylum\\"\\n }_2' [191], '\\n \\"tax_rank\\": \\"family\\"\\n }' => '\\n \\"tax_rank\\": \\"family\\"\\n }_4' [197], '\\n \\"tax_rank\\": \\"family\\"\\n }' => '\\n \\"tax_rank\\": \\"family\\"\\n }_5' [203], '\\n \\"count\\": 1518' => '\\n \\"count\\": 1518_1' [206], '\\n \\"count_norm\\": 322566' => '\\n \\"count_norm\\": 322566_1' [207], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_10' [209], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_11' [215], '\\n \\"parent\\": 1301' => '\\n \\"parent\\": 1301_1' [217], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_12' [221], '\\n \\"parent\\": 1301' => '\\n \\"parent\\": 1301_2' [223], '\\n \\"count\\": 6' => '\\n \\"count\\": 6_1' [224], '\\n \\"count_norm\\": 1274' => '\\n \\"count_norm\\": 1274_1' [225], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_13' [227], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_11' [233], '\\n \\"count\\": 271' => '\\n \\"count\\": 271_1' [236], '\\n \\"count_norm\\": 57586' => '\\n \\"count_norm\\": 57586_1' [237], '\\n \\"count\\": 6' => '\\n \\"count\\": 6_2' [242], '\\n \\"count_norm\\": 1274' => '\\n \\"count_norm\\": 1274_2' [243], '\\n \\"tax_rank\\": \\"family\\"\\n }' => '\\n \\"tax_rank\\": \\"family\\"\\n }_6' [245], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_12' [251], '\\n \\"count\\": 6' => '\\n \\"count\\": 6_3' [254], '\\n \\"count_norm\\": 1274' => '\\n \\"count_norm\\": 1274_3' [255], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_13' [257], '\\n \\"tax_rank\\": \\"class\\"\\n }' => '\\n \\"tax_rank\\": \\"class\\"\\n }_1' [263], '\\n \\"count\\": 210' => '\\n \\"count\\": 210_1' [266], '\\n \\"count_norm\\": 44623' => '\\n \\"count_norm\\": 44623_1' [267], '\\n \\"tax_rank\\": \\"order\\"\\n }' => '\\n \\"tax_rank\\": \\"order\\"\\n }_1' [269], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_14' [275], '\\n \\"count\\": 38' => '\\n \\"count\\": 38_1' [278], '\\n \\"count_norm\\": 8074' => '\\n \\"count_norm\\": 8074_1' [279], '\\n \\"tax_rank\\": \\"family\\"\\n }' => '\\n \\"tax_rank\\": \\"family\\"\\n }_7' [281], '\\n \\"count\\": 2' => '\\n \\"count\\": 2_3' [284], '\\n \\"count_norm\\": 424' => '\\n \\"count_norm\\": 424_3' [285], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_15' [287], '\\n \\"count\\": 4' => '\\n \\"count\\": 4_2' [290], '\\n \\"count_norm\\": 849' => '\\n \\"count_norm\\": 849_2' [291], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_16' [293], '\\n \\"parent\\": 838' => '\\n \\"parent\\": 838_1' [295], '\\n \\"count\\": 14' => '\\n \\"count\\": 14_1' [296], '\\n \\"count_norm\\": 2974' => '\\n \\"count_norm\\": 2974_1' [297], '\\n \\"tax_rank\\": \\"species\\"\\n }' => '\\n \\"tax_rank\\": \\"species\\"\\n }_17' [299], '\\n \\"parent\\": 1224' => '\\n \\"parent\\": 1224_1' [301], '\\n \\"tax_rank\\": \\"class\\"\\n }' => '\\n \\"tax_rank\\": \\"class\\"\\n }_2' [305], '\\n \\"tax_rank\\": \\"genus\\"\\n }' => '\\n \\"tax_rank\\": \\"genus\\"\\n }_14' [311], '\\n \\"count\\": 4' => '\\n \\"count\\": 4_3' [314], '\\n \\"count_norm\\": 849' => '\\n \\"count_norm\\": 849_3' [315], '\\n \\"tax_rank\\": \\"class\\"\\n }' => '\\n \\"tax_rank\\": \\"class\\"\\n }_3' [317], '\\n \\"tax_rank\\": \\"family\\"\\n }' => '\\n \\"tax_rank\\": \\"family\\"\\n }_8' [323], '\\n \\"parent\\": 2' => ”Parsed with column specification:
cols(
.default = col_character()
)
See spec(...) for full column specifications.
Error in FUN(X[[i]], ...): object 'taxon' not found
Traceback:
1. map(data_path, read_json_or_tbl_or_csv)
2. .f(.x[[i]], ...)
3. tryCatch({
. jsonlite::fromJSON(data_file)$ubiome_bacteriacounts %>% select(taxon,
. parent, tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file),
. ".", fixed = TRUE)[[1]][1])
. }, error = function(cond) {
. paste(cond)
. tryCatch({
. tab <- read_table2(data_file) %>% select(taxon, parent,
. tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file),
. ".", fixed = TRUE)[[1]][1])
. assertthat::assert_that(ncol(tab) > 1)
. return(tab)
. }, error = function(bla) {
. paste(bla)
. csv <- read_csv(data_file) %>% select(taxon, parent,
. tax_name, tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file),
. ".", fixed = TRUE)[[1]][1])
. return(csv)
. })
. }) # at line 3-25 of file <text>
4. tryCatchList(expr, classes, parentenv, handlers)
5. tryCatchOne(expr, names, parentenv, handlers[[1L]])
6. value[[3L]](cond)
7. tryCatch({
. tab <- read_table2(data_file) %>% select(taxon, parent, tax_name,
. tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file),
. ".", fixed = TRUE)[[1]][1])
. assertthat::assert_that(ncol(tab) > 1)
. return(tab)
. }, error = function(bla) {
. paste(bla)
. csv <- read_csv(data_file) %>% select(taxon, parent, tax_name,
. tax_rank, count, count_norm) %>% mutate(id = strsplit(basename(data_file),
. ".", fixed = TRUE)[[1]][1])
. return(csv)
. }) # at line 10-23 of file <text>
8. tryCatchList(expr, classes, parentenv, handlers)
9. tryCatchOne(expr, names, parentenv, handlers[[1L]])
10. value[[3L]](cond)
11. read_csv(data_file) %>% select(taxon, parent, tax_name, tax_rank,
. count, count_norm) %>% mutate(id = strsplit(basename(data_file),
. ".", fixed = TRUE)[[1]][1]) # at line 19-21 of file <text>
12. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
13. eval(quote(`_fseq`(`_lhs`)), env, env)
14. eval(quote(`_fseq`(`_lhs`)), env, env)
15. `_fseq`(`_lhs`)
16. freduce(value, `_function_list`)
17. function_list[[i]](value)
18. select(., taxon, parent, tax_name, tax_rank, count, count_norm)
19. select.data.frame(., taxon, parent, tax_name, tax_rank, count,
. count_norm)
20. select_vars(names(.data), !(!(!quos(...))))
21. map_if(ind_list, !is_helper, eval_tidy, data = names_list)
22. map(.x[matches], .f, ...)
23. lapply(.x, .f, ...)
24. FUN(X[[i]], ...)
thanks for the traceback. Yes I did run it on openhumans.
I'll take a look, but I think it's because one of the json files has unquoted strings (which is not valid json)
The warnings can safely be ignored, maybe I should wrap the call in an invisible()
. Or do you know a jupyter way to suppress the warnings for one cell? Like Rmarkdown {r cell_name warnings=FALSE}
Great, thanks so much! I was just asking as otherwise different package versions might be to blame for some weird behavior.
I'm not sure how to best turn off the warnings (my own R skills are basically limited to making things look nice thanks to the ggplot2
universe). According to google options(warn=-1)
might do the trick to turn the warnings off globally.
Ready for testing
Additional remarks
Some of the ubiome json files are not valid json, or are in other format (csv and tab delimited). Ideally there should be a data validation when accepting the upload files (I can file an issue in the repo if you want). Anyway the r code in the notebook should handle the different formats 🎉