Closed andrewkho closed 7 years ago
Hi Andrew,
Are you able to provide/create an example dataset that reproduces the error you're seeing?
I don't see why ES v5.5.1 should be an issue. Likewise, a differing number of fields shouldn't matter, as missing/empty fields in an index should come through as `NA`.
Like I said, if you can reproduce the error for me that would be an enormous help.
Alex
Hi Alex, I'll try and create a minimum reproducible example, hopefully soon.
Thanks Andrew.
@andrewkho Any luck with that example or can I close this issue?
Sorry, I haven't been able to make a small example. It is still an issue; however, I am working around it by using the plain "elastic" package with a simple DSL I wrote, which is doing the job, so unfortunately I am not using elasticsearchr.
Thanks for getting back to me. In the absence of an example to debug, I'm going to close this issue. I'll re-open it if I run into anything that sounds similar.
I am also having this issue while importing a dataset of ~56k rows from Elasticsearch.
@andrewkho and @jwarnes, I ran into similar problems due to the nature of trying to wrangle nested lists into a data frame. This blog post really helps: http://zevross.com/blog/2015/02/12/using-r-to-download-and-parse-json-an-example-using-data-from-an-open-data-portal/
@jwarnes @hatdropper1977 Can either of you provide me with an example document or two, that I can ingest into Elasticsearch, to use for debugging?
Hypothetically, nested data frames shouldn't be an issue, as they ought to be 'flattened' using the `flatten` function from `jsonlite`.
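For reference, a minimal sketch of what `jsonlite::flatten` does to a nested data frame. The two documents below are hypothetical and are not taken from anyone's index in this thread; the sketch only assumes that the `jsonlite` package is installed.

```r
library(jsonlite)

# Hypothetical documents: `host` is a nested object, and the second
# document has no `host.os` field.
docs <- fromJSON('[
  {"id": 1, "host": {"name": "web-1", "os": "linux"}},
  {"id": 2, "host": {"name": "web-2"}}
]')

# `host` comes back as a data frame nested inside `docs`.
names(docs)

# flatten() promotes the nested columns to top-level, dot-separated columns;
# the field missing from the second document is filled with NA.
flat <- jsonlite::flatten(docs)
names(flat)
```

Note that `flatten` only unnests data-frame columns. List columns (for example, arrays of varying length) are left as lists.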
@AlexIoannides I can't give you my data bc they include sensitive information.
Maybe these will work? https://download.elastic.co/demos/kibana/gettingstarted/logs.jsonl.gz
I did not have any luck w/ flatten, but ymmv.
Are you not able to create an artificial record that replicates the error you're observing?
I'm sorry, but I don't have time for trial and error. If you can give me something to pin a target on, however, I will make the time to take a look.
Ok - it may be a while though. @jwarnes do you have example data?
Not at the moment.
To clarify - I have no issue w/ the %search% command. It returns a data frame which includes columns of nested lists. The issue w/ my data is that it includes arbitrary levels and sub-levels of lists (each of which may or may not have consistent length). So a simple 'flatten' does not work on it. I use the technique in the blog I linked above to pull the data I want.
Sorry for the confusion.
Once again, ElasticsearchR does its job very well.
If this helps, here is an example aggs query that works w/ ElasticsearchR (in that it successfully returns a data frame - 100k plus rows w/o issue).
match_all_query <- query('{
"match_all": {}
}')
date_hist_w_servers <- aggs('{
"docs_over_time": {
"date_histogram": {
"field": "@timestamp",
"interval": "1m"
},
"aggs" : {
"servers" : {
"terms" : { "field" : "beat.hostname.keyword", "size" : 20 },
"aggs" : {
"the_max": {"max" : { "field": "system.memory.free" } }
}
}
}
}}')
df <- elastic(ELASTIC_API, ELASTIC_INDEX_NAME) %search% (match_all_query + date_hist_w_servers)
I just need to apply my own logic to flattening it, because it returns inconsistent nested lists (in terms of length and further sub-nests).
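As an illustration of that kind of custom flattening, here is a minimal base R sketch. The records are hypothetical, not the data discussed here, and `unlist()` is just one possible approach rather than the technique from the linked blog post. One caveat: `unlist()` coerces mixed values to a common type, so the numbers below end up as character strings.

```r
# Hypothetical ragged records with inconsistent nesting and length.
records <- list(
  list(host = "web-1", mem = list(free = 10, used = list(pct = 0.5))),
  list(host = "web-2", tags = list("a", "b"))
)

# unlist() walks every nesting level and joins the names with ".";
# unnamed elements under a named parent get a numeric suffix.
flat <- lapply(records, function(r) as.list(unlist(r)))

# Each flattened record ends up with its own set of field names.
field_names <- lapply(flat, names)
field_names
```

The flattened records still have different field sets, so they would need padding with `NA` before they could be row-bound into one data frame.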
Thanks John.
How does this relate (if at all) to the original error, `Error in rbind(deparse.level, ...)`?
I'm struggling to 'join the dots' here.
Hi! Today I hit exactly this error while trying to download data from ES. The docs in my index are not identical.
> elastic("http://elasticsearch:9200", "logs", "doc") %search% for_everything
............Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
Should I use one index per type of doc, with similar fields?
Hi @MonaxGT,
Your index may not have identical docs, but that shouldn't (in theory) be a problem as all the interesting fields will have the same type.
To help me debug this, could you please give a few example docs from `logs/doc` (that are not 'identical')?
Hi Alex,
Also having this problem. The issue is that `rbind` requires all bound data frames to have the same column names, so the error will occur any time sparse data is held in the index. There are two issues here that I think are problematic:

1. `rbind` fails on records with different columns:

    l = list()
    l$record1 = data.frame(a=NA, b=NA, c=NA)
    l$record2 = data.frame(a=NA, b=NA) # no data for col `c`
    do.call(rbind, l) # line 404 of utils.R is the only instance of do.call(rbind, list)
    Error in rbind(deparse.level, ...) :
      numbers of columns of arguments do not match

2. When you `%index%` an item that has an NA, that field does not get pushed for the record. So future `%search%` actions on this index could lead to the `do.call(rbind, list)` error. The code that manages this is here: https://github.com/AlexIoannides/elasticsearchr/blob/77ccadcc2a14fc834e0233b0c3bbc5496d7c90b7/R/utils.R#L404
Possible solutions:

1. Use `dplyr::bind_rows`:

    bind_rows(l)
       a  b  c
    1 NA NA NA
    2 NA NA NA

2. Use `data.table::rbindlist(l, fill = TRUE)`:

    data.table::rbindlist(l, fill=TRUE)
        a  b  c
    1: NA NA NA
    2: NA NA NA

I'm going to submit a PR using the `dplyr` solution, as per 1 above.
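To make the "fill" behaviour concrete, here is a minimal base R sketch of the same idea: pad every data frame to the union of column names before binding. `rbind_fill` is a hypothetical helper written for illustration only; it is not part of elasticsearchr, dplyr, or data.table, and it is not necessarily how the merged fix works.

```r
# Same toy records as in the comment above.
l <- list(
  record1 = data.frame(a = NA, b = NA, c = NA),
  record2 = data.frame(a = NA, b = NA)  # no data for col `c`
)

# Hypothetical helper: add any missing column as NA, then row-bind.
rbind_fill <- function(dfs) {
  cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (m in setdiff(cols, names(d))) d[[m]] <- NA
    d[cols]  # enforce a consistent column order
  })
  do.call(rbind, padded)
}

out <- rbind_fill(l)
out
```

In practice `dplyr::bind_rows` or `data.table::rbindlist(fill = TRUE)` is likely the better choice, since both also handle column type reconciliation.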
This fix has been merged in #52 and submitted to CRAN as v0.3.1.
When using the %search% operator, the method fails with the following error message:
............Error in rbind(deparse.level, ...) number of columns of arguments do not match
Elasticsearchr is from CRAN
I have confirmed the query works with the elastic package. I suspect the documents have differing numbers of fields and perhaps this is causing the issue.
Alternatively, it could be because of the Elasticsearch version: 5.5.1