Closed: milosgajdos closed this issue 8 years ago
Thanks for this. The code was not set up to accommodate httr's switch from XML to xml2 as the default XML parser. I believe this is now fixed. Can you confirm that it is now working for you, @milosgajdos83?
Excellent! The request now goes through without any error and returns an `s3_object`. It seems this has done the trick :+1:
Now, a quick unrelated R-n00b question (if you don't mind) - I'm only just beginning to learn R :-) How do I access each bucket item individually?
From what I can see in the code, `aws.s3::getbucket(bucket = "bucket_name")` returns an `s3_bucket` object with the following attributes:
```r
> attributes(my_bucket)
$names
[1] "Name"        "Prefix"      "Marker"      "MaxKeys"     "IsTruncated"
[6] "Contents"    "Contents"    "Contents"    "Contents"

$class
[1] "s3_bucket"
```
Now I can see the `Contents` attribute contains the actual objects stored in the S3 bucket as list entries:
```r
> summary(my_bucket)
            Length Class     Mode
Name        1      -none-    character
Prefix      0      -none-    NULL
Marker      0      -none-    NULL
MaxKeys     1      -none-    character
IsTruncated 1      -none-    character
Contents    6      s3_object list
Contents    6      s3_object list
Contents    6      s3_object list
Contents    6      s3_object list
```
Is there any way to easily iterate through each of the objects (even better, through every file object)? I thought I'd be able to access them via list indexes, but it seems all of the S3 objects are indexed under the same name - i.e. they appear to be overridden by the last S3 object retrieved and thus can't be iterated over?
```r
> summary(my_bucket$Contents)
             Length Class  Mode
Key          1      -none- character
LastModified 1      -none- character
ETag         1      -none- character
Size         1      -none- character
Owner        2      -none- list
StorageClass 1      -none- character
> attributes(my_bucket$Contents)
$names
[1] "Key"          "LastModified" "ETag"         "Size"         "Owner"
[6] "StorageClass"

$class
[1] "s3_object"
```
I'm totally sure I'm missing something, so I'd really appreciate a bit of help :-) Lastly, huge thanks for the awesome Cloudyr work!
I just got the same error. Just installed from GitHub.
```r
> library(aws.s3)
> Sys.setenv("AWS_ACCESS_KEY_ID" = "xxx",
+            "AWS_SECRET_ACCESS_KEY" = "yyy",
+            "AWS_DEFAULT_REGION" = "us-east-1")
> bucketlist()
No encoding supplied: defaulting to UTF-8.
Error in UseMethod("xmlSApply") :
  no applicable method for 'xmlSApply' applied to an object of class "c('xml_document', 'xml_node')"
```
httr version 1.1.0, xml2 version 0.1.2, R version 3.2.4.
It is accessing the server. Seems to be a problem parsing the list.
UPDATE: apparently I was using the "latest stable version" and not the most current version from GitHub (which requires ghit). So the problem is solved - I just have to use the right development version!
@milosgajdos83 I think you want `getobject()`? Here's an example of iterating over a bucket to get all objects in the bucket: https://github.com/ropensci/drat/blob/gh-pages/parse_s3_logs.R#L19-L26, and then iterating over that object list to download each object: https://github.com/ropensci/drat/blob/gh-pages/parse_s3_logs.R#L30-L34. @leeper can probably comment if there's a better way, but I believe the S3 API is pretty low-level. Perhaps we can abstract this into some helper functions in the R package for these common tasks.
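On the indexing question above: this is plain R list behaviour rather than anything S3-specific, and the objects are not being overridden - `$` just always returns the *first* element with a matching name. A minimal sketch with made-up keys (no AWS calls needed):

```r
# A list with several elements all named "Contents", like the bucket above
my_bucket <- list(Name     = "bucket_name",
                  Contents = list(Key = "a.csv"),
                  Contents = list(Key = "b.csv"),
                  Contents = list(Key = "c.csv"))

my_bucket$Contents$Key                # "a.csv" - always the first match

# Subset by name to recover ALL of them, then iterate
contents <- my_bucket[names(my_bucket) == "Contents"]
length(contents)                      # 3
sapply(contents, function(x) x$Key)   # every object's Key
```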
@leeper From @markdanese's error it looks like we must be using the `httr::content()` call to parse XML without specifying the parser? I'd try a PR, but I'm just not spotting the `content()` call.
As you probably know, httr recently dropped XML in favor of xml2 as the default XML parser, so when relying on the automatically detected `httr::content()`, the function returns the `xml_document` mentioned in Mark's error message (the xml2 object class), rather than the XML-package document that the old `XML::xmlSApply` needs.
Replacing the current call to `content` with `xmlParse(httr::content(response, as = "text"))` would do the trick, if only I could find it. Alternately, we might just want to move over to xml2...
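To make the mismatch concrete, here is a minimal sketch using a hand-written XML string in place of a live S3 response (the `ListBucketResult` snippet is made up): `XML::xmlSApply` is happy with a document from `XML::xmlParse`, but not with the `xml_document` that xml2 (and hence the new `httr::content()` default) produces.

```r
library(XML)

# Stand-in for the body of an S3 response; the real fix would be along the lines of
#   doc <- xmlParse(httr::content(response, as = "text"))
xml_text <- "<ListBucketResult><Name>mybucket</Name><MaxKeys>1000</MaxKeys></ListBucketResult>"

doc <- xmlParse(xml_text)            # XML-package document, not xml2's xml_document
xmlSApply(xmlRoot(doc), xmlValue)    # works: c(Name = "mybucket", MaxKeys = "1000")

# By contrast, passing xml2::read_xml(xml_text) to xmlSApply fails with
# "no applicable method for 'xmlSApply' applied to ... 'xml_document'"
```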
@cboettig -- I just updated the issue. I think it is solved, but it is only in the development version.
I am pushing this through today with a number of breaking changes. We are abandoning XML in favor of xml2, and `get_bucket()` (a new function replacing `getbucket()`) is going to return a list of objects of class "s3_object". The bucket metadata that used to be part of an "s3_bucket" object is now stored in that object's attributes. As such, if you wanted to, for example, get every object out of a bucket, you should be able to do:
```r
library("aws.s3")
b <- get_bucket(bucket = "mybucket")

# load objects as raw vectors in memory
lapply(b, get_object)

# save all objects locally to a specified vector of file names
mapply(save_object, object = b, file = paste0("file", seq_along(b), ".txt"))
```
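For a sense of the new layout, the relocated metadata could then be read back with `attr()`; a plain-R sketch (the attribute names here are an assumption, mirroring the old fields):

```r
# Sketch of the new shape: list elements are the objects, metadata lives in attributes
b <- structure(list("obj1", "obj2"),   # stand-ins for s3_object entries
               Name  = "mybucket",     # assumed attribute, formerly b$Name
               class = "s3_bucket")

attr(b, "Name")    # "mybucket"
length(b)          # 2 - only the objects themselves
```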
Hope this solves everyone's issues.
I've installed this wonderful-looking package and tried to retrieve a list of files in one of my S3 buckets, but I seem to be getting the following error:
```r
> aws.s3::getbucket(bucket = "kaggle.ml.data")
No encoding supplied: defaulting to UTF-8.
Error in UseMethod("xmlSApply") :
  no applicable method for 'xmlSApply' applied to an object of class "c('xml_document', 'xml_node')"
```
Platform: Mac OS X 10.11.3 (El Capitan)
R version:

```
R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)
```