RevolutionAnalytics / AzureML

An R interface to AzureML (https://studio.azureml.net/) experiments, datasets, and web services.

The subset(.) function does not work properly with a Datasets object argument #103

Closed QuantDevHacks closed 8 years ago

QuantDevHacks commented 8 years ago

If a Datasets object containing more than one dataset is passed to the subset(.) function, a warning is raised and the members of the returned object are empty (i.e., character(0)).

Begin Simple repro:

names <- c("Energy Efficiency Regression data", "Blood donation data", "MNIST Test 10k 28x28 dense")
d <- datasets(ws)
z <- subset(d, Name == names)

End repro code

The warning returned is:

Warning message:
In Name == names :
  longer object length is not a multiple of shorter object length

Examining the members of the z object, we get:

> z$Name
character(0)
> z$DataTypeId
character(0)
> z$Description
character(0)

etc...

I have not been able to locate a method of the function written specifically for Datasets objects, so this may be the cause. I still need to verify this, however, since so far I have only run

grep subset *|grep function

in the code base.
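
The same check can be made from within R rather than with grep. The following is a minimal sketch, not the package's own code: the object `d` is a hypothetical stand-in built by hand, the class name "Datasets" is taken from the issue title, and the assumption that the real object also inherits from data.frame is not confirmed here.

```r
# Hypothetical stand-in for datasets(ws): a data frame that also
# carries a "Datasets" class. The real object may be built differently.
d <- structure(
  data.frame(Name = c("a", "b"), stringsAsFactors = FALSE),
  class = c("Datasets", "data.frame")
)

# Names of all subset() methods visible in the current session.
subset_methods <- as.character(methods("subset"))

# getS3method() returns NULL (with optional = TRUE) when no method
# called subset.Datasets can be found.
has_datasets_method <- !is.null(
  getS3method("subset", "Datasets", optional = TRUE)
)

# Without a class-specific method, S3 dispatch moves on to the next
# class in class(d); here that would select subset.data.frame.
falls_back <- inherits(d, "data.frame")

print(subset_methods)
print(has_datasets_method)
print(falls_back)
```

If `has_datasets_method` is FALSE while `falls_back` is TRUE, the call is handled by the ordinary data frame method, so the absence of a dedicated method would not by itself explain the empty result.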

QuantDevHacks commented 8 years ago

Steve W reports that in his environment there is no warning and everything runs as expected. He suspects I have a "dirty environment". I will check on a separate machine and then investigate.

sfweller commented 8 years ago

We determined by checking the problem on Dan's machine that two datasets he had uploaded to ML Studio were the culprit. After removing those objects everything worked as it should.

QuantDevHacks commented 8 years ago

As a result (see sfweller above), we can mark this issue as done and open a new issue for the particular problem we found.

QuantDevHacks commented 8 years ago

Reopening this bug: the issue may not have been solved after all, so it is back under investigation. Even after removing the additional datasets, and after connecting with different credentials to an AzureML Studio environment where additional datasets were never present, the subset(.) function returns an object whose Name member is empty.

andrie commented 8 years ago

As far as I can tell, this isn't a bug, but user error.

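You should use %in% instead of ==. The == operator compares the two vectors element by element, recycling the shorter one, so it only matches names that happen to line up by position; %in% tests whether each Name appears anywhere in names.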

subset(d, Name == names)
                 Name DataTypeId  Size ...
1 Blood donation data GenericCSV 12769 ...
----------------------------------------------

As opposed to:

subset(d, Name %in% names)
                               Name DataTypeId     Size ...
1               Blood donation data GenericCSV    12769 ...
2 Energy Efficiency Regression data GenericCSV    40831 ...
3        MNIST Test 10k 28x28 dense GenericTSV 18303260 ...
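
The difference can be reproduced without a workspace connection. This is a small sketch using plain character vectors: `Name` is a made-up stand-in for the Name column of the Datasets object (the fourth entry is invented for illustration), and `wanted` plays the role of `names` in the repro.

```r
# Made-up stand-in for d$Name; "Some other dataset" is hypothetical.
Name <- c("Blood donation data",
          "Energy Efficiency Regression data",
          "MNIST Test 10k 28x28 dense",
          "Some other dataset")

# The three names requested in the repro.
wanted <- c("Energy Efficiency Regression data",
            "Blood donation data",
            "MNIST Test 10k 28x28 dense")

# `==` pairs elements by position and recycles `wanted` to length 4.
# Because 4 is not a multiple of 3, R also emits the
# "longer object length is not a multiple of shorter object length"
# warning, suppressed here to keep the output clean.
eq_match <- suppressWarnings(Name == wanted)

# `%in%` asks, for each element of Name, whether it occurs in `wanted`.
in_match <- Name %in% wanted

print(eq_match)  # FALSE FALSE  TRUE FALSE
print(in_match)  #  TRUE  TRUE  TRUE FALSE
```

With ==, a row is kept only when a name happens to sit at the same (recycled) position in both vectors, so the result depends on the order and number of datasets in the workspace. That would be consistent with the differing results reported on different machines.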