JuliaInterop / RCall.jl

Call R from Julia
Other
318 stars 59 forks source link

Don't specify `AbstractArray` for column types in DataFrame `rcopy` #528

Open asinghvi17 opened 1 month ago

asinghvi17 commented 1 month ago

This allows other packages to hook into the conversion system when loading R DataFrames.

My usecase here is to provide seamless interop between R and Julia for sf dataframes, so people can take advantage of all of R's tooling without having to re-implement it in Julia.

An example of the use is that by defining rcopytype and rcopy for sfc_MULTIPOLYGON, which is a simple features collection of multipolygons, I can get those geometries converted to the relevant GeoInterface.jl geometry representation for free.

asinghvi17 commented 1 month ago

It looks like this change breaks single-row DataFrame conversion. Is there another way to get around this, maybe a maybe_wrap_in_array function or something? This functionality is pretty nice because it allows known types in R to be converted seamlessly if they're in data tables.

mkitti commented 1 month ago

I'm looking. I can merge. Why is CI in such a poor state though?

asinghvi17 commented 1 month ago

This particular change breaks one of the CI examples, specifically:

rcopy(R"data.frame(a=1,b=2)")

which internally receives a Float64 for each column rather than a vector for each column. I'm not 100% sure how to solve this (though it should definitely be solved before merging).

The issue is that since we aren't passing AbstractArray to rcopy as a defined type anymore, if there's only one element in a column it's interpreted as a single value, leading to the issue here.

asinghvi17 commented 1 month ago

Huh! OK, I didn't realize that other classes are able to override data.table even if it is first in the vector of classes. For my particular application, I can just implement a top-level conversion function, so that's my issue solved.

On a more general level, though, this does mean that anything within a data.table can't be copied by a custom function, unless I misunderstand? Consider a data table with 2 columns - one has real numbers, and one has a custom object defined in R. There seems to be no way (without this PR) to allow that custom object to be converted correctly within the data.table as it's turned into a Julia DataFrame?

palday commented 1 month ago

You can always define a custom rcopy method with the target datatype (as shown in my example) and call that explicitly. rcopy without the target datatype is just syntactic sugar that uses a registered default datatype.

mkitti commented 1 month ago

I see. I understand better now.

mkitti commented 1 month ago

Anyways, I want to see CI passing here. If you need to make the change to CI to make it pass, do it. We can then consider how much breakage is involved and consider if we need to jump major versions.

palday commented 1 month ago

Anyways, I want to see CI passing here. If you need to make the change to CI to make it pass, do it. We can then consider how much breakage is involved and consider if we need to jump major versions.

@mkitti

Please no, don't change the tests. The tests here caught the introduction of arguably incorrect behavior. The correct behavior is using the explicit conversion target type and defining an appropriate method.