Closed bovee closed 2 years ago
I can reproduce this pretty reliably on my computer with e.g. r <- Reader("./dad1.uv"); r[r["time"] < 0,]
and then I see a row with negative times (and several rows with negative wavelengths; the Python client has none of these erroneous rows). In each case it appears that the value is copied from a nearby row or column (e.g. many of the bad wavelengths are just copies of the intensity from the row). I also don't see any of the new metadata even though I cleared the package out of ~/R/.../entab
and rebuilt after deleting the cached libentab.so
and target
(I do see it with the CLI and Python client).
I thought maybe this was because of some memory unsafety in the vec_to_list
/vec_to_dataframe
shims I had, but I refactored everything and removed all of the unsafe
code in e7b6647b4d8169bf784e91dcd59f09852fb705fb and it's still happening.
I think it could still be one of a couple causes:
Robj
for each value and it's possible that sometimes extendr is not reserving a new one, but reusing an old one because of thread safety or GC issues? Strictly speaking, it would probably be faster and better to coerce the values into the proper type in Rust using ToVectorValue
instead of having to call unlist
on the columns in R, but that's kind of complex to implement.This seems to be "fixed" in 80ad4ebfbee93b1a2a054dc4c188febc9751be3f where I rewrote the as.data.frame
function to create more specialized vectors instead of allocating Robj
s for everything (which is also much, much faster). That leads me to believe this is an issue with doing a lot of allocations at once in R/extendr.
It also means this could still occur with columns of Value::DateTime
s (or lists/records, but parsers should be outputting in an EAV data model so we shouldn't really have those?).
Since I've "fixed" this, I made a simplified test case and filed a bug upstream to track this: https://github.com/extendr/extendr/issues/397
Cool. I can confirm that this bug seems to be fixed on my computer too in the latest version
Looks like there's a fix upstream now so I'm closing this one out.
There's an intermittent issue when using the R bindings where fields will be out-of-order and parse speed will be abnormally slow. From #28:
I thought this might be an issue with R linking to an old version of the dynamic library, but it's also possible it's an issue with all the unsafe libR code or something else entirely.