Closed camilavargasp closed 17 hours ago
I met with Sam today and he showed me where `deltafish` stopped working. He suspects that the `left_join` call joining two arrow datasets is causing his R session to crash. He left-joined a 112-row dataset to a 60-million-row dataset. The result after running `collect` should be a simple 112-row dataset, yet his R session crashes with the message, "terminate called after throwing an instance of..."
This may indicate a larger issue with `arrow` itself, and is not necessarily related to the updated Delta fish data (which grew from 40 million to 60 million rows).
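For reference, the pattern described above (joining lazily in arrow and collecting only at the end) looks roughly like the sketch below. The tables `small` and `large` and their columns are hypothetical in-memory stand-ins, not the real deltafish data, and at this size the join runs without trouble; the crash was only reported at the 60-million-row scale.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-ins: a small lookup table and a larger table with
# one row per key (the real data are arrow datasets, not shown here).
small <- arrow_table(id = c(1L, 2L, 3L), label = c("a", "b", "c"))
large <- arrow_table(id = 1:20, value = seq(0.5, 10, by = 0.5))

# Join is planned lazily by arrow and only executed when collect() runs.
result <- small %>%
  left_join(large, by = "id") %>%
  collect()
```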
I feel this issue is beyond my abilities, so I need to consult with Jeanette on some options moving forward (opening a GitHub issue on the `arrow` repo? converting to duckdb?...). I told Sam that he could open a Slack group chat with Jeanette and me, just so Jeanette knows what's going on, but I will be the one carrying out the debugging.
Yesterday, Sam opened a Slack group chat with me and Jeanette, and he explained his problem there. Jeanette suggested a workaround where the `collect` call is run first, and then the `join`. That seemed to work fine for him, so I think he'll stick to collecting before joining in his script for now. I think he was a bit disappointed that `arrow` couldn't deal with large joins of uncollected data, but there's not much we can do in that area.
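A minimal sketch of that workaround, assuming hypothetical stand-in tables `small` and `large` rather than the real deltafish data: each side is collected into an in-memory data frame first, and the join is then done by dplyr instead of arrow.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-ins for the two arrow datasets.
small <- arrow_table(id = c(1L, 2L, 3L), label = c("a", "b", "c"))
large <- arrow_table(id = 1:20, value = seq(0.5, 10, by = 0.5))

# Workaround: pull both sides into R memory before joining.
small_df <- collect(small)
large_df <- collect(large)

# The join now runs on ordinary data frames, outside of arrow.
joined <- left_join(small_df, large_df, by = "id")
```

The trade-off is that the larger table has to fit in memory once collected. Filtering or selecting columns on the arrow side before `collect` may reduce that cost, though that was not part of the suggestion in the thread.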
I'll keep this issue open for a bit before closing in case Sam has additional follow-up questions on Slack.
Sam Bashevkin approached the Delta Stewardship Council with the following request:
Hi Maggie,
If you remember, as part of our NCEAS work group, Jeanette from NCEAS developed the `deltafish` R package (https://github.com/Delta-Stewardship-Council/deltafish) that provides access to the large integrated database of fish monitoring data. I’ve been working with collaborators at CDFW to add more data to the package, particularly the Salvage dataset, which has unfortunately made the dataset so large that the package is no longer working. Jeanette would be the best person to fix it, so I was wondering if there is any chance you have funding in your NCEAS contract for some follow-up on the past workshop so that Jeanette would be able to take a look at the package and get it working again. I’d be happy to chat about this if it would be easier.
I hope all is well at DSP!
Best,
Sam