camrinbraun / tags2etuff

An R package for converting the hugely variable formats of animal tag data to a flat file format called eTUFF

Large archival/recovered tags difficult to work with #14

Open camrinbraun opened 4 years ago

camrinbraun commented 4 years ago

For example, I have a swordfish tag whose -Archive.csv from WC contains something like 3-4 million rows, which balloons to closer to 10 million rows when converted to eTUFF (3 vars x ~3 million rows). Every time I try to manipulate the 10 or so archival tags I have that are near this size, R complains a LOT and typically ends up crashing. This happens even when I try to use AWS, where I have essentially unlimited compute, memory, etc.

Workarounds?

camrinbraun commented 4 years ago

I'm looking at solutions along the lines of this or this. For me, reading isn't really the issue, as fread() is pretty great. The manipulation is the sticky part, e.g. when you have 10 million rows x 3 columns and want to spread() the eTUFF into a tidy-friendly wide format. Seems like dtplyr, linked above, might be the key, but we should also consider piping operations together more, as that also seems to help. A rough sketch of what I'm picturing is below.
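Roughly what I mean, sketched with made-up file and column names (DateTime, VariableName, VariableValue; the real eTUFF header may differ): fread() for the read, lazy_dt() so dplyr verbs are translated to data.table, and data.table::dcast() as a low-overhead stand-in for spread().

```r
library(data.table)
library(dtplyr)
library(dplyr)

## fread() copes with the ~10M-row long-format file
etuff <- fread("swordfish-eTUFF.csv")   # hypothetical file name

## dplyr verbs on a lazy_dt() are translated to data.table calls and
## only execute when collected with as.data.table()/as_tibble()
depth_only <- lazy_dt(etuff) %>%
  filter(VariableName == "depth") %>%   # assumed variable name
  as.data.table()

## for the wide pivot itself, data.table::dcast() avoids tidyr's
## copying overhead; assumes DateTime x VariableName pairs are unique
wide <- dcast(etuff, DateTime ~ VariableName, value.var = "VariableValue")
```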

galuardi commented 4 years ago

The tidyr::spread() function is no longer under active development. Try switching to tidyr::pivot_wider(). I use it almost daily myself.

I may try this as I'm also running into this issue. The swap looks like the sketch below.
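For reference, the one-line swap, using the same assumed column names as above:

```r
library(tidyr)

## old, superseded:
# wide <- spread(etuff, key = VariableName, value = VariableValue)

## new:
wide <- pivot_wider(etuff,
                    names_from  = VariableName,
                    values_from = VariableValue)
```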