Closed iscas-lee closed 5 years ago
Does it work with a smaller subset?
Yes, this software works fine for small files. But if the running time exceeds 24 hours, the error "Futures timed out after [24 hours]" is displayed.
Sorry for the late reply. We are aware of the issue with large files. This is, however, a limitation of R itself, so we need to re-code our script to work around the fact that R by default tries to load an entire file into memory. Still, we do not have time to fix this right away -- we are busy teaching at the moment -- nor do we have a student working on it. If you are in a hurry, I suggest you read [1] and [2], which give you an idea of how to resolve the problem. The easiest option is the ff library, if your dataframe contains heterogeneous data; if the data are homogeneous (e.g., a numeric matrix), the bigmemory library will also do. The most general solutions are to use Hadoop and map-reduce to parallelize your complex task into smaller, faster subtasks [2], or alternatively to leverage a database for storing and then querying the data [3].
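As a stopgap before any of the above, you can also process the file in fixed-size chunks so that only one block of rows sits in memory at a time. Here is a minimal sketch of that idea in plain R -- this is *not* Senti4SD's actual code, and `process_chunk()` is a hypothetical placeholder for whatever per-chunk work you need (it assumes a simple CSV with no embedded newlines inside quoted fields):

```r
# Hypothetical per-chunk work, e.g. classifying one block of rows.
# Here it just counts rows so the sketch is self-contained.
process_chunk <- function(chunk) nrow(chunk)

# Read a CSV in chunks of `chunk_size` rows, calling process_chunk()
# on each block instead of loading the whole file at once.
read_in_chunks <- function(path, chunk_size = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- strsplit(readLines(con, n = 1), ",")[[1]]
  total <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break        # end of file reached
    chunk <- read.csv(text = lines, header = FALSE,
                      col.names = header, stringsAsFactors = FALSE)
    total <- total + process_chunk(chunk)
  }
  total
}
```

The same loop structure carries over if you switch the inner `read.csv()` for `ff::read.csv.ffdf` or `bigmemory::read.big.matrix`, which additionally keep the accumulated data out of RAM.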
Should you decide to update the script yourself, a pull request would be very much appreciated! ;-)
HTH,
[1] https://rpubs.com/msundar/large_data_analysis
[2] http://www.bytemining.com/2010/08/taking-r-to-the-limit-part-ii-large-datasets-in-r/
[3] https://www.datasciencecentral.com/profiles/blogs/postgresql-monetdb-and-too-big-for-memory-data-in-r-part-ii
See issue #7
I have a large file with almost 200k lines. When I run Senti4SD on it, it takes more than 24 hours and then displays the error message "Futures timed out after [24 hours]". Could you please help me solve this problem?