ajdamico / asdfree

analyze survey data for free
http://asdfree.com/
GNU General Public License v3.0
for big/monetdblite designs #300

Closed ajdamico closed 6 years ago

ajdamico commented 7 years ago

suggestion: use compress = FALSE on saveRDS for survey designs if you are not doing so already cc @hannesmuehleisen

ghost commented 6 years ago

Hi @ajdamico! I noticed that my PNAD files are much bigger now (about 40 times bigger, from 33.6MB to 1.42GB using PNAD 2015), and then I discovered this recent change (compress = FALSE). Using system.time(readRDS()), the time to load them is almost the same on my machine (26sec). Could you explain why you changed it? Regards!

ajdamico commented 6 years ago

speedwise, might not matter much for those. send a pull request if you like

ajdamico commented 6 years ago

but please confirm 33mb wasn't database-backed beforehand..thanks

ghost commented 6 years ago

Well, I could send the pull request, but I'm stupid enough not to understand what database-backed means, so I will leave it the way it is and keep compressing locally. Thanks for your time anyway!

ajdamico commented 6 years ago

worth googling and learning. github lets you do it on the website by clicking the pencil

ghost commented 6 years ago

I said I didn't understand what it means, not that I didn't try to. Thought it was the database with the survey design, but I'm not sure at this point and didn't want to take your time. ✌️

ajdamico commented 6 years ago

https://help.github.com/articles/editing-files-in-another-user-s-repository/

fwuensche commented 6 years ago

> but please confirm 33mb wasn't database-backed beforehand..thanks

I believe he didn't understand what database-backed means.
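For anyone else landing here: as far as I understand the survey package, a database-backed design keeps the survey data in a database table and stores only the design information plus a pointer to that table, so its .rds file can be small regardless of compression. A rough way to check which kind of object you have is to look at its class. This is only a sketch: `is_database_backed` is a hypothetical helper, the class names `DBIsvydesign` and `DBIrepdesign` are my assumption about how the survey package labels such designs, and the two objects below are empty stand-ins so the code runs without any packages installed.

```r
# Hypothetical helper: TRUE when the design looks database-backed.
# The class names are an assumption about the survey package.
is_database_backed <- function(design) {
  inherits(design, c("DBIsvydesign", "DBIrepdesign"))
}

# Empty stand-ins that only mimic the class vectors of real designs.
in_memory_design <- structure(
  list(),
  class = c("survey.design2", "survey.design")
)
db_design <- structure(
  list(),
  class = c("DBIsvydesign", "survey.design2", "survey.design")
)

in_memory_result <- is_database_backed(in_memory_design)
db_result <- is_database_backed(db_design)

# With a real file (the path here is hypothetical):
# pnad_design <- readRDS("pnad_2015_design.rds")
# is_database_backed(pnad_design)
# format(object.size(pnad_design), units = "MB")
```

If the helper returns FALSE for the 33MB file, the data were held in memory and the size difference comes from compression alone.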

ajdamico commented 6 years ago

if it's the same save speed with and without compress=FALSE then it's still better leaving that parameter out of the code..

ghost commented 6 years ago

The saving speed is not the same, but I think compress = TRUE is the better option, especially for uploading, and also because a smaller file is easier to store on a pen drive. For PNAD2015:

| compress = | saveRDS | readRDS | file size |
| --- | --- | --- | --- |
| FALSE | ~22 sec | ~26 sec | 1.45 GB |
| TRUE | ~42 sec | ~25 sec | 33 MB |

@ajdamico Do you agree?
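In case it helps others compare on their own machines, here is a minimal sketch of how timings like these could be measured with base R only. The `fake_design` data frame is a hypothetical stand-in, not a PNAD design, and `compare_rds` is a helper name made up for this example, so the numbers it prints will not match the ones above.

```r
# Hypothetical stand-in for a survey design object; a real design
# would be much larger and may compress differently.
set.seed(1)
fake_design <- data.frame(
  id = seq_len(1e5),
  stratum = sample(letters, 1e5, replace = TRUE),
  weight = rep(c(1.5, 2.5), length.out = 1e5)
)

# Save and reload `obj` with and without compression, recording
# elapsed seconds and the size of the file on disk.
compare_rds <- function(obj) {
  results <- lapply(c(TRUE, FALSE), function(compress) {
    path <- tempfile(fileext = ".rds")
    on.exit(unlink(path), add = TRUE)
    save_sec <- system.time(
      saveRDS(obj, path, compress = compress)
    )[["elapsed"]]
    size_mb <- file.size(path) / 1024^2
    read_sec <- system.time(readRDS(path))[["elapsed"]]
    data.frame(
      compress = compress,
      save_sec = save_sec,
      read_sec = read_sec,
      size_mb = size_mb
    )
  })
  do.call(rbind, results)
}

timings <- compare_rds(fake_design)
print(timings)
```

Replacing `fake_design` with the object returned by `readRDS()` on a real design file would give figures comparable to the table.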