Open kferris10 opened 9 years ago
I'm having what appears to be the exact same issue, and I've read through both #27 above and the referenced issue #22. I tried gc() as you suggested in #22, but it has no effect on my machine, just as in the report above. What is the solution? I can restart R, but I usually have to restart my machine before everything runs in a reasonable amount of time.
Related: I was trying to scrape data for all games from 03/01/2010 through the present, grabbing one month at a time. R crashed midway through the games on 5/16/2012, so I restarted, loaded my packages, defined my connection, and then ran:
update_db(mysqlconnection, end="2012-05-20")
This starts getting the games from 5/17/2012 through 5/20, which obviously misses the remaining 5/16 games I didn't get to. How can I get the rest of the 5/16 games now without duplicating what I already have for that day?
@colemanconley I've never had an issue with duplicating games when using update_db. My strategy is to first scrape one year of data. Then I can just run update_db one year at a time. Is that not working for you?
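For the narrower question of fetching only the games that are still missing for one day, one possible approach is to compare the game IDs you want against the ones already stored and request only the difference. The sketch below uses made-up placeholder IDs and hypothetical variable names (`wanted`, `have`, `missing_ids`); how you obtain the two vectors depends on your own database and is not shown here.

```r
# Placeholder IDs for illustration only, not real game identifiers.
wanted <- c("game_a", "game_b", "game_c", "game_d")  # every game for the day
have   <- c("game_a", "game_b")                      # games already in the database

# setdiff() keeps the elements of `wanted` that are absent from `have`.
missing_ids <- setdiff(wanted, have)
print(missing_ids)
```

If I recall correctly, scrape() accepts a game.ids argument, so a vector like missing_ids could be passed there instead of a start/end range, but please check the pitchRx documentation before relying on that.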
I have these memory issues on Windows but not on Mac. The only way I've found to free the memory is to restart the R session. What I do is create a new SNOW cluster with a single node for each scrape call, which is effectively the same as starting a new R session each time.
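Some code I use:
library(snow)  # makeCluster(), clusterEvalQ(), clusterExport(), clusterCall(), stopCluster()
# start_date, end_date, files and dbpath are assumed to be defined beforehand
ll <- seq(as.Date(start_date), as.Date(end_date), "1 year")
ntasks <- length(ll) - 1
for (i in seq_len(ntasks)) {
  print(ll[i])
  print(ll[i + 1])
  # a fresh single-node cluster gives each chunk its own R process
  cl <- makeCluster(1, type = "SOCK", outfile = "")
  clusterEvalQ(cl, library(pitchRx))
  clusterEvalQ(cl, library(DBI))
  clusterEvalQ(cl, library(RSQLite))
  clusterEvalQ(cl, library(dplyr))
  # copy the variables the worker needs into its environment
  clusterExport(cl, list = c("ll", "files", "dbpath"), envir = environment())
  clusterCall(cl, function(i) {
    db <- src_sqlite(dbpath, create = TRUE)
    scrape(start = ll[i], end = ll[i + 1], suffix = files, connect = db$con)
    dbDisconnect(db$con)
  }, i)
  # stopping the cluster ends the worker process, which releases its memory
  stopCluster(cl)
}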
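One thing to watch in the loop above: seq() with a "1 year" step stops at the last full step, so if end_date does not fall exactly on a yearly boundary, the final partial year is never scraped. Here is a minimal sketch of building the start/end pairs so that the tail is included. The helper name date_chunks is hypothetical and not part of pitchRx or snow.

```r
# Hypothetical helper: split [start, end] into consecutive (from, to) pairs.
date_chunks <- function(start, end, by = "1 year") {
  start <- as.Date(start)
  end <- as.Date(end)
  stopifnot(start < end)
  breaks <- seq(start, end, by = by)
  # seq() never overshoots `end`, so append it when the last step fell short
  if (breaks[length(breaks)] < end) {
    breaks <- c(breaks, end)
  }
  data.frame(from = breaks[-length(breaks)], to = breaks[-1])
}

chunks <- date_chunks("2010-03-01", "2012-05-20")
print(chunks)
```

Each row can then stand in for ll[i] and ll[i + 1] in the loop. Note that, as in the original loop, consecutive chunks share a boundary date, so that day is requested twice.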
I am running into some errors when trying to scrape large amounts of PITCHf/x data on my Windows 7 computer. Here are some screenshots to illustrate.
I run this code to scrape several months of PITCHf/x data
gc() appears to have no effect.
P.S. Sorry if those numbers are impossible to see. Let me know if it would help to improve the quality of any of the screenshots.