Josh-Lee1 / eBird-Fire-Index

BEES3041 Big Data Project

Running first for loop #4

Closed Josh-Lee1 closed 4 years ago

Josh-Lee1 commented 4 years ago

Hi @wcornwell and @coreytcallaghan, I just finished running the function we made and got the error below. It took about 2 hours to run on my machine.

although coordinates are longitude/latitude, st_within assumes that they are planar
Error in gzfile(file, mode) : cannot open the connection
In addition: Warning messages:
1: Missing column names filled in: 'X47' [47]
2: In gzfile(file, mode) :

coreytcallaghan commented 4 years ago

Did you make sure to make the filteredData folder match exactly? That would be my guess at that error, but not sure.

I'm trying to run it now on my work desktop, and it is taking a while! Which I'm a bit baffled by because I swear it wasn't taking this long last week.

I'll keep you posted.

coreytcallaghan commented 4 years ago

Sorry for the delay, @Josh-Lee1. Your original error above happened because 'filename' included a folder as well, and that folder doesn't exist in the output folder structure.

So, we need paste0("filteredData/", gsub("Rawdatatest/", "",filename), ".rds") for it to work.
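A minimal sketch of that path fix, for anyone following along. The value of `filename` below is a hypothetical example; the thread only establishes that it carries the `Rawdatatest/` prefix.

```r
# Hypothetical input: a file path that still carries its source folder.
filename <- "Rawdatatest/baa"

# Strip the folder prefix, then build the output path inside filteredData/.
out_path <- paste0("filteredData/", gsub("Rawdatatest/", "", filename), ".rds")

# basename() drops any leading folders, so it gives the same result here.
alt_path <- paste0("filteredData/", basename(filename), ".rds")

print(out_path)
```

Note that the filteredData folder itself still has to exist before saveRDS() writes to it; otherwise gzfile() fails with the same "cannot open the connection" error.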

coreytcallaghan commented 4 years ago

Not sure how far Will got yesterday after I bailed out of Zoom, but I've now realized that all the other files don't have column names! So I'm trying to implement a fix.

coreytcallaghan commented 4 years ago

Hi @Josh-Lee1 and @wcornwell. I think I just fixed the function so it should be working now, and much much quicker...

@Josh-Lee1, please, make sure you understand all the steps I implemented, which I summarize here.

1) First, we need to get the column names from the first file (baa) and use them in the function, because all files besides the first one don't have column names in them. Not sure if @wcornwell knows a better solution, but I used an ifelse in the function to get around this. We can chat through this and make sure you understand what is going on.
2) I tweaked the spatial join so that it goes from the fire shapefile to the points. This returns a single df/sf (which I drop the geometry on) that includes only the points that are in the fire. I think this is what is happening!!! Please check this by testing it on a file or two and making plots, as before. We need to be sure the spatial join is performing as expected.
3) Then, because step 2 is different, we need to rejoin this with the 'all points' data to get a df that has the points both in and out of the fire shapefile.
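Steps 2 and 3 might look roughly like the sketch below. The toy square polygon, the three points, and the names `fire`, `pts`, `id` and `in_fire` are all made up for illustration. Using `st_contains` with `left = FALSE` is one assumed way of joining from the fire shapefile to the points; the actual function in the repository may differ.

```r
library(sf)

# Toy stand-in for the eBird points (hypothetical ids and coordinates).
pts <- st_as_sf(
  data.frame(id = 1:3, lon = c(0.5, 2, 0.2), lat = c(0.5, 2, 0.8)),
  coords = c("lon", "lat"), crs = 4326
)

# Toy stand-in for the fire shapefile: a single square polygon.
square <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
fire <- st_sf(fire_id = 1, geometry = st_sfc(st_polygon(list(square)), crs = 4326))

# Step 2: join from the fire polygon to the points it contains, keeping
# only matches (left = FALSE), then drop the geometry column.
inside_df <- st_drop_geometry(st_join(fire, pts, join = st_contains, left = FALSE))
inside_df$in_fire <- rep(TRUE, nrow(inside_df))

# Step 3: rejoin with all points so rows outside the fire are kept too.
all_df <- merge(st_drop_geometry(pts), inside_df[, c("id", "in_fire")],
                by = "id", all.x = TRUE)
all_df$in_fire[is.na(all_df$in_fire)] <- FALSE

print(all_df)
```

In this toy case points 1 and 3 sit inside the square and point 2 sits outside, which gives a quick way to eyeball whether the join behaves as expected before running it on a real file.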

It should be a lot quicker now, so I guess you should make sure this is working on the 'test' files, and then if so, you could probably let it run overnight and saveRDSs out.
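For reference, the overall loop could be sketched as follows, including the column-name handling from step 1. This is only an outline under assumptions: the two tiny tab-separated files, the name `bab` for the second file, and the use of base `read.delim` are hypothetical, and the filtering step is left as a comment.

```r
# Build a tiny stand-in for the raw data folder inside a temp directory.
raw_dir <- file.path(tempdir(), "Rawdatatest")
out_dir <- file.path(tempdir(), "filteredData")
dir.create(raw_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)  # must exist before saveRDS()

# Only the first file (baa) has a header row; bab is a hypothetical later file.
writeLines(c("species\tlat\tlon", "Emu\t-33.1\t150.2"), file.path(raw_dir, "baa"))
writeLines("Galah\t-34.0\t151.0", file.path(raw_dir, "bab"))

files <- sort(list.files(raw_dir, full.names = TRUE))

# Take the column names from the first file and reuse them for the rest.
col_names <- names(read.delim(files[1], nrows = 1))

for (f in files) {
  has_header <- identical(f, files[1])
  dat <- if (has_header) {
    read.delim(f, stringsAsFactors = FALSE)
  } else {
    read.delim(f, header = FALSE, col.names = col_names, stringsAsFactors = FALSE)
  }
  # ... the spatial filtering step would go here ...
  saveRDS(dat, file.path(out_dir, paste0(basename(f), ".rds")))
}
```

Writing one .rds per input file means a crash partway through an overnight run does not lose the files already processed.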

One potential issue (which I don't have time to investigate further at the moment) is I'm not sure what will happen if a given file has zero points that are in the fire shapefile and thus a null file is returned. This could break the for loop, but I guess we'll deal with this if/when it happens.
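One way to make the rejoin tolerant of that case is to flag points by membership rather than by assigning into the joined table. The sketch below uses plain data frames with a hypothetical `id` column and a hypothetical helper `flag_in_fire`; it does not test what the spatial join itself returns when nothing matches.

```r
# Hypothetical helper: mark which of all_points appear in the 'inside' table.
flag_in_fire <- function(all_points, inside) {
  # %in% against an empty vector simply yields FALSE for every row.
  all_points$in_fire <- all_points$id %in% inside$id
  all_points
}

all_points <- data.frame(id = 1:3)

# Simulate a file with zero points inside the fire: a zero-row table.
inside_none <- all_points[0, , drop = FALSE]

res <- flag_in_fire(all_points, inside_none)
print(res)
```

With zero matches every point is flagged FALSE and the loop can carry on to the next file.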

Hope this makes sense!

Josh-Lee1 commented 4 years ago

Hi Corey, sorry about the delay. I've gone through and got my head around the changes. Thanks so much for the bug fixes. I'm having issues remaking the plots with the new filtered data. Not sure if it is the NAs or something to do with changing the nature of the sf join. Trying to figure that out.

coreytcallaghan commented 4 years ago

Sounds great @Josh-Lee1! Keep up the good work, and let me know if you have any questions.

Josh-Lee1 commented 4 years ago

Hey, so I haven't figured out how to colour-code all the points because ggplot is confused about "True". But here is the map of the points inside the fire after I filtered them out. Looks good. Do we need more evidence that we are good to go, or shall I just go for it all tonight?
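The thread does not say exactly what ggplot objected to. One possible cause, assumed here, is that the in/out column holds the string "True" plus NAs rather than a logical. Under that assumption, a sketch with made-up coordinates and a hypothetical `in_fire` column:

```r
library(ggplot2)

# Made-up points; in_fire mimics a joined column with "True" and NA values.
pts_df <- data.frame(
  lon = c(150.1, 150.4, 150.9),
  lat = c(-33.2, -33.5, -33.9),
  in_fire = c("True", NA, "True"),
  stringsAsFactors = FALSE
)

# Convert to a proper logical, treating NA as "outside the fire".
pts_df$in_fire <- !is.na(pts_df$in_fire) & pts_df$in_fire == "True"

# Colour points by the logical flag.
p <- ggplot(pts_df, aes(x = lon, y = lat, colour = in_fire)) +
  geom_point() +
  scale_colour_manual(values = c(`TRUE` = "red", `FALSE` = "grey50"),
                      name = "Inside fire") +
  coord_quickmap()

print(p)
```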

image

Josh-Lee1 commented 4 years ago

Here, this should be easier to see:

image

coreytcallaghan commented 4 years ago

Go for it I reckon.

On Mon, Jun 29, 2020, 3:25 PM Josh-Lee1 notifications@github.com wrote:

Hey, so I haven't figured out how to colour-code all the points because ggplot is confused about "True". But here is the map of the points inside the fire after I filtered them out. Looks good. Do we need more evidence that we are good to go, or shall I just go for it all tonight?

[image: image] https://user-images.githubusercontent.com/59299694/85975822-700d6c80-ba1c-11ea-8231-adf543c1f1d1.png
