DOI-USGS / lake-temperature-model-prep

Pipeline #1
Other
6 stars 13 forks source link

Update single poly/poly crosswalk to include intermediate intersection table, use that for xwalk #70

Closed jordansread closed 5 years ago

jordansread commented 5 years ago

Kelsey's code to be included int 1_crosswalk

vitensek commented 5 years ago

Just saw here that you may want the spatial intersection step to go in 1_crosswalk_fetch, but I've currently got the function in spatial_crosswalks.R and the target for the full intersection table in 2_crosswalk_munge. Let me know if you'd like it moved to 1_crosswalk_fetch.

I'm currently trying to build the target for the MNDOW-NHDHR intersection table, but it wants to rebuild 1_crosswalk_fetch/out/canonical_lakes_sf.rds first, which I wasn't expecting!

limnoliver commented 5 years ago

Hey Kelsey - you might want to abandon that rebuild. We recently identified this as an issue with the way scipiper is handling line endings. Alison made an update to scipiper, and if you install the latest version of the task_combiners branch, it should stop trying to rebuild 1_crosswalk_fetch/out/canonical_lakes_sf.rds.

so devtools::install_github('USGS-R/scipiper', ref = 'task_combiners') should do the trick!

vitensek commented 5 years ago

Hmm...that doesn't seem to be working for me. I tried doing a hard reset so that my forked repository is back in sync with the upstream repository and adding back in my changes, but it still starts the build for 1_crosswalk_fetch/out/canonical_lakes_sf.rds.ind where it left off (it was halfway done). So I must have made some changes that aren't being reset? Is it okay to just let it finish building, or do you have any other suggestions?

jordansread commented 5 years ago

You’ll need to do a devtools install of the task_combiners branch of scipiper for it to work. Did you do that before restarting R?

vitensek commented 5 years ago

Yep. I tried several iterations of restarting R, using the devtools::install_github('USGS-R/scipiper', ref = 'task_combiners') line Sam gave me, uninstalling scipiper, restarting, reinstalling, restarting, etc. Seems like it installed without any issues.

Is it strange that the build for canonical_lakes_sf.rds.ind would start halfway through (where it left off), even after doing a hard reset of my repository? That is, it starts here:

Starting loop attempt 1 of 30 with 4 of 8 tasks remaining:

Building Indiana [====================================>-------------------------------------] 50%

jordansread commented 5 years ago

oh whoops, I didn't see her detailed instructions (was replying from my phone).

Kelsey, I'd guess that maybe Alison's fix didn't cover your machine's rules on line endings (?) which would make remake think your task table yaml is different from the previous one that generated the results.

But that's just a guess.

In the meantime, can you create a PR with your code change suggestion to the crosswalk function? I can take a look at that and try it out on my machine. Also, probably no harm in having your computer run overnight to finish those jobs to see if the end result changes any of the build status files or .ind files.

Is it strange that the build for canonical_lakes_sf.rds.ind would start halfway through (where it left off), even after doing a hard reset of my repository? That is, it starts here:

I think I understand why this is happening. loop_tasks uses two build checks - the first is a "light" check that skips over all targets in the task file that already have files and starts building the ones that do not (regardless of whether the targets with files are out of date). After this is complete, it then checks whether all files are up to date, which can sometimes trigger a rebuild of the targets that had files and were skipped over the first time. This is the way it should work. My guess is if you look in your folder, you've got files for any of the states that were skipped.

jordansread commented 5 years ago

Kelsey, when you get a chance can you post what you get when you run

scipiper::get_remake_status('1_crosswalk_fetch_nhd_tasks.yml')
vitensek commented 5 years ago

Sure, here it is:

> scipiper::get_remake_status('1_crosswalk_fetch_nhd_tasks.yml')

Warning in .remake_add_targets_implied(obj) :
  Creating implicit target for nonexistant files
 - 4_params_munge/out/lakeattributes_canopyheight.rds.ind: (in 4_params_munge)
 - 4_params_munge/out/lakeattributes_location.rds.ind: (in 4_params_munge)

                           target is_current dirty dirty_by_descent                    time
1            nhd_HR_download_plan       TRUE FALSE            FALSE 2019-08-23 19:55:55 UTC
2 1_crosswalk_fetch_nhd_tasks.yml       TRUE FALSE            FALSE 2019-08-23 19:55:55 UTC

                              hash                            fixed
1 4ab5854f1e428478e3a0c79871287117 2ccb6bd3c3fcf370d013713443d7a913
2 9cc6f078ec9749daff6eff6bb45add9e 1351c745938be0580e92016cec901cb0
jordansread commented 5 years ago

Ok, the hash I see for the ..tasks.yml (9cc6f078ec9749daff6eff6bb45add9e) is the same one you'd get with the different line endings. My guess is that there is an RStudio or git setting that is switching the line endings automatically. Here was Alison's diagnosis when she was digging into this issue earlier (note that she has already updated the cat() issue which was changed in that newer version of scipiper that would be installed w/ devtools):

OK, I think I found the issue and part of a fix. To summarize my line ending knowledge to date, there are three possible sources of line ending issues with shared caches:

  • One - the culprit here - is that scipiper was using cat to write that .yml file, and cat appears to write system-specific line endings without so much as giving you an option. I've switched that function to readr::write_lines locally, to match the changes I made in other file-writing functions a year ago, and will get a PR going with that change shortly.
  • Another is the .gitattributes file, which I believe we've set up properly for this project (. text eol=lf): https://github.com/USGS-R/lake-temperature-model-prep/blob/master/.gitattributes.
  • The last is the .Rproj file (probably lake-temperature-model-prep.Rproj in this case). We seem not to be sharing this one in the project, but there's an option in there to set Line ending conversion: Posix (LF). So either we should all make sure we have it set to that setting or we should git commit a single shared version of the .Rproj file to enforce that shared setting for all of us.

☝️ see the last bullet and if you don't mind, can you check and (if needed) change that setting in your .Rproj file?

aappling-usgs commented 5 years ago

Hi Kelsey, have you already tried rebuilding the yml file with force=TRUE? That might help.

scmake('1_crosswalk_fetch_nhd_tasks.yml', force=TRUE')
vitensek commented 5 years ago

I think that fixed things! Hooray! Thanks, Alison.

I also checked the .Rproj file setting and the line ending conversion is set to Posix (LF).

Thanks for the help, everyone!

vitensek commented 5 years ago

I see now that *_tasks.yml files are in .gitignore. Is this why it wasn't working when I tried doing a hard reset of my repository? Thanks for your patience as I learn!

aappling-usgs commented 5 years ago

Yep, that's why. Good observation. We ignore *_tasks.yml because they can be big enough to be unwieldy with git and GitHub, and because they can be generated from files that we do commit. It can be annoying at moments like this, though.