Closed lindsayplatt closed 2 years ago
Thanks for tagging me in this, @lindsayplatt. I learned a lot!
7a_temp_coop_munge/tmp/parser_files.yml + coop_parsers: again, this target is setup to always build. It does not have an indicator file associated, so there was nothing to commit but it did rebuild. coop_parsers depends on that .yml so it always rebuilds, too. I get nervous when things I don't expect start to rebuild, so I wanted to explicitly call these out in case someone (even future me) searches these targets looking for answers in this repo 😃
My understanding is that we want this target to rebuild every time because otherwise file changes in the parsers don't get tracked. Since it is a quick target to rebuild and if none of the file hashes change (or new/removed ones) it doesn't trigger downstream builds. This greedy build was added as part of the "brittle coop pipeline" fixes if I remember correctly.
When I went to look at where the 7b_temp_merge/out/temp_data_with_sources.feather file was used, I noticed that it was the only file passed into the function, merge_lake_data(), as the raw filepath and not the indicator file
I don't really know why this one would be a file and the others would be ind...I feel like we probably had a reason for doing that but don't have a memory of it. I also remember a few other instances of using the files as dependencies instead of the .ind, and we were using a function called "require_local" to handle this difference. I admit I never developed a good understanding of what patterns we were using that needed that and I think it has been phased out of this pipeline but perhaps this file is a remnant of that old pattern?
More for the parser files yaml. Yes, this was connected to an "always stale" pattern for the reason I described above.
I moved the "forever stale" target from https://github.com/USGS-R/lake-temperature-model-prep/pull/242 to 1_fetch so it could be used by other steps easily
☝️ from this PR
While review @padilla410's pull request #321, I bumped into two targets that started to rebuild unexpectedly. After investigating, I figured out that there were two small changes I could make to fix those. The issues leading up to those discoveries are described below.
Rebuild issue 1:
7_config_merge/out/nml_meteo_fl_values.rds.ind
started to rebuild ⛔This was weird to me because I was really not expecting that target to start rebuilding. I had first set
options(scipiper.dry_put = TRUE)
then ranscmake('8_viz')
to test that Julie's changes in #321 would work for me and this target started rebuilding. I stopped the build, then ranwhy_dirty('7_config_merge/out/nml_meteo_fl_values.rds.ind')
. Based on the output, something was funky with the targetNLDAS_grid
.When I opened
6_drivers_fetch.yml
to look atNLDAS_grid
, I saw this (you can tell something is up with the formatting because the comment is not styled as a comment like it should be).Comments in makefiles need to be above target definitions, they cannot be on the same line. I think this was causing the issue and making
NLDAS_grid
get a new hash. When I moved the comment above the target (change in this PR), and then usedwhy_dirty()
as I did previously, the target was no longer dirty and therefore wasn't going to rebuild. YAY! 🎉Rebuild issue 2:
8_viz/inout/lakes_summary.feather.ind
started to rebuild ⛔I got passed the first issue and then made sure to run
options(scipiper.dry_put = TRUE)
beforescmake('8_viz')
to continue testing Julie's PR changes. I got stuck on this next target and when I looked at thewhy_dirty()
for it, I got the following:When I went to look at where the
7b_temp_merge/out/temp_data_with_sources.feather
file was used, I noticed that it was the only file passed into the function,merge_lake_data()
, as the raw filepath and not the indicator file. I assumed that what was happening is my pipeline had no reason to pull down the most recent version of that file usingsc_retrieve()
, so it was different compared to what was used last time. After I ransc_retrieve("7b_temp_merge/out/temp_data_with_sources.feather.ind")
myself and then triedscmake('8_viz')
again, it no longer tries to rebuild this target:To avoid this in the future (and to match how all other data files are referenced in that function), I updated it to use the
.ind
file for the temperature data rather than the raw feather file. So,sc_retrieve()
is always called for it, which avoids this issue of the local data file being different than what the pipeline expects and causing a rebuild. YAY! 🎉Other build files that came along for the ride
Here are a few additional comments about what just rebuilt and is considered "changed" in this PR with an explanation (mostly so that when someone searches the repo, these will pop-up to provide answers!):
6_temp_coop_fetch/out/coop_all_files.rds.tind
+coop_wants
: the.tind
indicator file ending stands for "time indicator" and it is because we have that file set up to always rebuild. The file build.tind
ends up with the new hash and thefreshened
field, meaning the latest date that the file got rebuilt. Again, this builds every timescmake()
is called and that file is in the dependency tree of your target of interest.coop_wants
always rebuilds because it depends on that.tind
file which is always different thanks to it always rebuilding and getting the new timestamp.8_viz/out/lake_obs_counts.png
: this rebuilt somewhat unexpectedly. I'm not able to see any differences using swipe or onion skin in the file changes tab. Nothing seems to be different from what I can tell! Mysterious ... 🕵️♂️7a_temp_coop_munge/tmp/parser_files.yml
+coop_parsers
: again, this target is setup to always build. It does not have an indicator file associated, so there was nothing to commit but it did rebuild.coop_parsers
depends on that.yml
so it always rebuilds, too. I get nervous when things I don't expect start to rebuild, so I wanted to explicitly call these out in case someone (even future me) searches these targets looking for answers in this repo 😃