Closed hadley closed 3 years ago
Thanks for reporting. I'll look at this next week.
This specific issue is fixed in the "development" branch. I'm going to wait and see if the other one can be reproduced + I just submitted to CRAN so I'm going to wait a bit to resubmit.
@brodieG I can confirm that the development branch indeed solves this problem.
I ran into this error message ("reached theoretically unreachable branch 2, contact maintainer."
) while writing a snapshot test for HTML
table output. But after updating to diffobj
from the development branch, I no longer get this error and the snapshot test works perfectly! 😄
Reprex:
a <- 1:20
b <- 10:500
diffobj::ses(a, b, max.diffs = 100)
#> Warning in diff_myers(args[["a"]], args[["b"]], max.diffs =
#> args[["max.diffs"]], : Exceeded `max.diffs`: 511 vs 100 allowed. Diff is
#> probably suboptimal.
#> [1] "1,20c1,491"
Created on 2021-02-08 by the reprex package (v1.0.0)
Thanks for confirming. At some point in the near future I'm going to sit down and stare really hard at the code to see if I can figure out how I can get the other "unreachable" branch to be reached, and (whether I figure it out or not), resubmit to CRAN.
Here's another set of test cases for you, based around comparing tibbles. I tried to reduce the reprex but I couldn't locate the problematic behavior beyond isolating the problematic columns.
I'm seeing these two errors
#> Error in diff_myers(args[["a"]], args[["b"]], max.diffs = args[["max.diffs"]], : Logic Error: Exceeded buffer for finding fake snake; contact maintainer.
#> Error in diff_myers(args[["a"]], args[["b"]], max.diffs = args[["max.diffs"]], : Internal Error: reached theoretically unreachable branch 2, contact maintainer.
and the second one goes away when using the development branch, while the other is reproducible in both.
Perfect, thanks. I'll look in the next couple of days.
Oh and one more data point for the fake snake error message. In my reprex, the error depends on max.diff
and appears only when that value is between 72 and 203.
The development branch now addresses @gadenbuie additional case. I want to look at this more closely as it's clear that the cases with low max.diffs
are not well covered, so I'm not going to submit to CRAN just yet. I've found a couple of other issues already.
Aside, while it's great that you all are testing these low max.diffs
cases, I'm wonder a little bit why. I imagine performance should be acceptable for the likely use case (e.g. waldo output in up to ~15ish failed tests). For example, on my very slow system I get:
> a <- sample(letters[1:5], 1000, replace=TRUE)
> b <- sample(letters[1:5], 1000, replace=TRUE)
> system.time(ses_dat(a, b))
user system elapsed
0.020 0.000 0.021
These could run with max.diffs=1e3
.
I'm setting the number low not because of performance, but because for most of the waldo cases, once you beyond a small number of diffs you're probably comparing two unrelated vectors, and it's simpler to just print them side by side, rather than displaying a complex diff (e.g. https://github.com/r-lib/waldo/issues/51). Is there a better approach?
That makes sense. Maybe the thing to do there is to compare the match percentage, and if it is low report the vectors side by side instead of the diff'ed version? Using Garrick's example, we have:
> a <- test$bill_length_mm
> b <- comparison$bill_length_mm
> table(diffobj::ses_dat(a, b)[['op']])
Match Insert Delete
25 115 89
This requires some additional post-processing in waldo, but should be better than having a half-baked diff from hitting max.diffs
. Whether or not that's worth the effort is another question.
Ah good idea, I've made a note to look into that next time I'm working on waldo.
Created on 2021-01-11 by the reprex package (v0.3.0.9001)
(Originally reported in https://github.com/r-lib/waldo/issues/62)