Without a test to confirm your claim, this looks like CANNOT-REPRODUCE. The
code below passes 100% of the time. Do you have a test that reproduces the
problem at least sometimes? Did you capture the data when the CharMatcher failed,
and did you confirm that the data are indeed all ASCII? How does CharMatcher
behave when run on the same data again and again?
import static org.junit.Assert.assertTrue;

import com.google.common.base.CharMatcher;
import org.apache.commons.lang3.RandomStringUtils;
import org.junit.Test;

public class CharMatcherFlakyTest {
    private static final CharMatcher ASCII_MATCHER = CharMatcher.ASCII;
    private static final int MAX_ITERATIONS = 1000000; // a million

    @Test
    public void asciiMatcherProbabilisticTest() {
        for (int i = 0; i < MAX_ITERATIONS; i++) {
            // randomAscii(100) yields 100 printable ASCII characters (codes 32 to 126)
            String randomAsciiString = RandomStringUtils.randomAscii(100);
            assertTrue("An ASCII CharMatcher didn't match a random ASCII String \"" + randomAsciiString + "\"",
                    ASCII_MATCHER.matchesAllOf(randomAsciiString));
        }
    }
}
Original comment by JanecekP...@seznam.cz
on 10 Apr 2014 at 10:51
My test was on the same data again, and the string wasn't ASCII; it's a string
in Hebrew. The string comes from multiple servers and several databases, so
each of them could mess up the string, but it's all the same process, so I
don't know how likely that is...
Original comment by eric.itz...@gmail.com
on 10 Apr 2014 at 11:17
Marking this as invalid. If you can come up with a reproducible test case,
please reopen.
Original comment by kak@google.com
on 10 Apr 2014 at 1:54
Interesting. I'll try to scrape some Hebrew pages and try it out to find
something reproducible.
Original comment by JanecekP...@seznam.cz
on 10 Apr 2014 at 2:44
It's pretty easy to validate that CharMatcher.ASCII matches only ASCII
characters, since there are only 128 of them. The logic is incredibly simple too.
It's basically:
'\0' <= c && c <= '\u007F'
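The 128-value claim is small enough to check exhaustively. The sketch below uses only the JDK: isAscii is a hypothetical helper that mirrors the range check quoted above (it is not Guava's source), and every one of the 65,536 char values is cross-checked against Character.UnicodeBlock.BASIC_LATIN, which covers U+0000 through U+007F.

```java
final class AsciiRangeCheck {
    // Hypothetical helper mirroring the range check described above;
    // not copied from Guava's implementation.
    static boolean isAscii(char c) {
        return '\0' <= c && c <= '\u007F';
    }

    // Walks all 65,536 char values, failing fast if isAscii ever disagrees
    // with the Basic Latin block, and returns how many values pass isAscii.
    static int countAsciiChars() {
        int count = 0;
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            boolean basicLatin =
                    Character.UnicodeBlock.of(c) == Character.UnicodeBlock.BASIC_LATIN;
            if (isAscii(c) != basicLatin) {
                throw new AssertionError("Mismatch at U+" + Integer.toHexString(i));
            }
            if (isAscii(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countAsciiChars()); // prints 128
    }
}
```

Swapping isAscii for CharMatcher.ASCII.matches(c) would run the same exhaustive check against Guava itself, assuming Guava is on the classpath.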
So there's really no way for that to sporadically claim that something that
isn't ASCII is. It seems pretty clear that you're occasionally getting data that
is in fact all ASCII. You might want to take a look at what the data is when
that happens. It could even be all ASCII because of something like the text
being encoded or decoded with the wrong charset somewhere, leading to lots of
unmappable characters being replaced by '?'.
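The '?' scenario is easy to reproduce. In this minimal sketch, US-ASCII stands in for whichever layer mishandles the charset (an assumption for illustration; the actual culprit in the reporter's pipeline is unknown), and allAscii and roundTripThroughAscii are hypothetical helpers. String.getBytes(Charset) replaces every character the target charset cannot represent with that charset's replacement byte, which is '?' for US-ASCII, so Hebrew text comes back as question marks that pass an all-ASCII check.

```java
import java.nio.charset.StandardCharsets;

final class CharsetLossDemo {
    // Hypothetical helper: true if every char is in the range 0x00 to 0x7F.
    static boolean allAscii(String s) {
        return s.chars().allMatch(c -> c <= 0x7F);
    }

    // Hypothetical helper simulating a layer that only supports US-ASCII:
    // getBytes substitutes the replacement byte '?' for each unmappable char.
    static String roundTripThroughAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD"; // the Hebrew word "shalom"
        String mangled = roundTripThroughAscii(hebrew);
        System.out.println(allAscii(hebrew));  // false
        System.out.println(mangled);           // ????
        System.out.println(allAscii(mangled)); // true
    }
}
```

Logging the offending string (or its code points) whenever the check unexpectedly passes would show whether this kind of substitution is happening upstream.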
Original comment by cgdecker@google.com
on 10 Apr 2014 at 3:44
Or an occasional empty string in the input...
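A sketch of the empty-string case: Guava's Javadoc for matchesAllOf states that it returns true when the sequence is empty, since no character is there to fail the test. Java's IntStream.allMatch has the same vacuous-truth semantics, so the JDK-only allAscii below (a hypothetical stand-in, not Guava's code) behaves the same way. The describe helper is also hypothetical and shows one way to tell empty input apart from genuinely ASCII text when logging.

```java
final class EmptyInputDemo {
    // Hypothetical JDK-only stand-in for CharMatcher.ASCII.matchesAllOf:
    // IntStream.allMatch returns true when the stream is empty.
    static boolean allAscii(CharSequence s) {
        return s.chars().allMatch(c -> c <= 0x7F);
    }

    // Hypothetical diagnostic helper: reports empty input separately
    // instead of letting it pass silently as "ASCII".
    static String describe(CharSequence s) {
        if (s.length() == 0) {
            return "empty";
        }
        return allAscii(s) ? "ascii" : "non-ascii";
    }

    public static void main(String[] args) {
        System.out.println(allAscii(""));      // true (vacuously)
        System.out.println(describe(""));      // empty
        System.out.println(describe("hello")); // ascii
        System.out.println(describe("\u05E9\u05DC\u05D5\u05DD")); // non-ascii
    }
}
```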
Original comment by kevinb@google.com
on 10 Apr 2014 at 3:55
This issue has been migrated to GitHub.
It can be found at https://github.com/google/guava/issues/<id>
Original comment by cgdecker@google.com
on 1 Nov 2014 at 4:09
Original comment by cgdecker@google.com
on 3 Nov 2014 at 9:07
Original issue reported on code.google.com by
eric.itz...@gmail.com
on 10 Apr 2014 at 7:34