abenamer-zz / php-text-statistics

Automatically exported from code.google.com/p/php-text-statistics

word_count() is not accurate when counting sentences with quotes #6

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Here's the test case:

---
public function testWordCountWithQuotes() {
    $textStats = new TextStatistics();
    $text = "\"There should be seven words,\" said Joe";

    $expected = 7;
    $actual = $textStats->word_count($text); // value is 8

    $this->assertEqual($actual, $expected);
}

---

Here's a possible fix:

In clean_text(), replace:

---
$strText = preg_replace('/[,:;()-]/', ' ', $strText); // Replace commas, hyphens etc. (count them as spaces)

---
with:

---
$strText = preg_replace('/[",:;()-]/', ' ', $strText); // Replace double quotes, commas, hyphens etc. (count them as spaces)

---
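To see why the stray quote matters, here is a minimal standalone sketch. It is not the library's clean_text() or word_count(): count_words_after_replace() is a hypothetical helper that applies only the one substitution discussed above and then counts whitespace-separated tokens, which I am assuming is close enough to how the words end up being counted.

```php
<?php
// Hypothetical helper for illustration only; NOT part of TextStatistics.
// Applies a single preg_replace() and counts whitespace-separated tokens.
function count_words_after_replace(string $pattern, string $text): int
{
    // Replace every character matched by the pattern with a space.
    $cleaned = preg_replace($pattern, ' ', $text);

    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops the empty
    // tokens produced by leading, trailing or repeated spaces.
    $words = preg_split('/\s+/', $cleaned, -1, PREG_SPLIT_NO_EMPTY);

    return count($words);
}

$text = "\"There should be seven words,\" said Joe";

// Original character class: the comma becomes a space, which leaves the
// closing double quote standing alone as its own token.
$before = count_words_after_replace('/[,:;()-]/', $text);

// Proposed character class: double quotes are replaced with spaces as well.
$after = count_words_after_replace('/[",:;()-]/', $text);

echo "before: $before, after: $after\n"; // prints "before: 8, after: 7"
```

Under that assumption, the original pattern leaves a lone `"` after `words`, and it gets counted as the eighth word. With the quote added to the character class the count drops to the expected 7.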

Original issue reported on code.google.com by james...@gmail.com on 2 Jun 2009 at 2:28