Wordseer / wordseer

The WordSeer text analysis tool, written in Flask.
http://wordseer.berkeley.edu/

More dependency inconsistencies #129

Closed: keien closed this issue 10 years ago

keien commented 10 years ago

More inconsistencies:

mysql> select * from dependency_xref_sentence where sentence_id = 4016;
+---------------+-------------+-------------+-----------+-----------+-------------+--------+--------+
| dependency_id | sentence_id | document_id | gov_index | dep_index | relation_id | gov_id | dep_id |
+---------------+-------------+-------------+-----------+-----------+-------------+--------+--------+
|       2126014 |        4016 |        4014 |         8 |         7 |           2 |    260 |      4 |
|      11248612 |        4016 |        4014 |         3 |         2 |           1 |   2486 |      2 |
|      81267819 |        4016 |        4014 |         1 |         5 |           8 |   2678 |      9 |
|    1124001168 |        4016 |        4014 |        12 |        11 |           1 |   2400 |    168 |
|    2215391260 |        4016 |        4014 |         6 |         8 |          22 |    539 |    260 |
|    8126012400 |        4016 |        4014 |         8 |        12 |           8 |    260 |   2400 |
|    9153912678 |        4016 |        4014 |         6 |         1 |           9 |    539 |   2678 |
|   23126012678 |        4016 |        4014 |         8 |         1 |          23 |    260 |   2678 |
|   27126014387 |        4016 |        4014 |         8 |         9 |          27 |    260 |   4387 |
+---------------+-------------+-------------+-----------+-----------+-------------+--------+--------+
9 rows in set (0.00 sec)

mysql> select word from word where id=260 or id=4
    -> ;
+------+
| word |
+------+
| to   |
| get  |
+------+
2 rows in set (0.01 sec)

mysql> select * from relationship where id=2;
+----+--------------+-------+
| id | relationship | count |
+----+--------------+-------+
|  2 | aux          | NULL  |
+----+--------------+-------+
1 row in set (0.00 sec)

mysql> select word from word where id=2486 or id=2;
+------+
| word |
+------+
| a    |
| plus |
+------+
2 rows in set (0.01 sec)

mysql> select * from relationship where id=1;
+----+--------------+-------+
| id | relationship | count |
+----+--------------+-------+
|  1 | det          | NULL  |
+----+--------------+-------+
1 row in set (0.00 sec)

mysql> select word from word where id=2678 or id=9;
+-------+
| word  |
+-------+
| i     |
| owner |
+-------+
2 rows in set (0.01 sec)
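The id-by-id lookups above can be collapsed into a single query that joins the cross-reference table against `word` and `relationship`. The following is a hypothetical sketch. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL database, recreates only the columns shown in the dump, and loads only the three rows checked above. The lookups don't show which id belongs to which word, so the pairing used here is an assumption chosen to agree with the `gov_index`/`dep_index` positions. Relation id 8 is deliberately left out of `relationship` because it was never looked up, so the `LEFT JOIN` returns `None` for it. The same `SELECT` should carry over to MySQL, though that is also an assumption.

```python
import sqlite3

# Hypothetical in-memory copy of the three tables seen in the dump.
# SQLite stands in for MySQL; only the columns shown above are recreated.
SCHEMA = """
CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT);
CREATE TABLE relationship (id INTEGER PRIMARY KEY, relationship TEXT, count INTEGER);
CREATE TABLE dependency_xref_sentence (
    dependency_id INTEGER PRIMARY KEY, sentence_id INTEGER, document_id INTEGER,
    gov_index INTEGER, dep_index INTEGER, relation_id INTEGER,
    gov_id INTEGER, dep_id INTEGER
);
"""


def build_demo_db():
    """Create the demo database with the rows checked by hand (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Assumed id -> word pairing, chosen to match gov_index/dep_index.
    conn.executemany(
        "INSERT INTO word (id, word) VALUES (?, ?)",
        [(4, "to"), (260, "get"), (2, "a"), (2486, "plus"), (9, "i"), (2678, "owner")],
    )
    # Relation id 8 is intentionally absent: it was never looked up.
    conn.executemany(
        "INSERT INTO relationship (id, relationship) VALUES (?, ?)",
        [(1, "det"), (2, "aux")],
    )
    # The three rows for sentence 4016 that were checked above.
    conn.executemany(
        "INSERT INTO dependency_xref_sentence VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (2126014, 4016, 4014, 8, 7, 2, 260, 4),
            (11248612, 4016, 4014, 3, 2, 1, 2486, 2),
            (81267819, 4016, 4014, 1, 5, 8, 2678, 9),
        ],
    )
    return conn


def resolve_dependencies(conn, sentence_id):
    """Return (relation, governor, dependent, gov_index, dep_index) rows for one sentence."""
    query = """
        SELECT r.relationship, g.word, d.word, x.gov_index, x.dep_index
        FROM dependency_xref_sentence AS x
        LEFT JOIN relationship AS r ON r.id = x.relation_id
        JOIN word AS g ON g.id = x.gov_id
        JOIN word AS d ON d.id = x.dep_id
        WHERE x.sentence_id = ?
        ORDER BY x.dependency_id
    """
    return conn.execute(query, (sentence_id,)).fetchall()


conn = build_demo_db()
for row in resolve_dependencies(conn, 4016):
    print(row)
```

With this data, each row comes back already resolved, for example `('aux', 'get', 'to', 8, 7)`. The unresolved relation shows up as `None` instead of silently dropping the row.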

>>> for dep in s.dependencies: print dep
... 
<Dependency: nsubj(<Word: owner>, <Word: Car>) >
<Dependency: dep(<Word: plus>, <Word: a>) >
<Dependency: advmod(<Word: owner>, <Word: plus>) >
<Dependency: mark(<Word: like>, <Word: as>) >
<Dependency: nsubj(<Word: like>, <Word: I>) >
<Dependency: xsubj(<Word: get>, <Word: I>) >
<Dependency: advcl(<Word: owner>, <Word: like>) >
<Dependency: aux(<Word: get>, <Word: to>) >
<Dependency: xcomp(<Word: like>, <Word: get>) >
<Dependency: advmod(<Word: get>, <Word: away>) >
<Dependency: det(<Word: weekend>, <Word: the>) >
<Dependency: prep_for(<Word: get>, <Word: weekend>) >

Both outputs are for the same sentence, "Car owner a plus as I like to get away for the weekend." Of the three dependencies I checked in MySQL, only the first one matched. The second had the wrong grammatical relation (det instead of dep), and the third (between "owner" and "I") doesn't exist in our pipeline results at all. The counts don't match either: the SQL dump has 9 dependencies for this sentence, while our pipeline produces 12.
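The manual comparison above can be made mechanical. The following is a minimal sketch, assuming each dependency is reduced to a (relation, governor, dependent) triple with lowercased words. The pipeline triples are copied from the REPL output, and the legacy triples are the three MySQL rows checked by hand. `compare` is a hypothetical helper that indexes the pipeline output by (governor, dependent) pair. Relation id 8 was never resolved, so its name is recorded as `None`.

```python
# Dependencies printed by the new pipeline as (relation, governor, dependent),
# copied from the REPL session above and lowercased.
PIPELINE = [
    ("nsubj", "owner", "car"),
    ("dep", "plus", "a"),
    ("advmod", "owner", "plus"),
    ("mark", "like", "as"),
    ("nsubj", "like", "i"),
    ("xsubj", "get", "i"),
    ("advcl", "owner", "like"),
    ("aux", "get", "to"),
    ("xcomp", "like", "get"),
    ("advmod", "get", "away"),
    ("det", "weekend", "the"),
    ("prep_for", "get", "weekend"),
]

# The three legacy MySQL rows checked by hand, with ids resolved to names.
# Relation id 8 was never looked up, so its name is unknown (None).
LEGACY_CHECKED = [
    ("aux", "get", "to"),
    ("det", "plus", "a"),
    (None, "owner", "i"),
]


def compare(legacy, pipeline):
    """Classify each legacy dependency as match / relation differs / missing."""
    # Index pipeline relations by (governor, dependent). If a pair occurred
    # twice, the later relation would win; that does not happen in this sentence.
    by_pair = {(gov, dep): rel for rel, gov, dep in pipeline}
    results = []
    for rel, gov, dep in legacy:
        new_rel = by_pair.get((gov, dep))
        if new_rel is None:
            status = "missing"
        elif new_rel == rel:
            status = "match"
        else:
            status = f"relation differs ({rel} vs {new_rel})"
        results.append(((gov, dep), status))
    return results


for pair, status in compare(LEGACY_CHECKED, PIPELINE):
    print(pair, status)
print(len(PIPELINE), "pipeline dependencies vs 9 legacy rows")
```

The script reports a match for aux(get, to), `relation differs (det vs dep)` for the plus/a pair, and `missing` for the owner/I pair, followed by the 12-versus-9 count. It keys on word pairs instead of token positions. That simplification only works because no word repeats in this sentence. A more robust check would compare `gov_index`/`dep_index` directly.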

abendebury commented 10 years ago

Well, I don't know what to tell you about the old code, but I can say that our output seems to be the same as the reference demo (use the "Pretty Print" option).

So, I guess we're actually more accurate than the original data.

keien commented 10 years ago

I guess it's fine then. I'll mark this as resolved.