MetricsGrimoire / Bicho

Bicho is a command line based tool used to parse bug/issue tracking systems
http://metricsgrimoire.github.com/Bicho/
GNU General Public License v2.0

Problems trying to analyze kernel Bugzilla with XML encoding #18

Closed (acs closed 10 years ago)

acs commented 11 years ago

acs@macitong:~/devel/Bicho$ ./bicho -g --db-user-out=kernel --db-password-out=kernel --db-database-out=bichoKernel -d 1 --backend-user="xxx@xxx.com" --backend-password=xxxx -b bg -u https://bugzilla.kernel.org/buglist.cgi?product=

xml.sax._exceptions.SAXParseException: :88426:115: not well-formed (invalid token)

Opening the XML view of the problematic issue in Chrome:

https://bugzilla.kernel.org/show_bug.cgi?id=45911&ctype=xml

the resulting XML is not well formed:

error on line 87 at column 116: PCDATA invalid Char value 27

We need to filter the XML we read before trying to parse it!
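To illustrate the kind of filtering meant here, a minimal sketch follows. It is not the code that ended up in Bicho. It assumes Python 3 and that the raw response is available as bytes; `strip_invalid_xml_bytes`, `TextCollector`, and the sample `raw` document are hypothetical names and data made up for this example. Char value 27 is the ESC control character, which XML 1.0 does not allow, so the sketch drops every C0 control byte except tab, line feed, and carriage return before handing the data to the SAX parser.

```python
import re
import xml.sax

# Bytes that XML 1.0 forbids: C0 controls other than tab (0x09),
# line feed (0x0A), and carriage return (0x0D). ESC (0x1B, decimal 27)
# is in this set. These byte values never occur inside a multi-byte
# UTF-8 sequence, so removing them at the byte level is safe for UTF-8.
_INVALID_XML_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_invalid_xml_bytes(data: bytes) -> bytes:
    """Return data with the forbidden control bytes removed (hypothetical helper)."""
    return _INVALID_XML_BYTES.sub(b"", data)


class TextCollector(xml.sax.ContentHandler):
    """Toy handler that collects character data, for demonstration only."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


# Made-up document containing an ESC byte, similar in spirit to the failing bug.
raw = b"<bug><thetext>boot \x1b[0m failed</thetext></bug>"

handler = TextCollector()
xml.sax.parseString(strip_invalid_xml_bytes(raw), handler)
text = "".join(handler.chunks)
```

Parsing `raw` directly fails with the same kind of `SAXParseException` shown above, while the filtered bytes parse cleanly. The trade-off is that the offending characters are silently dropped from the comment text, which seems acceptable for metrics gathering but is an assumption, not something decided in this thread.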

acs commented 11 years ago

I have a patch based on:

http://stackoverflow.com/questions/8733233/filtering-out-certain-bytes-in-python

I hope to have it clean enough tomorrow to commit it upstream and make the backend more robust.

brainwane commented 10 years ago

Alvaro, were you able to commit it?

acs commented 10 years ago

Yes, I think I already committed it. My fault for not closing this issue!

Take a look at:

https://github.com/MetricsGrimoire/Bicho/blob/master/Bicho/backends/bg.py#L1252

Time to close this issue!

We should move this code to a more general place in Bicho to share with other backends.