MetricsGrimoire / Bicho

Bicho is a command line based tool used to parse bug/issue tracking systems
http://metricsgrimoire.github.com/Bicho/
GNU General Public License v2.0

Error retrieving name of the author #45

Open canasdiaz opened 11 years ago

canasdiaz commented 11 years ago

I've seen this while downloading issues from http://issues.liferay.com/browse/IDE

DBG: [21/Feb/2013-19:15:47] Bug activity: http://issues.liferay.com/browse/IDE-54?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel
Change author format not supported. Change lost!
Change author format not supported. Change lost!
Change author format not supported. Change lost!
Change author format not supported. Change lost!
Change author format not supported. Change lost!

brainwane commented 11 years ago

I can still reproduce this on git master. Now investigating. I've narrowed it down to line 375 of jira.py...

brainwane commented 11 years ago

So the "for table in tables" loop on line 367 checks whether author_date_text has fewer than 3 children, and sometimes it sees 2 children rather than 3, e.g.:

<div class="action-details" id="changehistorydetails_301409">
<a name="action_301409">
<span class="user-hover user-avatar" style="background-image:url(/secure/useravatar?size=small&amp;avatarId=10163);">peter.wang</span>
 made changes  - <span class="date" title="11/May/10 7:04 PM"><time datetime="2010-05-11T19:04-0700">11/May/10 7:04 PM</time></span>
</a></div>

In some other JIRA tickets from Liferay's bug tracker, we see different children, e.g.:

<div class="action-details" id="changehistorydetails_1590811">
        <a name="action_1590811" />
    <a class="user-hover user-avatar" rel="tao.tao" style="background-image:url(/secure/useravatar?size=small&amp;avatarId=10162);" id="changehistoryauthor_1590811" href="/secure/ViewProfile.jspa?name=tao.tao">Tao Tao</a>
 made changes  - <span class='date' title='Today 3:45 AM'><time datetime='2013-10-17T03:45-0700'>Today 3:45 AM</time></span>
        </div>

So some samples have one <a> tag and two <span> tags, while others have two <a> tags and one <span> tag. The former layout causes a problem when the code reaches

author_date_text.findAll('a')[1]

because there is no second <a> tag to index, so I think that's why the rough "len < 3" check is there.
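To make the two layouts concrete, here is a minimal sketch that counts the tags in each sample using Python's standard html.parser module. TagCounter and count_tags are hypothetical helpers (not Bicho code), and the samples are abridged copies of the markup quoted above with the style attributes dropped. In the first layout there is only one <a>, so asking for the second one fails, just as author_date_text.findAll('a')[1] would.

```python
from collections import Counter
from html.parser import HTMLParser

# Abridged copies of the two change-history layouts quoted above.
SPAN_AUTHOR = """<div class="action-details">
<a name="action_301409">
<span class="user-hover user-avatar">peter.wang</span>
 made changes  - <span class="date"><time datetime="2010-05-11T19:04-0700">11/May/10 7:04 PM</time></span>
</a></div>"""

LINK_AUTHOR = """<div class="action-details">
<a name="action_1590811" />
<a class="user-hover user-avatar" href="/secure/ViewProfile.jspa?name=tao.tao">Tao Tao</a>
 made changes  - <span class='date'><time datetime='2013-10-17T03:45-0700'>Today 3:45 AM</time></span>
</div>"""


class TagCounter(HTMLParser):
    """Count start tags, including self-closing ones such as <a ... />."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags reach here too, via the default handle_startendtag.
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name to number of occurrences."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.counts


for label, html in (("span author", SPAN_AUTHOR), ("link author", LINK_AUTHOR)):
    counts = count_tags(html)
    print(label, "a:", counts["a"], "span:", counts["span"])
```

For these samples it reports one <a> and two <span> tags for the first layout and two <a> and one <span> for the second, matching the analysis above.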

The quick fix for this ticket would be to check which tags are actually present and extract the author's name and date accordingly, for each combination of <a> and <span> tags we encounter.
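One way to do that is to stop relying on tag positions and instead look for the element that marks the author, whatever tag it is. The sketch below assumes, based only on the two Liferay samples, that the author element always carries the CSS class "user-hover" (as a <span> or an <a>) and that the timestamp is the datetime attribute of a <time> element. It uses the standard html.parser module rather than the BeautifulSoup-style findAll in jira.py, and ChangeAuthorParser and parse_change_author are hypothetical names.

```python
from html.parser import HTMLParser


class ChangeAuthorParser(HTMLParser):
    """Pick the author and timestamp out of one 'action-details' block.

    Assumption: the author is the first element whose class list contains
    'user-hover', and the date is the first <time datetime=...> element.
    """

    def __init__(self):
        super().__init__()
        self.author = None
        self.datetime = None
        self._author_tag = None  # tag name of the open author element, if any
        self._author_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "user-hover" in classes and self.author is None:
            # Works whether the author is wrapped in <span> or <a>.
            self._author_tag = tag
            self._author_parts = []
        elif tag == "time" and self.datetime is None:
            self.datetime = attrs.get("datetime")

    def handle_endtag(self, tag):
        if self._author_tag is not None and tag == self._author_tag:
            self.author = "".join(self._author_parts).strip()
            self._author_tag = None

    def handle_data(self, data):
        if self._author_tag is not None:
            self._author_parts.append(data)


def parse_change_author(html):
    """Return (author, datetime); either is None if it could not be found."""
    parser = ChangeAuthorParser()
    parser.feed(html)
    parser.close()
    return parser.author, parser.datetime
```

With this approach the caller only needs to report "Change author format not supported" when the author really is missing (None), rather than whenever the number of child tags differs from what the old check expected.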

In the long run, I think the real answer is to use the JIRA API instead of screen-scraping, so I think we should leave this ticket open till we've refactored the JIRA backend to do so.
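For the longer-term route, JIRA's REST API (version 2) can return an issue together with its change history as JSON when the issue is requested with expand=changelog, so the author arrives as a structured field instead of scraped markup. This is a minimal sketch, assuming the tracker exposes /rest/api/2/issue/{key} (whether a particular installation such as Liferay's enables it is an assumption); changelog_url, extract_changes and fetch_changes are hypothetical helpers, not Bicho code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def changelog_url(base_url, issue_key):
    """Build the REST v2 URL that returns an issue with its change history."""
    query = urlencode({"expand": "changelog"})
    return f"{base_url.rstrip('/')}/rest/api/2/issue/{issue_key}?{query}"


def extract_changes(payload):
    """Yield (author, created, field, old, new) for every changed field.

    Assumes the REST v2 layout changelog.histories[*].items[*]. A history
    without an author yields None for the author instead of being dropped.
    """
    for history in payload.get("changelog", {}).get("histories", []):
        author = history.get("author") or {}
        name = author.get("displayName") or author.get("name")
        for item in history.get("items", []):
            yield (name, history.get("created"), item.get("field"),
                   item.get("fromString"), item.get("toString"))


def fetch_changes(base_url, issue_key):
    """Download one issue and flatten its change history (needs network access)."""
    with urlopen(changelog_url(base_url, issue_key)) as response:
        return list(extract_changes(json.load(response)))
```

For example, fetch_changes("http://issues.liferay.com", "IDE-54") would request http://issues.liferay.com/rest/api/2/issue/IDE-54?expand=changelog.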

canasdiaz commented 11 years ago

Totally!

hjmacho commented 10 years ago

I got the same error when I tried to download issues from https://tracker.moodle.org/browse/MDL.

Another quick fix would be to use the jira-python library (https://bitbucket.org/bspeakmon/jira-python).

I've rewritten the backend using that library, and it seems to work fine with the Moodle and Liferay trackers. (There was a small problem with naming: the backend and the library module had the same name, so I had to rename the .py file.)
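The jira-python approach can be sketched roughly as follows. jira-python (distributed on PyPI as jira) returns Issue objects that expose issue.changelog.histories when fetched with expand="changelog". This is only a sketch, not the code from the fork: iter_changes and fetch_issue_changes are hypothetical helper names, and anonymous access to the tracker is assumed.

```python
def iter_changes(issue):
    """Flatten a jira-python Issue fetched with expand='changelog'.

    Yields (author, created, field, old, new) tuples, walking
    issue.changelog.histories[*].items[*]. A history without an
    author yields None for the author.
    """
    for history in issue.changelog.histories:
        author = getattr(history, "author", None)
        name = getattr(author, "displayName", None) if author is not None else None
        for item in history.items:
            yield name, history.created, item.field, item.fromString, item.toString


def fetch_issue_changes(server, issue_key):
    """Fetch one issue's change history (needs the jira package and network)."""
    # Imported here so iter_changes stays usable without the dependency.
    from jira import JIRA

    client = JIRA(server=server)
    issue = client.issue(issue_key, expand="changelog")
    return list(iter_changes(issue))
```

Note that a local module named jira.py earlier on the import path would shadow the library at the `from jira import JIRA` line, which is presumably the kind of name clash mentioned above; renaming the backend file avoids it.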

If you want to take a look, I've forked Bicho and uploaded the modified source code.