Open vsoch opened 4 years ago
An md5 of the full traceback might be too rigid. I would have made it a triple hierarchy:
Such levels could allow matching similar, even if not identical, issues on the client side (e.g., the full repo could be cloned and updated in the cache). A GitHub action could be used to make records of new issues (which would already have that composite fingerprint in them). There might be benefits from collecting additional traceback and wtf details for already existing issues, but I am afraid it might be too much chatter if we are to monitor this collection of issues.
Okay so just to make sure I have it right, you would do (these are just randomly derived values so we can see what it looks like)
So you are proposing it would look like:
RuntimeError-<md5-functions>-<md5-datalad>
and then store there detailed info per issue with full traceback etc which could differ (line numbers shift between changes, paths differ in messages etc). Additional matching could be done on that narrowed down set.
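For concreteness, here is a rough sketch of how an identifier of the shape `RuntimeError-<md5-functions>-<md5-datalad>` could be built. The helper names (`md5_short`, `composite_fingerprint`), the truncation to 8 characters, and the choice of what goes into each hash (function names for the middle part, datalad file paths for the last part) are all assumptions for illustration, not what the script actually does.

```python
import hashlib
import traceback


def md5_short(items, length=8):
    """Return a truncated md5 hex digest of the newline-joined items."""
    joined = "\n".join(items).encode("utf-8")
    return hashlib.md5(joined).hexdigest()[:length]


def composite_fingerprint(exc, package="datalad"):
    """Build '<ExceptionName>-<md5 of functions>-<md5 of package files>'.

    Line numbers and the exception message are deliberately left out,
    so they can shift without changing the fingerprint.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    functions = [frame.name for frame in frames]
    package_files = [
        frame.filename
        for frame in frames
        if package in frame.filename.replace("\\", "/").split("/")
    ]
    return "-".join(
        [type(exc).__name__, md5_short(functions), md5_short(package_files)]
    )


# Hypothetical failure used only to exercise the sketch.
def _fail(msg):
    raise RuntimeError(msg)


def _capture(msg):
    try:
        _fail(msg)
    except RuntimeError as exc:
        return exc


fp_a = composite_fingerprint(_capture("first message"))
fp_b = composite_fingerprint(_capture("another message"))
print(fp_a)
print(fp_a == fp_b)  # same call chain, different message
```

Because only the exception name, the call chain, and the file list feed the identifier, the two captured errors above land in the same bucket even though their messages differ. The full traceback would then be stored under that bucket for finer matching.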
If the traceback is part of the md5, and it's included in the issue, we would definitely be storing it. For the line numbers, I think that's probably overkill for the points that you mentioned.
My 0.02 for the above - I think the specific dependency lists and functions list might be too detailed for grouping errors. If we have an exception name, and then md5 based on the traceback, I think that could be enough for a human to browse, and to match issues that belong together. On the other hand, you are thinking that you would want to search based on md5 of just a functions list, or just a hash of functions? I have mixed feelings about this, because I don't think I fully understand what a functions list is. My instinct is that we should start with a simpler (less detailed) approach and only dive into more detail if we find it doesn't work well (meaning that two exceptions are labeled as the same but are very different to resolve / address, or we need to search for something and find that we cannot).
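To make the simpler option concrete, this is roughly what "exception name plus md5 of the traceback" could look like. It is only a sketch: `simple_fingerprint` is a made-up name, and hashing the fully formatted traceback text (message and line numbers included) is an assumption about what would go into the md5.

```python
import hashlib
import traceback


def simple_fingerprint(exc):
    """Return '<ExceptionName>-<md5 of the formatted traceback text>'."""
    text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return f"{type(exc).__name__}-{digest}"


# Hypothetical failure used only to exercise the sketch.
def _fail(msg):
    raise RuntimeError(msg)


def _capture(msg):
    try:
        _fail(msg)
    except RuntimeError as exc:
        return exc


simple_a = simple_fingerprint(_capture("path /tmp/a is missing"))
simple_b = simple_fingerprint(_capture("path /tmp/b is missing"))
print(simple_a)
print(simple_a == simple_b)  # the messages differ, so the hashes differ
```

This is easy for a human to browse, but any change in the message (or a shifted line number) produces a new identifier, which is the rigidity concern raised earlier in the thread.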
okay actually I think I figured it out re: the lists:
In [70]: datald
Out[70]: ['datalad', 'datalad/cmdline/main.py']
In [71]: others
Out[71]:
['site-packages/IPython/terminal/embed.py',
'site-packages/IPython/terminal/embed.py',
'site-packages/IPython/terminal/embed.py',
'site-packages/IPython/terminal/interactiveshell.py',
'site-packages/IPython/core/interactiveshell.py',
'site-packages/IPython/core/interactiveshell.py',
'site-packages/IPython/core/async_helpers.py',
'site-packages/IPython/core/interactiveshell.py',
'site-packages/IPython/core/interactiveshell.py',
'site-packages/IPython/core/interactiveshell.py',
'site-packages/IPython/terminal/embed.py',
'site-packages/IPython/terminal/embed.py',
'site-packages/IPython/terminal/embed.py',
'site-packages/IPython/terminal/interactiveshell.py',
'site-packages/IPython/core/interactiveshell.py',
'site-packages/IPython/core/interactiveshell.py',
'site-packages/IPython/core/async_helpers.py',
'site-packages/IPython/core/interactiveshell.py',
'site-packages/IPython/core/interactiveshell.py',
'site-packages/IPython/core/interactiveshell.py']
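For anyone following along, two lists like the ones above could be produced along these lines. This is a guess based on the shape of the output, not the actual script: `normalize` and `split_by_package` are hypothetical names, and trimming each path so it starts at the package name or at `site-packages` is an assumption.

```python
def normalize(path, package="datalad"):
    """Trim a path so it starts at the package name or at site-packages."""
    parts = path.replace("\\", "/").split("/")
    for marker in (package, "site-packages"):
        if marker in parts:
            return "/".join(parts[parts.index(marker):])
    return path  # nothing recognizable to trim, keep the path as given


def split_by_package(paths, package="datalad"):
    """Split traceback file paths into (package files, everything else)."""
    ours, others = [], []
    for path in paths:
        short = normalize(path, package)
        if short.split("/")[0] == package:
            ours.append(short)
        else:
            others.append(short)
    return ours, others


# Made-up absolute paths, used only to exercise the sketch.
example_paths = [
    "/usr/bin/datalad",
    "/usr/lib/python3/site-packages/datalad/cmdline/main.py",
    "/usr/lib/python3/site-packages/IPython/terminal/embed.py",
]
ours, others = split_by_package(example_paths)
print(ours)
print(others)
```

Dropping the machine-specific prefix means the same traceback hashes identically across installations, which matters if the lists are fed into an md5.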
okay just updated the script here to use the updated (more specific) hash.
This is a first shot at adding the parser as a GitHub action to, when an issue is submitted:
This should serve a twofold purpose: to help the user, and to keep a little database of issues reported. I suspect we will want to get a base merged, and then tweak details once the datalad PR is merged and we can adjust.