mih opened this issue 5 years ago
https://github.com/con/communitator was my quest for a tool that would take care of such specs. HelpMe from Vanessa was the only one suggested.
It might be worth considering also including (and of course allowing the user to edit) all relevant config files, and maybe even some structural info on the dataset the command ran in (like super-/subdatasets).
This would be very doable with helpme (sorry I didn't see this when it was posted). For GitHub, it would require a user token, which is a likely barrier in that a user wouldn't readily want to generate and provide one.
Even if we set up a post to discourse (askci or neurostars or similar) with tag / topic of "datalad" we would still require the user to accept (on first issue) the OAuth screen.
Or a third option - a custom datalad submitter that sends data somewhere else?
> For GitHub, it would require a user token
oh... that reminded me about https://github.com/vsoch/helpme/issues/44 -- IMHO no token should be necessary; I believe the full form submission could be crafted via URL. Then any user could submit a PR.
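For illustration, here is a minimal sketch of crafting such a URL (the repo, title, and body are placeholders; GitHub's new-issue form accepts `title` and `body` query parameters):

```python
import webbrowser
from urllib.parse import quote, urlencode

# Placeholder values -- repo, title, and body are illustrative only.
repo = 'datalad/datalad'
title = 'Crash report: example'
body = 'Traceback and system info would go here...'

# GitHub prefills the "new issue" form from the title/body query
# parameters, so no token is needed; the user reviews the content in
# the browser and clicks "Submit new issue" themselves.
url = f'https://github.com/{repo}/issues/new?' + urlencode(
    {'title': title, 'body': body}, quote_via=quote)
webbrowser.open(url)
```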
> Even if we set up a post to discourse (askci or neurostars or similar) with tag / topic of "datalad" we would still require the user to accept (on first issue) the OAuth screen.
yes, neurostars would be the one, and I think it is ok for a user to accept the OAuth screen if really needed (i.e. if it couldn't be done similarly by passing the whole body via URL)
@yarikoptic that's definitely doable! The url would open and then (hopefully) the user would be logged in to submit.
Would it be okay with you to add a custom datalad helper for this purpose? The official GitHub integration happens with the token (more programmatically) and I wouldn't want to fuss around with that.
Or are you more interested in submission to neurostars?
Going for a quick run, back in a bit!
Oh I just had a random cool idea (that will need a little more thinking through). Helpme is optimized so far for the user / support staff - the issue is posted to GitHub. But another alternative use case would be to build another layer that can process what helpme produces, and ultimately produce some kind of error metrics. For example, imagine if there was a repo "datalad-support" where the issues came in, but each time a new issue was posted we had a GitHub workflow to process the metadata and update some tiny (flat file based) database stored with the repository. Even if it's just a listing of the things listed above, it would be cool to see what kind of things we can learn from the data. If it works well or provides insights for datalad, I would bet other open source projects would be interested too! I could make a helpme client in other languages, if it were desired.
okay now run!
Back! The high level goal is to provide automated bug reports, but also to turn those bug reports into actionable data.
I think that the last idea is close to what I thought to "research" at some point -- an open alternative to https://sentry.io, the service to automatically report crashes etc. to. Such a service could inform us (even without "manual" bug reports) what kinds of problems users run into, and how common they are. Since I am afraid errors might be "too common" etc., I would not make it file a new issue for each occasion. There should be some "fingerprint" of a crash, and follow-up on existing ones (providing more OS etc. info) if the fingerprint matches. So it might be quite tricky to implement (unless there is already some solution). But if it is GitHub-based (the "datalad-support" repo you thought about) -- it would not be available to all users, but only to those who have a GitHub token registered, I guess. So it would be of a different use -- probably just to automagically upload gory details of an error to later link to in a "manual" bug report. But that again could be done as part of the initial idea here -- and GitHub would match by the title (at least) if another report like that was reported already, thus possibly helping to eliminate duplicates. Or have I misunderstood the idea?
You know, sentry.io has a free tier - I use it on a lot of projects and it works great. It does require, however, a token :)
You hit the right thread of what I'm getting at, and let me provide some more detail to help.
Does that make sense? It's a totally free / open source and hacky way of getting the simplest bit of sentry.io's functionality (a record of the issue), based on GitHub.
@mih could you give me an example of how you'd retrieve the traceback, and output from this wtf tool?
Hey @vsoch !
> @mih could you give me an example of how you'd retrieve the traceback, and output from this wtf tool?
Would you need to hook into another tool from your end? I thought the easiest way (and TBH the only way I can imagine right now) would be to hook into datalad's last layer of the command line interface, where all exceptions bubble up to (around here). Given the exceptions themselves, any recoverable traceback should be accessible from that point and could be fed into a reporting helper. In addition, wtf() could be utilized there to amend the report with system info.
Am I on the right path?
> output from this wtf tool?

You can find its code under `datalad/plugin/wtf.py` and tests under `datalad/plugin/tests/test_plugins.py`. Simplest invocation:
```sh
python -c 'from datalad.api import wtf; res=wtf(); print(">%s<" % wtf())'
```
This would show that we return the structure; the rendered version is printed to the screen... I guess you would need to either use `swallow_outputs` (like the tests do) or add an argument to return (yield) the rendered version instead of the structure.
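For instance, a minimal sketch of the `swallow_outputs` route (as the tests use it):

```python
from datalad.api import wtf
from datalad.utils import swallow_outputs

# Run wtf() while capturing what it prints to stdout.
with swallow_outputs() as cmo:
    wtf()
rendered = cmo.out  # the human-readable report, usable as an issue body
```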
Wow this is impressive! I'm guessing the `wtf()` function returns the content between the `>` `<` and the rest is just printed to the screen for the user.
I'll give a shot at a headless call to a Helper that will take extra input (the output above) and try to create an issue. Given the size of the content, I am thinking this will be hard to do without a proper GitHub token, but it's worth a shot.
@yarikoptic so you would want to keep the user checking / validating what is being sent first (environment for example?) Right now there is a prompt built in to check.
okay, I'm ready for posting and testing further; can we create the repo datalad-support here, giving me permission to write?
And here is an example issue - the details are generated by wtf (datalad) and the following sections are provided by helpme. https://github.com/vsoch/askci/issues/35. I'm writing up some quick docs to show you now for how that was generated.
Here are the (non-rendered) docs for how it works: https://github.com/vsoch/helpme/pull/49/files#diff-4dd658e1156e1809a673713a07f7e534R80-R113 (rendered version TBA).
> the `wtf()` function returns the content between the `>` `<` and the rest is just printed to the screen for the user.

somewhat -- between `><` I just printed what is actually returned (a dict structure, which is not rendered and thus not really for inclusion as-is into an issue intended for human consumption). That is why I thought that either we RF to provide a stream to `.write` to, or just use `swallow_outputs` to capture it.
> @yarikoptic so you would want to keep the user checking / validating what is being sent first (environment for example?) Right now there is a prompt built in to check.
if it would be just an invocation of "new issue" via URL to GitHub, the user would get all that information presented for review / editing in the browser before they hit "Submit new issue", even with the "Preview" tab - so we would not need any editing/visualization on our end ;)
> can we create the repo datalad-support here, giving me permission to write?
I created https://github.com/datalad/datalad-helpme and invited you. I decided not to go with datalad-support because we have the `datalad.support` module and are thinking about extracting some "setup-support" functionality into a common reusable package, so it might be a bit confusing. We could rename it to anything else later on if desired.
@yarikoptic I have only been testing using a token - the substantial size of the body has me guessing the unauthenticated version won't work - I'll mess around with it now to see how to encode the content for the URL.
> ... details are generated by wtf (datalad) and the following sections are provided by helpme. vsoch/askci#35.
Cool! That wasn't via URL + browser, but via token, right?
As I have mentioned above, I think the markdown rendering we have would ideally be better for human consumption than the structure; here is an example:
@yarikoptic how do I capture the markdown rendering? It's just printed to the screen.
And @yarikoptic what data would you like used to generate a hash identifier for the issue?
holy crap the non token way worked! Super cool :)
Needed:

```python
import urllib.parse

body = urllib.parse.quote(body)
```
@yarikoptic so the one remaining piece of feedback I need is what content you would like to use for the hash. Once I know that, I'll add it to the PR and then test a GitHub workflow to handle a new issue. After that we can try testing from within datalad (the code in the README for datalad-helpme would basically be put somewhere in datalad).
> @yarikoptic how do I capture the markdown rendering? It's just printed to the screen.
resolved per https://github.com/vsoch/helpme/pull/49/files#r358953557
> ... what content would you like to use for the hash?
I think the traceback would be the most reliable way. Ideally I think it should be reduced to a normalized structure, as in the example below. E.g. take a random open issue with an "exception", https://github.com/datalad/datalad/issues/2855, which lists the following traceback:
```
Traceback (most recent call last):
  File "/anaconda3/bin/datalad", line 8, in <module>
    main()
  File "/anaconda3/lib/python3.6/site-packages/datalad/cmdline/main.py", line 495, in main
    ret = cmdlineargs.func(cmdlineargs)
  File "/anaconda3/lib/python3.6/site-packages/datalad/interface/base.py", line 628, in call_from_parser
    ret = list(ret)
  File "/anaconda3/lib/python3.6/site-packages/datalad/interface/utils.py", line 422, in generator_func
    result_renderer, result_xfm, _result_filter, **_kwargs):
  File "/anaconda3/lib/python3.6/site-packages/datalad/interface/utils.py", line 491, in _process_results
    for res in results:
  File "/anaconda3/lib/python3.6/site-packages/datalad/distribution/publish.py", line 824, in __call__
    **res_kwargs):
  File "/anaconda3/lib/python3.6/site-packages/datalad/distribution/publish.py", line 313, in _publish_dataset
    diff = True if force else has_diff(ds, refspec, remote, paths)
  File "/anaconda3/lib/python3.6/site-packages/datalad/distribution/publish.py", line 93, in has_diff
    remote_ref = '/'.join((remote, remote_branch_name))
TypeError: sequence item 1: expected str instance, NoneType found
```
From that we could get as a fingerprint (well -- catalogued under a checksum of its serialization into JSON; `str` wouldn't be good enough due to all the `'` vs `"` etc.) the following dictionary (ordered/sorted) with two keys:
```python
{
    'traceback': [
        ('datalad', 8, '<module>', 'main()'),
        ('datalad/cmdline/main.py', 495, 'main', 'ret = cmdlineargs.func(cmdlineargs)'),
        ...
        ('datalad/distribution/publish.py', 93, 'has_diff', "remote_ref = '/'.join((remote, remote_branch_name))")
    ],
    'exception': ("TypeError", "sequence item 1: expected str instance, NoneType found")
}
```
where for exception it is `(exc.__class__.__name__, str(exc))`.
But line numbers would bring pain here, since with a minor change (elsewhere) they would shift; so maybe we could instead omit them (a less strict fingerprint) and store that one as the first tier, and then the checksum of the one with line numbers as the 2nd tier (which would identify it exactly).
Makes sense? (there could probably be much better ways!)
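A minimal sketch of that two-tier idea, assuming we start from a caught exception (`fingerprint()` is a hypothetical helper, not existing datalad API, and using file basenames is a simplification of the paths shown above):

```python
import hashlib
import json
import os
import traceback

def fingerprint(exc):
    """Return (loose, exact) sha256 fingerprints for an exception."""
    frames = traceback.extract_tb(exc.__traceback__)
    exception = (exc.__class__.__name__, str(exc))
    # Tier 1 omits line numbers, so it survives unrelated code shifts.
    loose = {
        'traceback': [(os.path.basename(f.filename), f.name, f.line)
                      for f in frames],
        'exception': exception,
    }
    # Tier 2 keeps line numbers for an exact match.
    exact = {
        'traceback': [(os.path.basename(f.filename), f.lineno, f.name, f.line)
                      for f in frames],
        'exception': exception,
    }
    def digest(d):
        # JSON with sorted keys gives a stable serialization to checksum.
        return hashlib.sha256(
            json.dumps(d, sort_keys=True).encode('utf-8')).hexdigest()
    return digest(loose), digest(exact)

# Usage: fingerprint the exception at the point where it bubbles up.
try:
    '/'.join(('origin', None))  # provokes the TypeError from above
except TypeError as e:
    loose_id, exact_id = fingerprint(e)
```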
PS oh -- an idea!!! It would be cool if we managed to annotate our "DB" of such records with the version (`git describe`) where it was fixed, so whenever we check for them and find one marked as fixed -- we could report to the user something like:
```
We have identified this error as a possible duplicate of an issue #1234 (http://github.com/datalad/datalad/issues/1234) which was fixed in 0.11.2-22-ga53a87c30, so please upgrade (the most recent release is 0.11.7).
Do you still want to report a (n)ew bug report or (c)ontinue [n/c]?
```
(yet possibly introducing use of etelemetry from @satra and team here). The dialogue should be similar for issues which are not yet fixed:
```
We have identified this error as a possible duplicate of an issue #1234 (http://github.com/datalad/datalad/issues/1234).
Do you still want to report a (n)ew bug report, (a)dd to existing, or (c)ontinue [n/a/c]?
```
I'll look at the traceback ASAP! I actually just sent you a Gitter message that suggested the same thing haha. To be clear, if we are not requiring a GitHub token and using the API, we won't be able to identify if the issue exists beforehand. However, with the GitHub workflow we should be able to immediately answer the issue with a similar message after it's posted. I'm going to put together a very simple toy example to demonstrate what I have in mind this evening!
> holy crap the non token way worked! Super cool :)
AWESOME! With that in mind, I think we might even be better off not bothering with a separate repo, but rather pointing to this issue tracker!
Actually, our use case is even better! We have a number of "extensions" (e.g. https://github.com/datalad/datalad-container, https://github.com/datalad/datalad-crawler, etc.). We should "register" them within "datalad helpme" support. Depending on the traceback, we might need to ask the user which repository to file against -- "datalad", "datalad-container", ... (if anything from the extensions is in the traceback -- take the "deepest" as the one to suggest by default; see the sketch below).
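A hedged sketch of that "deepest extension in the traceback" heuristic (the registry and helper name here are illustrative, not an existing API):

```python
import traceback

# Hypothetical registry mapping extension module names to issue trackers.
EXTENSION_REPOS = {
    'datalad_container': 'datalad/datalad-container',
    'datalad_crawler': 'datalad/datalad-crawler',
}
DEFAULT_REPO = 'datalad/datalad'

def suggest_repo(exc):
    """Suggest the tracker of the deepest extension seen in the traceback."""
    repo = DEFAULT_REPO
    for frame in traceback.extract_tb(exc.__traceback__):
        for module, candidate in EXTENSION_REPOS.items():
            if module in frame.filename:
                # Later frames are deeper in the call chain, so the last
                # matching extension wins as the default suggestion.
                repo = candidate
    return repo
```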
> we won't be able to identify if the issue exists beforehand. However with the GitHub workflow we should be able to immediately answer the issue with a similar message after it's posted.
nothing is impossible!! If the GitHub workflow catalogues all the issues (in datalad and its extensions) according to fingerprint within datalad-helpme (a file tree, e.g. "issues/fingerprint-checksum.json", maybe with a one-level caching tier using the first two digits of the fingerprint), then helpme could check if such a fingerprint is known already (a quick non-auth query to the GitHub datalad-helpme tree of fingerprints) and get all information about it (which issue(s), fixed or not, etc). oh, this could be awesome! ;-)
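A sketch of what such a non-auth lookup could look like, assuming a hypothetical `issues/<checksum>.json` layout in the datalad-helpme repo (both the path scheme and the JSON fields are assumptions):

```python
import json
import urllib.error
import urllib.request

def lookup_fingerprint(checksum, repo='datalad/datalad-helpme',
                       branch='master'):
    """Fetch catalogued info for a crash fingerprint, or None if unknown."""
    url = ('https://raw.githubusercontent.com/'
           '%s/%s/issues/%s.json' % (repo, branch, checksum))
    try:
        with urllib.request.urlopen(url) as resp:
            # Hypothetical schema, e.g. {"issue": 1234, "fixed_in": "..."}
            return json.load(resp)
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None  # fingerprint not catalogued yet
        raise
```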
> Gitter message that suggested the same thing haha.
oh sorry -- whenever the browser dies, all the web-based social media goes with it and revives one at a time ;)
okey doke - the toy example (the issue submission bit) is underway! This script (using the helpme version 0.0.40 that is under pull request) will respond to an exception by opening up a browser window (no GitHub token required) and asking the user questions, and of course including metadata. A hash of some content (provided by the calling function, I chose exception metadata) is provided to give an identifier.
https://github.com/rseng/github-support
Next I'm going to add a workflow to that repo to respond to a new issue by getting the identifier, and either looking it up or saving the metadata. This part will be scoped beyond helpme, and can vary depending on how the implementer wants to roll it. What I'll likely do is just provide a lot of examples that folks can modify.
And I'll invite you guys to rseng! It's something I've had up my sleeve for a bit - I want to grow a small community of RSE developers that generally want to work together on projects. This small project, which will be used for datalad and provided generally as an example, is a perfect fit.
okay toy example is totally done!
https://github.com/rseng/github-support
You can follow the instructions there to install the branch, and then just run `./example.py` for your browser to open and write an issue. If you don't change example.py, it will generate the same error hash, and within about 20 seconds a workflow will comment on the issue that it's already open (with a link) and then close it. Take a look at the README.md there, with a bunch of questions / suggestions for how you would want to implement this for datalad-helpme.
I'm waiting on a conda package approval and then likely I'll merge this particular PR so you can actually install from pip, and I'm going to do a small blog post / write up to share the general tool. And @yarikoptic once you've tried the toy example, let's continue discussion of how you want the workflow to look for datalad-helpme.
@mih @yarikoptic could you point me to where in the datalad codebase you would want to catch some error and start the helpme flow? I saw there are exceptions in `datalad/support/exceptions` that are mostly based on RuntimeError, but I think you had mentioned something in the cmdline module? I'm not familiar with it, so if you could walk me through an example use case (and then how it errors), that should hopefully be enough to get started.
https://github.com/datalad/datalad/blob/master/datalad/cmdline/main.py#L560 is AFAIK the "best" point to introduce that.
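Purely illustrative, here is roughly what hooking in at that point could look like (`_dispatch()` and `submit_helpme_report()` are hypothetical stand-ins, not existing datalad or helpme APIs):

```python
from datalad.api import wtf
from datalad.utils import swallow_outputs

def main(args=None):
    try:
        _dispatch(args)  # stand-in for the existing command dispatch
    except Exception as exc:
        # Amend the report with system info before handing off to helpme.
        with swallow_outputs() as cmo:
            wtf()
        submit_helpme_report(exc, system_info=cmo.out)  # hypothetical
        raise
```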
Okay, we're close! @yarikoptic, the wtf output is too long to open an issue programmatically. Is there a way to shorten it, or just select a subset of attributes that are most valuable? Here is the current body:
## What is the problem?
<!-- Please write a few sentences about the issue-->
## What steps will reproduce the problem?
<!-- What triggered this error? -->
## Is there anything else that would be useful to know in this context?
<!-- Have you had any success using DataLad before? (to assess your expertise/prior luck. We would welcome your testimonial additions to https://github.com/datalad/datalad/wiki/Testimonials as well)-->
<details><summary>DataLad 0.10.3.1.dev3382 WTF (configuration, datalad, dataset, dependencies, environment, extensions, git-annex, location, metadata_extractors, python, system)</summary>
# WTF
## configuration <SENSITIVE, report disabled by configuration>
## datalad
- full_version: 0.10.3.1.dev3382-ge7e4-dirty
- version: 0.10.3.1.dev3382
## dataset
- id: None
- metadata: <SENSITIVE, report disabled by configuration>
- path: /home/vanessa/Desktop/Code/datalad
- repo: GitRepo
## dependencies
- appdirs: 1.4.3
- boto: 2.49.0
- cmd:annex: 7.20190819+git2-g908476a9b-1~ndall+1
- cmd:bundled-git: 2.20.1
- cmd:git: 2.20.1
- cmd:system-git: 2.23.0
- cmd:system-ssh: 7.6p1
- git: 3.1.1
- gitdb: 4.0.2
- humanize: 2.4.0
- iso8601: 0.1.12
- keyring: 21.2.1
- keyrings.alt: 3.4.0
- msgpack: 1.0.0
- requests: 2.23.0
- tqdm: 4.46.0
- wrapt: 1.12.1
## environment
- LANG: en_US.UTF-8
- PATH: /home/vanessa/anaconda3/bin:/home/vanessa/anaconda3/condabin:/home/vanessa/.rbenv/plugins/ruby-build/bin:/home/vanessa/.rbenv/shims:/home/vanessa/.rbenv/bin:/home/vanessa/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin
## extensions
## git-annex
- build flags:
- Assistant
- Webapp
- Pairing
- S3
- WebDAV
- Inotify
- DBus
- DesktopNotify
- TorrentParser
- MagicMime
- Feeds
- Testsuite
- dependency versions:
- aws-0.20
- bloomfilter-2.0.1.0
- cryptonite-0.25
- DAV-1.3.3
- feed-1.0.0.0
- ghc-8.4.4
- http-client-0.5.13.1
- persistent-sqlite-2.8.2
- torrent-10000.1.1
- uuid-1.3.13
- yesod-1.6.0
- key/value backends:
- SHA256E
- SHA256
- SHA512E
- SHA512
- SHA224E
- SHA224
- SHA384E
- SHA384
- SHA3_256E
- SHA3_256
- SHA3_512E
- SHA3_512
- SHA3_224E
- SHA3_224
- SHA3_384E
- SHA3_384
- SKEIN256E
- SKEIN256
- SKEIN512E
- SKEIN512
- BLAKE2B256E
- BLAKE2B256
- BLAKE2B512E
- BLAKE2B512
- BLAKE2B160E
- BLAKE2B160
- BLAKE2B224E
- BLAKE2B224
- BLAKE2B384E
- BLAKE2B384
- BLAKE2BP512E
- BLAKE2BP512
- BLAKE2S256E
- BLAKE2S256
- BLAKE2S160E
- BLAKE2S160
- BLAKE2S224E
- BLAKE2S224
- BLAKE2SP256E
- BLAKE2SP256
- BLAKE2SP224E
- BLAKE2SP224
- SHA1E
- SHA1
- MD5E
- MD5
- WORM
- URL
- operating system: linux x86_64
- remote types:
- git
- gcrypt
- p2p
- S3
- bup
- directory
- rsync
- web
- bittorrent
- webdav
- adb
- tahoe
- glacier
- ddar
- git-lfs
- hook
- external
- supported repository versions:
- 5
- 7
- upgrade supported from repository versions:
- 0
- 1
- 2
- 3
- 4
- 5
- 6
- version: 7.20190819+git2-g908476a9b-1~ndall+1
## location
- path: /home/vanessa/Desktop/Code/datalad
- type: dataset
## metadata_extractors
- annex:
- load_error: None
- module: datalad.metadata.extractors.annex
- version: None
- audio:
- load_error: No module named 'mutagen' [audio.py:<module>:17]
- module: datalad.metadata.extractors.audio
- datacite:
- load_error: None
- module: datalad.metadata.extractors.datacite
- version: None
- datalad_core:
- load_error: None
- module: datalad.metadata.extractors.datalad_core
- version: None
- datalad_rfc822:
- load_error: None
- module: datalad.metadata.extractors.datalad_rfc822
- version: None
- exif:
- load_error: No module named 'exifread' [exif.py:<module>:16]
- module: datalad.metadata.extractors.exif
- frictionless_datapackage:
- load_error: None
- module: datalad.metadata.extractors.frictionless_datapackage
- version: None
- image:
- load_error: None
- module: datalad.metadata.extractors.image
- version: None
- xmp:
- load_error: No module named 'libxmp' [xmp.py:<module>:20]
- module: datalad.metadata.extractors.xmp
## python
- implementation: CPython
- version: 3.7.4
## system
- distribution: debian/buster/sid
- encoding:
- default: utf-8
- filesystem: utf-8
- locale.prefered: UTF-8
- max_path_length: 306
- name: Linux
- release: 5.3.0-51-generic
- type: posix
- version: #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020
</details>
I removed the section on key/value backends and it seemed to work:
https://github.com/datalad/datalad-helpme/issues/3
Can we tell wtf to not include that?
Ahh, looks like I can say which sections to include! http://docs.datalad.org/en/latest/generated/man/datalad-wtf.html#s-section-section-section
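E.g., assuming the Python API mirrors the CLI's `--section` option (the `sections` parameter name is an assumption):

```python
from datalad.api import wtf
from datalad.utils import swallow_outputs

# Request only the sections most useful for a bug report, keeping the
# rendered body short enough for a prefilled-URL submission.
with swallow_outputs() as cmo:
    wtf(sections=['datalad', 'dependencies', 'python', 'system'])
report = cmo.out
```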
Both @yarikoptic and I had the impression that we do not get enough feedback on cases where datalad doesn't work as intended and crashes. One possibility to mitigate this situation would be to add an automated feedback submission ability to the cmdline interface's crash handler. It could format a document with the relevant details (e.g. the traceback and `wtf()` system info).
Given that such a report could contain sensitive information, we should allow users to scan/edit such a report before it is sent (unless configured to be OK).
Maybe there is already some tool that would facilitate the server-side of things.