apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

MIGRATION: Handle issues exceeding size limits #14648

Open toddfarmer opened 1 year ago

toddfarmer commented 1 year ago

While doing a dry run to test migration tooling, certain issues trigger errors during the import process due to excessive size of contents:

error: 413 Client Error: Payload Too Large for url: https://api.github.com/repos/toddfarmer/import_dry_run_2/import/issues

An example seems to be this comment from ASF GitHub Bot, which is a massive diff thrown into a comment.

We'll need to assess how to handle this. Some options include:

  1. Arbitrarily truncate content exceeding thresholds during migration.
  2. Skip importing content beyond threshold and leave a linked message back to the original content.
  3. Identify and cleanse content before migration.
  4. Filter all ASF GitHub Bot comments before importing (hopefully no human would create comments so large).

assignUser commented 1 year ago

I'd drop the bot comments (the reason for their existence no longer seems to apply, and the diff is available via the PR) and fall back on truncating any other huge comments.
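
As a rough illustration of that approach, here is a minimal sketch that drops the bot's comments and truncates whatever else is still too large, leaving a link back to the original. The `Comment` type, the `prepare_comments` helper, and the `MAX_BODY_CHARS` value are all hypothetical and not taken from the migration tooling. The real limit behind the 413 error is not stated in this thread, and it applies to the request payload rather than to a character count, so the threshold here is only a stand-in.

```python
from dataclasses import dataclass, replace

# Assumed values for illustration only; neither comes from the migration
# tooling nor from GitHub's documentation.
BOT_AUTHOR = "ASF GitHub Bot"
MAX_BODY_CHARS = 60_000


@dataclass(frozen=True)
class Comment:
    author: str
    body: str
    source_url: str  # link back to the original Jira comment


def prepare_comments(comments, max_chars=MAX_BODY_CHARS):
    """Drop bot comments, then truncate any remaining oversized bodies."""
    prepared = []
    for comment in comments:
        if comment.author == BOT_AUTHOR:
            # Skip bot comments entirely, regardless of size.
            continue
        if len(comment.body) > max_chars:
            # Fallback: truncate and point readers at the original content.
            notice = (
                "\n\n[Truncated during migration. "
                f"Original: {comment.source_url}]"
            )
            keep = max(0, max_chars - len(notice))
            comment = replace(comment, body=comment.body[:keep] + notice)
        prepared.append(comment)
    return prepared
```

Filtering by author first keeps the truncation path as a rare fallback, so human-written comments are only shortened when they would otherwise block the import.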

jorisvandenbossche commented 1 year ago

Yes, regardless of the size issue, I also think we should not migrate those ASF GitHub Bot comments.

More recent comments from this bot are also no longer treated as "comments" but as activity. See for example https://issues.apache.org/jira/browse/ARROW-18173: you only see them when viewing "All" activity, not when viewing only "Comments" activity (so those types of comments are maybe already ignored?).