Allow custom query strings for gitlab config

fmauch commented 2 years ago

Ever since I started using bugwarrior, using it for our gitlab instance felt suboptimal. Especially since I started using github more and more I discovered the custom search strings for the gitlab service.

This PR aims at providing similar functionality for the Gitlab service.

To motivate my use case a bit more: I wanted to have all issues that are assigned to me on our gitlab instance in my bugwarrior syncs. Unfortunately I can see quite a lot of projects inside that gitlab instance and since filtering by assignee happens after every issue from every project is pulled, this took more than half an hour for me.

This is somehow related to #593 but takes a different approach at a similar problem.

While implementing the tests I noticed that the tasks created for ToDos are quite different in some places. I separated the fields that were surprising to me in 5ed972c. Since I copied the ToDo extraction from the existing implementation and adapted it only a little, I assume this is not related to something I implemented. I noticed that yielding of merge_request and todo items is currently not covered by the tests at all, so this might be something worth fixing at another point. I also might be completely wrong, though.

Being a maintainer of one or two projects myself I know that larger refactory suggestions from external users can feel intrusive. However, I hope I made my motivation clear and my contribution is more than just a code dump in "works for me" quality.

I am also aware, that there are things missing, such as config documentation for this new feature, hence the draft status.

Edit: Looking back at the todo implementation, it is quite obvious that many of the unexpected task fields are directly hard coded in the code I copied. So, in order to keep things from diverging further I think it would not make sense to change the implementation (at least not my implementation only). It might, however, make sense to call my new function for yielding todos from the issue method in case of keeping the code duplication. (Edit2: Did in 139abdd)

fmauch commented 2 years ago

I've addressed a couple of the issues you mentioned.

Completely getting rid of the split path isn't that easy in my opinion.

We would have to add a flag whether we want to include issues or not. Imagine I specify a query for MRs. Should I download issues in that case, as well? For ToDo's I have a separate flag, for issues I don't.
We would probably need to define queries for all three types of items. Otherwise we would end up with weird configs if we don't split the execution paths. What should I do if I define a query for issues and set include_todos to true? If I define a query for merge_requests, do I have to set filter_merge_requests to True?

If we had a config flag for each type of item (issue, mr, todo), we could do something like the following (semi-pseudo-code):

issues = {}
merge_requests = {}
todos = {}

if include_issues:
  if config.issue_query:
    issues = issues_from_query()
  else:
    issues = issues_from_projects_api()
  issues_filtered = filter(issues)
  yield from get_issue_obj(issues)

if include_merge_requests:
 if config.merge_request_query:
   merge_requests = merge_requests_from_query()
 else:
    merge_requests = merge_requests_from_projects_api()
  merge_requests_filtered = filter(merge_requests)
  yield from get_issue_obj(merge_requests)

if include_todos:
  if config.todo_query:
    todos = todos_from_query()
  else:
    todos = todos_from_projects_api()
  todos_filtered = filter(todos)
  for todo in todos_filtered:
    transformed = transform(todo)
    yield transformed

Without such a refactoring I don't really see a clean way how to implement this. Which is not nice, as I also see the point of leaving users with unclear configuration options as in "which option can I use in combination with query?".

fmauch commented 2 years ago

I've pushed another version implementing a draft on option 2 from your comment

I couple of thoughts on this draft version:

I had to move the update_repo_map call out if the query-string block, as it operates on the filtered list. This could probably be refactored to get it back included, though.
I added a include_issues config option as in my humble opinion it should be possible to have a job that only fetches the MRs I am assigned as a reviewer. For example, I like to add reviews I am assigned to with a higher priority as this is something other devs are waiting for.
With this we have include_issues, include_todos and filter_merge_requests as boolean flags. Obviously, this is confusing. However, deprecating filter_merge_requests in favor of include_merge_requests would probably be out of scope of this PR.
There is a lot of code duplication between issues and merge_requests. However, further refactoring would probably lead to less readable code at this point.
~~ToDos definitely need some more refactoring.~~ (done in 50a5b64)
I think the code structure is now quite understandable and the configuration is also rather logical. Each component can be switched on and off and their default behavior can be overwritten.
~~There is (at least) one unlogical piece of configuration: Configuring a custom query string will ignore the the include_repos and such.~~ (fixed in 8394db1)

fmauch commented 2 years ago

@ryneeverett I think I am happy with the current implementation. If you don't have any larger complaints I will go ahead adding unit tests and updating the documentation next.

fmauch commented 2 years ago

@ryneeverett should I cleanup the commit history before merging this? We've done quite a lot of back and forth in this PR. Of course, one can argue, that leaving it this way it reflects the discussion history in this PR. What's your preferred workflow here?

ryneeverett commented 2 years ago

The way it is now, I would probably squash the entire PR into one commit. I think that gives a more accurate picture of the commit history in the context of the project as a whole because most of these commits only modify code that's already been added or modified by this PR. However, if you wanted to rewrite the history into a few commits that stand on their own (i.e., bugwarrior should still work, the tests should still pass, etc.) I'd go with that. For example, I could see something like:

gitlab: Refactor ServiceClient out of IssueService.
gitlab: Implement repo_cache.
gitlab: Custom query strings.
gitlab: Add include_issues.

fmauch commented 2 years ago

@ryneeverett done rewriting history.

GothenburgBitFactory / bugwarrior

Allow custom query strings for gitlab config #876