apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0

Discussion for Config UI Improvements #1700

Closed yumengwang03 closed 2 years ago

yumengwang03 commented 2 years ago

Description

Yumeng thinks the current Config UI has a series of problems/potential improvements, so she'd like to share some thoughts here for discussion:

Objectives of improvements

  1. Create a more reasonable and precise task flow that makes sense for users
  2. Reduce users' effort to complete such tasks and make the flow more fluid
  3. Adopt a more sustainable design approach (re-using EE's design system)

How do we improve?

1. Reduce user flow friction. Several text input fields (e.g. the GitHub endpoint URL) are redundant and should be eliminated for users. Other text input fields require users to memorize or look up Regular Expressions, Cron expressions, Board IDs, etc.; it would be friendlier to turn them into dropdown selectors.

2. Correct the imbalance between the regular UI mode and the Advanced JSON mode when creating pipelines. The functions of the regular UI mode don't match those of the Advanced mode, in which the user types JSON for configuration. For instance, the regular UI mode is missing some configuration fields (e.g. Feishu).

3. Show the most relevant details in more obvious places (e.g. the progression/status of pipeline runs). A lot of status/progression-related information is buried too deep. For instance, in the Blueprint list, even unfolding a Blueprint does not reveal the tasks of that Blueprint.

4. Reconsider the order of configuration tasks for each data provider. The order of configuration tasks should depend on the data provider. For example, for GitHub we should let users select repos before filling in the PR and Issue Type options. We can draw a flow chart for each.

5. Disentangle convoluted concepts. The concepts of Pipelines and Blueprints are presented in a convoluted way by the UI. We should eliminate the "All Pipeline Runs" page and unify "creating a Pipeline with a Blueprint" and "creating a Blueprint with a previously defined Pipeline task" into one place.

e2corporation commented 2 years ago

@Startrekzky @yumengwang03 Thanks for the UI proposal/feedback. I feel that many months of development history haven't been taken into consideration; also, our interface is the result of the selective business requirements we chose to meet each sprint, so there are features and interactions that still need completion. I've noted some responses below to add further context.

  1. Reduce user flow friction. Several text input fields (e.g. the GitHub endpoint URL) are redundant and should be eliminated for users. Other text input fields require users to memorize or look up Regular Expressions, Cron expressions, Board IDs, etc.; it would be friendlier to turn them into dropdown selectors.

The endpoint URL may be standard for some providers like GitHub, but for JIRA it is custom, depending on how the instance is deployed. For GitHub we can simply prefill the currently known REST endpoint as a default value, which is not possible for JIRA. I agree there are many visual improvements, as well as several ways to design a GUI for Crontab configuration, that could make visual scheduling easier for less advanced users. However, for advanced users the current interface is still very practical.
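
As one hypothetical illustration of such a Crontab GUI, a preset dropdown could map friendly labels to cron expressions while keeping a free-form field for advanced users. The structure below is made up for this sketch and is not an existing config-ui data model:

```jsonc
// Hypothetical frequency presets backing a Crontab dropdown (not an existing config-ui structure).
[
  { "label": "Hourly",  "cron": "0 * * * *" },  // minute 0 of every hour
  { "label": "Daily",   "cron": "0 0 * * *" },  // every day at midnight
  { "label": "Weekly",  "cron": "0 0 * * 1" },  // every Monday at midnight
  { "label": "Monthly", "cron": "0 0 1 * *" },  // first day of each month at midnight
  { "label": "Custom",  "cron": null }          // falls back to the existing free-form crontab input
]
```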

  2. Correct the imbalance between the regular UI mode and the Advanced JSON mode when creating pipelines. The functions of the regular UI mode don't match those of the Advanced mode, in which the user types JSON for configuration. For instance, the regular UI mode is missing some configuration fields (e.g. Feishu).

There are differences between Advanced and Standard (Visual) Mode intentionally; for instance, multi-stage was left out of Standard Mode due to an engineering decision -- there were plans to add stage support to the main interface. The pipeline provider options were mainly driven by our Data Integrations, and some additional plugins were created that were meant to be "Pipeline-only" plugins (GitExtractor, RefDiff, etc.). Advanced Mode was created to allow expert features we didn't yet want to incorporate into the visual interface. That being said, Feishu and other plugins can be added to the main interface as needed.
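
For context, the JSON that Advanced Mode accepts is a pipeline plan: an array of stages, where each stage is an array of plugin tasks. A minimal two-stage sketch is shown below; the option values are placeholders, and the exact keys (e.g. the repoId format) depend on the DevLake version:

```jsonc
// Illustrative Advanced Mode pipeline plan: stage 1 collects issues/PRs from the GitHub API,
// stage 2 runs GitExtractor on the cloned repo. Option values are placeholders.
[
  [
    { "plugin": "github", "options": { "owner": "apache", "repo": "incubator-devlake" } }
  ],
  [
    {
      "plugin": "gitextractor",
      "options": {
        "url": "https://github.com/apache/incubator-devlake.git",
        "repoId": "github:GithubRepo:1"
      }
    }
  ]
]
```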

  3. Show the most relevant details in more obvious places (e.g. the progression/status of pipeline runs). A lot of status/progression-related information is buried too deep. For instance, in the Blueprint list, even unfolding a Blueprint does not reveal the tasks of that Blueprint.

Tasks is a sub-component that needs to be added to the Blueprint details view once expanded; however, due to the limited feature set we released for the first version of Blueprints, it has not been added yet.

  4. Reconsider the order of configuration tasks for each data provider. The order of configuration tasks should depend on the data provider. For example, for GitHub we should let users select repos before filling in the PR and Issue Type options. We can draw a flow chart for each.

Field/Input order can be customized as needed for each Data Provider. Mockups should be made for the preferred presentation of the Provider parameters.

  5. Disentangle convoluted concepts. The concepts of Pipelines and Blueprints are presented in a convoluted way by the UI. We should eliminate the "All Pipeline Runs" page and unify "creating a Pipeline with a Blueprint" and "creating a Blueprint with a previously defined Pipeline task" into one place.

I don't agree that they are convoluted: Blueprints reflect a recurring plan/configuration, whereas Pipelines represent the historical runs/executions of that Blueprint. Users should still be able to execute a pipeline without the overhead of creating a recurring data collection plan. Fully merging the two concepts, as proposed, would force a user to create a Blueprint before being able to run a pipeline, meaning the user could no longer run an on-demand pipeline. We are already showing related pipelines with their Blueprints; once tasks are displayed with the Blueprint and the interface changes are made, the current approach will make sense.

klesh commented 2 years ago

Offer available subtasks for the user to select what to run.

e2corporation commented 2 years ago

Offer available subtasks for the user to select what to run.

@klesh Yes, this can be done; it was requested in ticket 924 (https://github.com/merico-dev/lake/issues/924) but was never prioritized during sprint planning. The backend would also need to provide a new set of API endpoints, e.g. [GET] /plugins/task-options/github, that return the list of available tasks for each plugin/provider and indicate which tasks, if any, should be enabled by default.
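
Since the endpoint is only proposed here, its response shape would be up for design. A hypothetical payload, with made-up subtask names, might look like this:

```jsonc
// Hypothetical response body for [GET] /plugins/task-options/github.
// The endpoint is only proposed above, so the shape and subtask names are assumptions,
// not an existing DevLake API.
{
  "plugin": "github",
  "tasks": [
    { "name": "collectApiIssues",        "label": "Collect Issues",        "enabledByDefault": true },
    { "name": "collectApiPullRequests",  "label": "Collect Pull Requests", "enabledByDefault": true },
    { "name": "enrichPullRequestIssues", "label": "Map PRs to Issues",     "enabledByDefault": false }
  ]
}
```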

warren830 commented 2 years ago

Can we show a button for loading the config in Advanced Mode instead of clicking the date?

e2corporation commented 2 years ago

Can we show a button for loading the config in Advanced Mode instead of clicking the date?

@warren830 We already have 2 access points for loading configurations: the Pipeline Name Menu and the Settings Gear on the Tasks Editor Panel.

[Two screenshots attached showing these access points]
klesh commented 2 years ago

to create issues for this epic

Startrekzky commented 2 years ago

@yumengwang03 @e2corporation @klesh I got some feedback from end users and from our own experience of setting up demo instances. This feedback is not just for config-ui, but also for the configuration feature itself.

  1. GitHub/GitLab users have to follow a long doc to collect full data from GitHub/GitLab. They have to create a pipeline with both the GitHub and GitExtractor plugins, which is troublesome.

    • Impact: This affects ALL GitHub/GitLab users, such as Clickhouse, Coder, Wechaty, PingCAP, etc.
    • Solution: a good UX might be: once a user has selected GitHub as the data source and chosen the repos to be collected, 2 pipelines are automatically created, one collecting commits/refs via GitExtractor and one collecting issues/PRs from the GitHub API. GitExtractor doesn't have to be visible to end users on the pipeline page. Users care more about the data sources and scope than about the collection methods.
    • Priority: HIGH
  2. GitHub users have to configure complicated RegEx to convert labels. This has two problems: a) General users who only consume the GitHub Basic Dashboard don't need to configure this at all, so they shouldn't have to see the configuration on the connection page. b) For advanced users who consume the 'release-based dashboard' or other dashboards that rely on label conversion and PR-issue mapping, it's hard to configure the right RegEx. Unlike RegEx learning sites or Grafana variables, which show you the results immediately, config-UI cannot tell whether the user's RegEx is correct or how the labels will be converted. Users are very likely to enter a wrong RegEx, then have to figure out the right one and re-convert the data (see the sketch after this list for the kind of rules involved).

    • Impact: This affects ALL GitHub users.
    • Solution: a good UX might be that, after users have connected to GitHub, they don't have to configure label conversion rules immediately. Instead, the transformation rules would be set when creating a pipeline. Note: this solution also impacts the order of 'pipeline creation' and 'transformation rule' in the Jira plugin. Overall, I prefer the 'transformation rule setting' to be part of 'pipeline creation'.
    • Priority: HIGH
  3. Users may collect useless data, which will affect data collection speed and the metrics in pre-built dashboards.

    • Impact: This problem exists for users who use more than 1 tool collecting the same entities. For instance, many users use GitLab and Jira, such as Fairmarkit and Merico. In this case, once there are issues in GitLab, they'll be stored in the issues table in the domain layer. If the user only wants Jira data, they have to manually delete GitLab issues and all related data in other tables.
    • Solution: allow users to select specific entities when creating a pipeline.
    • Priority: MEDIUM
  4. Configuration rules for the GitHub plugin apply to all collected repos; GitHub maintainers cannot configure each specific repo.

    • Impact: This is a problem for GitHub maintainers who have more than 1 repo and whose label format differs between repos, such as Wechaty. It has already caused us a big problem when setting up the demo for Wechaty, as we had to update the label configuration every time before converting data from a new Wechaty repo.
    • Solution: we might have to introduce a local setting feature to the Github plugin.
    • Priority: LOW. Compared to the previous problems, this one is not very common. Many maintainers with more than 1 repo, such as PingCAP, use the same label format across different repos; in that case, one global setting applying to all repos works.
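
To make the RegEx hurdle in point 2 above concrete, the label-conversion configuration for the GitHub plugin is a set of RegEx fields roughly like the sketch below; the field names are recalled from memory and the patterns are illustrative, so they may not match the current plugin options exactly:

```jsonc
// Rough sketch of the RegEx-based label-conversion rules discussed in point 2.
// Field names are approximate and the patterns are illustrative; users currently get no
// preview of which labels each pattern will actually match.
{
  "issueTypeBug": "^(bug|failure|error)$",
  "issueTypeIncident": "^(incident|outage)$",
  "issueTypeRequirement": "^(feat|feature|proposal|requirement)$",
  "prType": "^type/(.*)$"
}
```
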
yumengwang03 commented 2 years ago

@hezyin @Startrekzky @klesh and I had a discussion on the next-step improvements for Config UI, summarized below. The solutions and priorities are open for discussion.

1. When creating a Blueprint, design a more user-friendly orchestration of data source configuration
  • Description: Users care about what data are collected, rather than how data are collected. The purposes of GitExtractor and RefDiff are not familiar to users.
  • Solution: We can hide GitExtractor from the UI; if a user has selected GitHub or GitLab as a data source, we automatically enable GitExtractor to collect data. I'm not sure about RefDiff; it seems like an additional function that can be enabled under GitHub/GitLab.
  • Priority: High (because the current orchestration causes the most user confusion)

2. When creating a Blueprint, allow users to select data scope and data entities
  • Description: We want to adapt the task flow more closely to users' intentions by allowing them to select which data entities they'd like to collect for a particular data source, while introducing our domain-layer concept to them without causing too much learning cost.
  • Solution (two options for discussion):
    • Choose the domain first and then choose the corresponding data sources under that domain (@Startrekzky)
    • Choose the data sources first and then check the data entities in the corresponding domains (@yumengwang03)
  • Priority: Medium

3. Move transformation rules (the RegEx section) from Data Integration to Creating Blueprints, and replace RegEx with a UI that gives feedback to users immediately
  • Description (three reasons):
    • Transformation rules are more closely related to data scope, so we should let users select the data scope and then set the rules.
    • It's almost impossible for users to memorize RegEx syntax. They should be able to check whether their inputs are correct with our UI.
    • We don't want to waste users' patience at data connection before jumping into the major task of creating a Blueprint.
  • Solution: Move the transformation rules into Creating Blueprints, putting them after selecting the data scope.
  • Priority: High (because this is a major hurdle for advanced users)

4. Redesign a more reasonable workflow of creating a Blueprint
  • Description: We want users to create a Blueprint, and then have all runs automatically generated as records under that Blueprint. Thus, a Blueprint should: 1. contain the data scope, data entities and transformation rules of all applied data sources; 2. have a running frequency (manual or recurring).
  • Solution:
    • Remove the All Pipeline Runs page; only use the Blueprints list as the central place to view Blueprints.
    • Add the task information and a list of all previous runs to an individual Blueprint.
    • Add a section (that can be unfolded) to show the detailed progress of the currently running task. https://github.com/merico-dev/lake/issues/1423
  • Priority: High (this affects the entire perception of what a Blueprint is and all tasks associated with it)

klesh commented 2 years ago

@yumengwang03 For issue #1, refdiff provides some difference-calculation algorithms between two refs (tags/branches), like:

  1. Given 2 refs, one older and one newer, calculate how many commits were committed between the older and the newer, so the user can see how many commits lie between 2 releases.
  2. Same as above, but for issues, i.e. calculate how many issues were closed between 2 releases.
  3. Same as above, but for cherry-picks...

Each refdiff subtask takes 2 inputs, old_ref and new_ref; it then calculates the differences and stores the results into refs_issues_diffs, ref_commits_diffs and refs_pr_cherrypicks accordingly.
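
As a concrete illustration, a refdiff task in a pipeline plan would carry ref pairs in its options, roughly like the sketch below (key names follow the old_ref/new_ref inputs described above but are approximate, and the repoId format depends on the DevLake version):

```jsonc
// Illustrative refdiff task options for a pipeline plan; key names and repoId format are approximate.
{
  "plugin": "refdiff",
  "options": {
    "repoId": "github:GithubRepo:1",
    "pairs": [
      { "oldRef": "refs/tags/v0.11.0", "newRef": "refs/tags/v0.12.0" }
    ]
  }
}
```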

klesh commented 2 years ago

@yumengwang03 @Startrekzky For issue #2, I would like there to be a point somewhere in the process where we can pick the specific subtasks we wish to run.

klesh commented 2 years ago

will be addressed by #1862