[Feature] 1.8.0-requirements

Search before asking

[X] I have searched the issues and found no similar feature requirement.

Problem Description

Since open-sourcing Apache Linkis, we have actively participated in building and maintaining the project, and we periodically optimize Linkis and develop new features. To let more people take part in open-source development, avoid duplicated work, and better share our results with the community, we plan to open-source the new requirements and design documents for each internal WeBank release. Anyone interested is welcome to join in developing these requirements and in optimizing and improving the Apache Linkis open-source community.

Description

September 2024, version 1.8.0 design documents: https://docs.qq.com/desktop/mydoc/folder/czTzKxYPVQbE

New requirements for version 1.8.0 (September 2024):

Linkis: custom date variables (HH:mm:ss) are processed based on task submission time
  Background: When large batches of business tasks are processed, task requests may queue in upstream components, so a task reaches Linkis later than intended and the actual processing time differs from the time the business expects. Linkis's time handling should be kept consistent with business expectations.
  Requirement: Change Linkis's time-handling logic so that the baseline is the time the task arrives at Linkis rather than run_date.
Code search optimization
  Background: The code search module is built on ES and queries through word segmentation, so results are incomplete; in some scenarios users cannot accurately search their historically executed code.
  Requirement:
    a. Optimize code search to fix incomplete results for historical code.
    b. Have hadoop start a Trino engine that queries the Hive database with LIKE fuzzy matching, so that users' executed code is matched more precisely.
Force users to refresh the browser after a Linkis version update
  Background: Most users access the Linkis management console embedded in the DSS platform. Whenever the console is upgraded, the DSS side cannot fetch the new Linkis code because of browser caching. The console should proactively update the browser cache so that the latest Linkis code is loaded.
  Requirement: Detect via file hashes whether the server-side code has been updated; when an update is detected, show a pop-up reminding the user to refresh the browser's cached code.
Add local Python package management so that users can define the Python modules they need
  Background: Commonly used Python modules are preinstalled on the server machines, but less common and user-defined modules are inconvenient to install on the hosts, so users cannot use them. Users should be able to manage the Python modules they need themselves.
  Requirement:
    a. Add Python module management to the Linkis management console so that users can manage the modules they need; this covers the python engine and the spark engine (python spark scripts).
    b. When a user submits a python or python spark script task, load the user's uploaded Python modules on demand.
Engine reuse optimization: do not reuse an engine when the Python version changes
  Background: To improve task execution efficiency, Linkis provides engine reuse. In some scenarios, such as python tasks and python spark tasks, reuse does not take the Python version into account, so a task may fail when the user submits code that is incompatible with the engine's Python version.
  Requirement: Make engine reuse consider the Python version for the python engine and the spark engine (when running python spark tasks). If the Python version the engine was started with differs from the version the user's code requires, do not reuse the engine.
Engine manager displays the sessionId information of fixedEngineConn
  Background: For tasks that need to reuse an engine's context, users can pin the engine that runs their tasks by giving it a fixedEngineConn label. Once the label is set, the engine manager offers no way to tell which engine carries which label. The engine manager display should mark fixedEngineConn engines and show the label content.
  Requirement:
    a. Optimize the engine-list API to return the labels carried by each engine.
    b. In the front end, use the label key to mark engines that have the fixedEngineConn label and display the label content.
StarRocks and Nebula engines support a configurable default database
  Background: Linkis integrates the StarRocks (jdbc engine) and Nebula (nebula engine) databases, but when submitting tasks users must specify the Catalogs or graph space with a 'use xxx' statement, which is inconvenient.
  Requirement:
    a. Optimize the jdbc engine connector and the StarRocks data source so that default Catalogs can be configured in the management console and are used when users submit code.
    b. Optimize the nebula engine connector so that a default graph database can be configured under parameter management in the management console and is used when users submit code.
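
To make the engine reuse requirement above more concrete, here is a minimal Python sketch of a version-aware reuse check. It is only an illustration under stated assumptions, not Linkis code: the `Engine` and `Task` types, their fields, and the `can_reuse` / `pick_engine` helpers are all hypothetical, and the sketch assumes a task that declares no Python version (for example a non-Python task on a spark engine) may reuse any idle engine of the same type.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical names for illustration only; none of these are Linkis classes.
PYTHON_AWARE_ENGINE_TYPES = {"python", "spark"}


@dataclass(frozen=True)
class Engine:
    engine_type: str
    python_version: Optional[str]  # version the engine was started with
    idle: bool = True


@dataclass(frozen=True)
class Task:
    engine_type: str
    python_version: Optional[str]  # version the user's code requires, if any


def can_reuse(engine: Engine, task: Task) -> bool:
    """Return True if the engine may be reused for the task."""
    if not engine.idle or engine.engine_type != task.engine_type:
        return False
    # Assumption: only tasks that declare a Python version are constrained.
    if task.engine_type in PYTHON_AWARE_ENGINE_TYPES and task.python_version is not None:
        return engine.python_version == task.python_version
    return True


def pick_engine(engines: Iterable[Engine], task: Task) -> Optional[Engine]:
    """Return the first reusable engine, or None if a new one must be started."""
    return next((e for e in engines if can_reuse(e, task)), None)
```

With this check, a python spark task that requires a different Python version than every idle spark engine gets `None` back, which a caller would treat as "start a new engine" instead of reusing an incompatible one.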

Use case

No response

Solutions

No response

Anything else

No response

Are you willing to submit a PR?

[ ] Yes I am willing to submit a PR!

apache / linkis