Closed: kaori-seasons closed this issue 3 months ago
Thank you for your feedback. We have received your issue; please wait patiently for a reply.
#troubleshooting
More people are welcome to add use cases and discuss them here. @SbloodyS @xtr1993 Could you please join the discussion?
I have implemented git-based code management in my own project. Here is my business flowchart, hope it helps you:
@xtr1993 Thank you very much, I will do some research in the near future.
After some preliminary research, I found that JGit, as a Git client, is fairly heavyweight, and manually building a local repository with Git commands is not very convenient, so I looked for a way to upload and download files through a REST API.
Refer to: Rest Interface for git
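For reference, a minimal sketch of what a REST-based upload could look like. The thread does not say which REST interface is meant, so this assumes GitHub's "create or update file contents" endpoint purely as an example; `build_upload_request` and all argument values are hypothetical, and the request is only built, not sent.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_upload_request(api_base, owner, repo, path, content, message, token, sha=None):
    """Build (but do not send) a PUT request that creates or updates one file.

    Assumes GitHub's contents API: PUT /repos/{owner}/{repo}/contents/{path}.
    `sha` is the blob SHA of the existing file and is only needed for updates.
    """
    body = {
        "message": message,  # commit message
        "content": base64.b64encode(content).decode("ascii"),  # API expects base64
    }
    if sha is not None:
        body["sha"] = sha
    url = f"{api_base}/repos/{owner}/{repo}/contents/{urllib.parse.quote(path)}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
```

Sending it would be a matter of passing the result to `urllib.request.urlopen`. No local clone is needed, which is the main attraction over JGit here, but each call creates one commit per file.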
IMHO, we could separate this issue into two parts:
- Add source control ability for resource center.
- Enable users to review workflow changes.
For the first one, since the resource center is based on HDFS / S3 / ..., we could add a log file in remote storage, invisible to users, to store the operation log, commit hashes, etc., and combine the commit hash with the object name or tag. We would use the S3 / HDFS read/write interface to interact with this log file to ensure consistency. That way, we could enable source control not only for txt / sql / sh files but also for jar / tar files, and avoid bloating the remote git repo.
For the second one, we could add some kind of mapping function to map workflows into python DAGs. Users would get a new version of the python DAG each time they create / update a workflow. Based on that, we could add source control with the git protocol so that users can review workflow changes, which gives them a better production experience.
WDYT @complone @xtr1993 @SbloodyS @zhongjiajie @davidzollo
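A minimal sketch of the operation-log idea for the first part, under stated assumptions: a plain `dict` stands in for the S3 / HDFS backend, and `VersionedStore`, `LOG_KEY` and the `name@commit` key scheme are hypothetical names chosen for illustration, not existing DolphinScheduler code.

```python
import hashlib
import json
import time

LOG_KEY = ".resource_oplog.jsonl"  # hypothetical hidden log object


class VersionedStore:
    """Keeps every version of a resource plus an append-only operation log."""

    def __init__(self, backend):
        self.backend = backend  # dict standing in for S3/HDFS: key -> bytes

    def _entries(self):
        raw = self.backend.get(LOG_KEY, b"")
        return [json.loads(line) for line in raw.splitlines()]

    def put(self, name, data, user):
        # The commit hash is derived from the content, so it works for
        # text and binary (jar / tar) resources alike.
        commit = hashlib.sha1(data).hexdigest()[:12]
        self.backend[f"{name}@{commit}"] = data
        entry = {"ts": time.time(), "op": "put", "name": name,
                 "commit": commit, "user": user}
        # Simplification: read-modify-write of the log is not atomic here.
        old = self.backend.get(LOG_KEY, b"")
        self.backend[LOG_KEY] = old + json.dumps(entry).encode("utf-8") + b"\n"
        return commit

    def history(self, name):
        return [e["commit"] for e in self._entries() if e["name"] == name]

    def get(self, name, commit=None):
        if commit is None:
            commit = self.history(name)[-1]  # latest version
        return self.backend[f"{name}@{commit}"]

    def list_resources(self):
        # Users only see logical names, never the log or versioned keys.
        return sorted({e["name"] for e in self._entries()})
```

Usage would look like `store.put("etl.sql", b"select 1;", "alice")` followed by `store.get("etl.sql")` or `store.get("etl.sql", commit)`. A real implementation would need some form of locking or conditional write on the log object to actually guarantee consistency.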
To clarify, the generated python DAGs mentioned above are only for review purposes; we do not really need to run them. Therefore, there's no need to change the current code logic, and we may just add this as an assistant feature.
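A rough sketch of such a review-only mapping, assuming a simplified workflow structure (a dict with a `name` and a list of shell `tasks`) that is hypothetical and much simpler than a real workflow definition. The generated text uses Airflow-style syntax but is never imported or executed; task names are assumed to be valid Python identifiers.

```python
import difflib


def render_dag(workflow):
    """Render a workflow dict as Airflow-style Python source (text only)."""
    lines = [
        "from airflow import DAG",
        "from airflow.operators.bash import BashOperator",
        "",
        f"with DAG(dag_id={workflow['name']!r}) as dag:",
    ]
    # One operator per task, in workflow order.
    for task in workflow["tasks"]:
        lines.append(
            f"    {task['name']} = BashOperator("
            f"task_id={task['name']!r}, bash_command={task['command']!r})"
        )
    # Dependencies rendered as `upstream >> downstream`.
    for task in workflow["tasks"]:
        for upstream in task.get("depends_on", []):
            lines.append(f"    {upstream} >> {task['name']}")
    return "\n".join(lines) + "\n"


def review_diff(old, new):
    """Unified diff between two workflow versions, for human review."""
    return "".join(difflib.unified_diff(
        render_dag(old).splitlines(keepends=True),
        render_dag(new).splitlines(keepends=True),
        fromfile="workflow@old",
        tofile="workflow@new",
    ))
```

Each saved workflow version could be rendered this way and committed to git, so that reviewers see a plain text diff instead of comparing two graphs.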
@EricGao888 Thank you very much for your reply. On the second point: after discussing with @davidzollo, the stronger demand in the community is for management based on the git protocol. The usual scenario is that a new version is needed every time the user modifies a DAG. I will take some time soon to look at how airflow runs and generates DAGs, in order to come up with a better versioning design.
@complone Hi complone, thx again for your effort. If you could bring such a feature into DS, it would be fantastic. Instead of understanding the running process of airflow, I suggest spending some time on the syntax of airflow DAGs. Actually, we may not need to really run the DAGs generated from workflows. The main purpose is to help users review / give suggestions on the changes, and python DAGs are easier to review than graphs.
@EricGao888 Thank you for the additional information. In the meantime, I will read up on the DAG data structure in airflow so that I can discuss it with you later.
@complone FYI, you may also refer to airflow-code-editor to see how workflow as code could be integrated with git.
Thank you very much for your help, let me take a look first.
@zhongjiajie do you have any ideas?
I have implemented this function and have demo code, so we can discuss it together: https://github.com/xtr1993/datacenter-git-client-demo.git
Thank you very much for the Git operation encapsulation logic you provided. I will try to design logic that is compatible with dolphinscheduler.
As far as I can see, this issue only talks about adding a new git-based resource center implementation.
I didn't see any detailed design for how to manage resource versions. Do we need to store the resource version in our database?
This issue also doesn't go into detail on how to store the workflow in git (resource center). If we don't store the workflow in git, how can we review it? cc @davidzollo @caishunfeng
Yes, as @EricGao888 said in https://github.com/apache/dolphinscheduler/issues/10387#issuecomment-1166904625, I think splitting this into two issues would be better.
Search before asking
Description
The current resource center supports HDFS / S3 / local storage. Because of the way files are uploaded and read, we only need to add a git file storage implementation.
When a user uploads a file to the resource center, the request goes through ResourcesController, and the implementation class HadoopUtils of the StorageOperate interface implements the file operations, together with S3Utils.
Eclipse provides a Java client, org.eclipse.jgit, to support file storage based on git. I will build the storage operation implementation on its API, following the production-environment examples in jgit-cookbook.
Use case
The following are two simple git manipulation examples, which will be further expanded in combination with jgit-cookbook
git create
git pull
Related issues
Nope
Are you willing to submit a PR?
Code of Conduct