elyra-ai / elyra

Elyra extends JupyterLab with an AI centric approach.
https://elyra.readthedocs.io/en/stable/
Apache License 2.0

Elyra project consistency among all repositories #1532

Open lresende opened 3 years ago

lresende commented 3 years ago

When we look at other projects, with JupyterLab being one close to us, we see that they use a consistent set of tools across all repositories and/or subprojects.

Some examples are:

The benefit of being consistent and adopting a standard set of tools and processes across all repositories is that a new contributor will feel much more comfortable moving between different components of the project, as they are already familiar with all of the tools and processes.

On the other hand, people might point out that components may use different languages or toolchains. I believe that in this case we can still wrap the native way to provide the same consistent set of tools across the different components (e.g. a make command wrapping the npm commands).
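To make that concrete, here is a rough sketch of the wrapper idea (the target names and the underlying yarn commands are only illustrative, not an actual Elyra Makefile):

# Illustrative Makefile: the make targets just delegate to the native
# npm/yarn commands, so every repository exposes the same entry points.
.PHONY: build test clean

build:
	yarn install && yarn build

test:
	yarn test

clean:
	yarn clean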

I would like to use this issue to gather feedback from the other community members, make sure we have consensus on this topic, and then start enumerating what needs to be changed.

bourdakos1 commented 3 years ago

I agree that consistency across our Elyra repos is a good thing, but I don't think we can ignore the notion of the native way. It could be argued that, since this project focuses on building AI/ML extensions and Python seems to be the leading language for this field, we should use a Python-focused toolchain as the overarching structure for all of our projects. However, I also believe that community members will lean towards making contributions in the areas where they are most comfortable. For example, my expertise is React, so I normally try to keep my contributions to the React/frontend areas. With that in mind, I personally think it would be easier for a React developer to contribute to the React portions of the code if we follow the React community standards, i.e. the native way. With a custom standard/wrapper we would be guaranteeing that developers have to learn a new system in order to have the appearance of consistency.

This doesn't mean that our processes (issues, PRs, releases, etc.) shouldn't be the same. I think it might be helpful to break down some of the different repos we have to find patterns and establish a standard for each project type. I personally only interface with elyra, pipeline-editor, vscode-pipeline-editor, and pipeline-browser-extension; I'm aware of canvas, but haven't contributed to it.

elyra

pipeline-editor

vscode-pipeline-editor

pipeline-browser-extension

canvas

Project structure

For the most part, I think project structure is fairly consistent (canvas being a wild card)

Script running

Again canvas is a bit of a wild card, so I’ll keep it separate

Canvas:

Everything else:

Looking at JupyterLab, they seem to just use npm scripts, with a scripts directory containing various python and shell scripts.

Proposal

I think we should follow the pattern of having a scripts folder in all projects, and it can contain any type of script that best suits the need.

We should have a single script runner per project (could be make or npm scripts) but no mixing. Either all commands are in the makefile or all the commands are in the package.json

- If the project is 100% JS, it MUST be npm scripts
- If the project is 100% Python — ??? I have no idea, the Python people can decide
- If the project is a hybrid, I'm leaning towards npm scripts, because that is what JupyterLab does

Build

The actual build is very dependent on the project so there’s not much to say here except that a project should be able to be built by just running yarn build or make build or whatever we decide on for script runners.

- React app: webpack
- JupyterLab TypeScript extensions: jlpm
- TypeScript React component package: microbundle/rollup
- TypeScript library package: tsc
- Node server: tsc

Release process

I’m not super aware of the current release process, but it seems like Luciano makes 3 commits directly to master to:

  1. Update the changelog
  2. Bump version and tag release
  3. Bump version again with a -dev appended

I’m personally not a huge fan of pushing the version bumps and changelog directly to master. I think that we should move to opening a release PR that contains the changelog and version bumps (anyone can open this PR). Once approved, either ci handles publishing and tagging the release or we can continue to have someone doing it manually (I’m fine with either). I think the main point here is that the changelog should be reviewed.

Lint tools and configuration

All JS projects should use the same eslint config and the default prettier settings; not sure about Python.

lresende commented 3 years ago

Again canvas is a bit of a wild card

FYI @vlad-bunescu, once we agree on some standards, we will discuss how we could implement this for canvas.

We should have a single script runner per project (could be make or npm scripts) but no mixing. Either all commands are in the makefile or all the commands are in the package.json

I am not sure I understood this properly, but assuming we have a multi-project repo, an npm clean for project A might be different from clean for project B. In this case, make clean delegating to npm clean should be ok. I would actually say that we should adopt make as the script runner, as this would make it easier for non-experts in one area to build any component (e.g. a React developer being able to build/deploy the Elyra main project).
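Something along these lines, where the subproject names and their native clean commands are hypothetical, just to show the delegation:

# Hypothetical top-level Makefile in a multi-project repo: "make clean" is the
# single entry point, and each subproject is cleaned the native way.
.PHONY: clean

clean:
	cd project-a && yarn clean
	cd project-b && python setup.py clean --all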

The actual build is very dependent on the project so there’s not much to say here except that a project should be able to be built by just running yarn build or make build or whatever we decide on for script runners.

Yes, based on the above, something like make build or make install

I’m personally not a huge fan of pushing the version bumps and changelog directly to master. I think that we should move to opening a release PR that contains the changelog and version bumps (anyone can open this PR). Once approved, either ci handles publishing and tagging the release or we can continue to have someone doing it manually (I’m fine with either). I think the main point here is that the changelog should be reviewed.

The changelog is independent of the release; it's pretty much traversing the git history to build a list of changes since the last release. The other steps are pretty much done by the release script by running two commands: release prepare, which does everything locally, and release publish, which pushes the local changes to the proper repos.

All JS projects should use the same eslint config and the default prettier settings; not sure about Python.

Agree, is there an easy way to maintain these types of details in sync across multiple repositories? The other thing, particular to Python, is how to handle different IDEs, as I believe folks that use VSCode might have different versions of linters preconfigured compared to other IDEs like PyCharm.

bourdakos1 commented 3 years ago

I am not sure I understood this properly, but assuming we have a multi-project repo, an npm clean for project A might be different from clean for project B. In this case, make clean delegating to npm clean should be ok. I would actually say that we should adopt make as the script runner, as this would make it easier for non-experts in one area to build any component (e.g. a React developer being able to build/deploy the Elyra main project).

I personally don't think there's much of a learning curve from the script running perspective. Running make build vs yarn build doesn't matter much. What was confusing to me, when I first started contributing to Elyra, was "which do I use make xxx or yarn xxx?". Since there were commands in both places I didn't really know which ones I should be using. So if we do use make I would say do not put any scripts in the package.json.

Given that JupyterLab uses npm scripts as their script runner for their hybrid project, I would vote for something like this:

| Project Type | Script Runner |
| --- | --- |
| Python | make? |
| JS | npm scripts |
| Hybrid | npm scripts |

A pattern I've also seen is creating an xxx-scripts npm package to share scripts across projects. So we could create elyra-scripts and add "release": "elyra-scripts release" to our package.json. However, I don't know if we have enough shared scripts for this to be worth it.

The changelog is independent of the release; it's pretty much traversing the git history to build a list of changes since the last release. The other steps are pretty much done by the release script by running two commands: release prepare, which does everything locally, and release publish, which pushes the local changes to the proper repos.

I'm not sure I understand what you mean by "independent of the release". It seems like most releases look like these three commits:

What I'm saying is the Update changelog for release X.X.X and Release vX.X.X commits should be combined and should be approved before merging and publishing any artifacts. I think this is a fairly common practice? It gives us a chance to make suggestions for the highlights of the release like adding a screenshot, fixing typo etc... This is also where people could do some manual testing to make sure everything is working as expected.

When we are ready for a release, someone will run the "prepare" scripts, which will bump versions and generate a changelog "template". The template is just a git log with a space for highlights. Here is an example template from running yarn release:changelog from https://github.com/elyra-ai/pipeline-editor/pull/99:

## 0.1.0 (Apr 7, 2021)

High level enhancements
- TODO HIGHLIGHTS

Other enhancements and bug fixes
- Use css-in-js for styling ([#67](https://github.com/elyra-ai/pipeline-editor/pull/67))
- Update to canvas version 10.3.0 ([#87](https://github.com/elyra-ai/pipeline-editor/pull/87))
- Use a real "back" button in the SuperNode UI ([#77](https://github.com/elyra-ai/pipeline-editor/pull/77))
- Enhance GitHub action performance ([#68](https://github.com/elyra-ai/pipeline-editor/pull/68))
- Enable customization with a new onDoubleClickNode handler ([#76](https://github.com/elyra-ai/pipeline-editor/pull/76))
- Update Pipeline node dynamic label logic ([#75](https://github.com/elyra-ai/pipeline-editor/pull/75))
- Minor Node UI improvements ([#73](https://github.com/elyra-ai/pipeline-editor/pull/73))

This would be filled out and manually prepended to the changelog. They would then open a PR; it would be reviewed, and as soon as it's merged it would be built/published/tagged.

Agree, is there an easy way to maintain these types of details in sync across multiple repositories?

Yes, for eslint I can create an eslint config npm package for us if you want

lresende commented 3 years ago

I am not sure I understood this properly, but assuming we have a multi-project repo, an npm clean for project A might be different from clean for project B. In this case, make clean delegating to npm clean should be ok. I would actually say that we should adopt make as the script runner, as this would make it easier for non-experts in one area to build any component (e.g. a React developer being able to build/deploy the Elyra main project).

I personally don't think there's much of a learning curve from the script running perspective. Running make build vs yarn build doesn't matter much. What was confusing to me, when I first started contributing to Elyra, was "which do I use make xxx or yarn xxx?". Since there were commands in both places I didn't really know which ones I should be using. So if we do use make I would say do not put any scripts in the package.json.

That's the thing: the script runner standardizes the commands across all repositories, independent of what technology background you have. So, whether I have a Python background or a TypeScript background, on all repositories something like make install will build/install. Now, if you know the ins and outs, you can always use the internal commands that make is calling, like python setup.py bdist_wheel sdist or yarn install.
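As a sketch only (the real Makefiles will differ), the same make install entry point can sit on top of either toolchain:

# Sketch for a Python-based repository: "make install" wraps the commands a
# Python developer would otherwise run by hand (the pip step is just an
# assumption about what "install" would include).
.PHONY: install

install:
	python setup.py bdist_wheel sdist
	pip install --upgrade dist/*.whl

# In a JS-based repository the same target would simply wrap the native tool:
# install:
#	yarn install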

lresende commented 3 years ago

What I'm saying is the Update changelog for release X.X.X and Release vX.X.X commits should be combined and should be approved before merging and publishing any artifacts. I think this is a fairly common practice? It gives us a chance to make suggestions for the highlights of the release like adding a screenshot, fixing typo etc... This is also where people could do some manual testing to make sure everything is working as expected.

bourdakos1 commented 3 years ago

I would agree with the changelog being open for review, as long as it doesn't block the release.

👍

As for "script generated" changes, I really don't see a big difference in reviewing the changes, as it's always going to be following the same pattern and the script has been reviewed.

I agree, I'm fine with 100% script generated changes going straight to master. However, I think the only fully script generated changes we are talking about here would be the version bumps, which I think should be a part of the "release PR".

I would say I'm flexible here, but I can't really think of a way where we could have a changelog PR without having the version bumps part of it.

As for testing, essentially the same code is sitting in master waiting to be tested, with a list of suggested tests or areas to test; do you really think that making a PR will make any difference?

Right now, we can internally announce, "hey, master is ready for a release, go test it out", but I think if we have a release PR, it gives community members a chance to be a part of the process.

bourdakos1 commented 3 years ago

That's the thing: the script runner standardizes the commands across all repositories, independent of what technology background you have. So, whether I have a Python background or a TypeScript background, on all repositories something like make install will build/install. Now, if you know the ins and outs, you can always use the internal commands that make is calling, like python setup.py bdist_wheel sdist or yarn install.

I really don't think having two script runners is an issue as long as the usage is consistent across project types. I don't think someone with a Python background will have trouble running yarn install vs make install

kevin-bates commented 3 years ago

I like the idea of a "release PR". FWIW, there's growing traction with github-activity which produces super nice changelogs that can then be adjusted (although I just read Nick's comment above showing the yarn command). Also, there's work being done on tooling that produces a "release PR" via GitHub actions and, once merged, the actions then trigger PyPI uploads, etc. So I think we might want to adopt that tooling once available. The gist of it though is that it works through PRs that are then reviewed, iterated, and merged (which triggers the next step).

I also wouldn't say make is used natively by Python devs. Within Jupyter, it's actually rare. However, I think what's more important is to have clear (and concise) documented commands that contributors can follow. When new to a repository, I hope to find some information (or link) on the README that points me to setup and build instructions that work.

On a personal level, I like make. I often use make -n target to see what a given target is going to do before it runs, and I like how parameters can be fed on the command line to adjust default behavior. But it's really more of a comfort thing. I actually find python setup.py yada yada fairly gross and have created a simple alias to that command that essentially works in the various other jupyter projects.

bourdakos1 commented 3 years ago

However, I think what's more important is to have clear (and concise) documented commands that contributors can follow. When new to a repository, I hope to find some information (or link) on the README that points me to setup and build instructions that work.

❤️ yes, agree 100%

I'm okay if we decide to use make for all projects. But I really do think make feels out of place in a 100% javascript project. Either way (make for everything, or consistency dependent on project type), as long as things are well documented I don't think we will have any real issues.

bourdakos1 commented 3 years ago

@lresende what if we use a makefile and then in the npm scripts only allow using the make commands?

like:

"scripts": {
  "build": "make build",
  "test": "make test"
}
lresende commented 3 years ago

Not sure I understand... it looks more like an infinite loop...

bourdakos1 commented 3 years ago

The npm scripts would just be aliases for the make targets. The Makefile wouldn't use any of these npm scripts.

bourdakos1 commented 3 years ago

The exception would be that the packages/*/src/package.json files would still have npm scripts, so that we can run yarn lerna run build from the Makefile.
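Roughly, the split would look like this (the per-package build command is just an example):

# Root Makefile: the only script runner at the top level; it drives the
# monorepo build through lerna.
.PHONY: build

build:
	yarn lerna run build

# Each package's own package.json still declares a "build" script
# (e.g. "build": "tsc"), which is what "lerna run build" invokes.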