Tauffer-Consulting / domino

User friendly and open source platform for workflow creation and monitoring
https://domino-workflows.io/
Apache License 2.0

The workflow did not pull the remote image to create a docker container when it was running. #271

Closed: lanzhixi closed this issue 4 months ago

lanzhixi commented 5 months ago

I successfully imported the piece I made into the platform and created a workflow, but an error occurred at runtime and the REST log did not report any error message. After troubleshooting, I found that no Docker container was created for my piece, even though the image was published successfully. To rule out my own code, I copied ml_domino_pieces and changed only REGISTRY_NAME in config.toml to my own account name (just so that the image could be published); nothing else was modified. After importing it into the platform, the result is the same: no errors are reported anywhere, but no Docker container is created when the workflow runs and the remote image is never pulled. What could be the reason? Are there any other steps required for an imported piece to work properly? Thanks for your answer!

vinicvaz commented 5 months ago

Hey @lanzhixi, sorry you are facing issues in creating pieces, I'll try to help you to debug what is happening.

  1. In what environment are you running Domino? Are you running it locally using docker compose?
  2. When you run the workflow, what happens to the piece and to the workflow? Do they go to the failed state, or do they get stuck in running or some other state? Can you send me a screenshot of your workflow after running it?
  3. Your ml_domino_pieces is just a copy of ours, right? If so, can you make it public for a while so I can try to reproduce it?

lanzhixi commented 5 months ago

Hey @vinicvaz, thank you for the reminder; I will ask my questions in a more structured way next time. In my latest attempt, the piece I made ran successfully. The cause was that I am used to naming things with uppercase letters, but to make the image build succeed I had filled in all-lowercase names in config.toml. As a result, the platform probably could not find the corresponding image from the repository name at all, which is why it looked as if the remote image was not being pulled.
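For anyone hitting the same symptom, a mismatch of this kind can be checked before publishing. Container image repository names must be lowercase, so a repository named with capitals cannot be used verbatim in an image reference. The snippet below is only an illustrative sketch: the function names and the example values are hypothetical, and it is not part of Domino.

```python
def differs_only_by_case(expected: str, configured: str) -> bool:
    """True when the two names match ignoring case but are not identical."""
    return expected != configured and expected.lower() == configured.lower()


def is_all_lowercase(name: str) -> bool:
    """Container image repository names may not contain uppercase letters."""
    return name == name.lower()


# Hypothetical values: a repository named with capitals vs. the lowercase
# name that was entered in config.toml to make the image build succeed.
repository_name = "MyAccount/My_Domino_Pieces"
configured_name = "myaccount/my_domino_pieces"

print(is_all_lowercase(repository_name))                       # False
print(differs_only_by_case(repository_name, configured_name))  # True
```

If the second check prints True, the two names refer to different strings as far as an exact lookup is concerned, which matches the behaviour described above.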

luiztauffer commented 5 months ago

thanks for spotting the issue @lanzhixi ! Can you confirm you're now able to run your Pieces?

lanzhixi commented 5 months ago

@luiztauffer, @vinicvaz Thank you for your continued attention. I am sure that, as long as I avoid the mistakes I made before, I can create a piece and import it into Domino to run. This is so cool! In my recent attempts, however, I have run into some new problems (the image is pulled normally, but the piece fails while running), and I would like to ask for your help. [image]

After successfully using the template, my confidence increased and I started to create my own lightGBMPiece. However, the lightGBMPiece I wrote had problems running. After adding tests, I found that the problem was not in the code. Could you help me take a look? (I think I may have missed something, but I have checked it many times and it is consistent with the format of the example.) [image]

thanks!

vinicvaz commented 5 months ago

Hey @lanzhixi, thanks for reporting this. Yes, you are right, the tests will not run for your piece, and there are a few reasons why:

  1. I think an instruction is missing from your LightGBM Dockerfile: you should include RUN apt-get install libgomp1. I tried to run it locally after building and got this error:

    [ImportError: libgomp.so.1: cannot open shared object file: No such file or directory](https://stackoverflow.com/questions/43764624/importerror-libgomp-so-1-cannot-open-shared-object-file-no-such-file-or-direc)

    This command should fix it.

  2. I am not sure, but I think you also have to fix a few things in your code, such as the missing evaluation data, which I think is required for early stopping.

  3. The third thing is our fault. The way Domino runs tests in the GitHub environment is still a bit tricky, so I'll try to explain what happens and why. First, imagine a repository with multiple pieces and a lot of dependency conflicts between them. In that scenario, installing all the pieces' dependencies in the GitHub Actions environment would be impossible, so instead we build the Docker images in the GitHub Actions environment and run each image independently, with each container listening for requests from the tests environment. Basically, we separate the tests environment from the piece code environment: the tests run in the GitHub Actions root environment, and the piece code runs in a Docker container inside that root environment. This is done in 3 steps:

    • First, we build all the images and save a map from each piece name to its corresponding image. Example: LightGBMTrainPiece: ghcr.io/lanzhixi/piece_test:0.1.5-group0
    • Based on the piece name defined in the test, we run your built Docker image, starting a really tiny HTTP server in your piece container. This HTTP server listens for the request from the piece_dry_run function, passes it to your piece_function, and returns the results to the test function.
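The request/response pattern in the second step can be pictured with a small, self-contained sketch. This is not Domino's actual implementation: `piece_function` here is a stand-in, and `DryRunHandler`, `start_server`, `dry_run` and `run_example` are hypothetical names chosen for illustration. In the real setup the server would run inside the piece's container; here both sides run in one process to keep the example runnable.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def piece_function(input_data: dict) -> dict:
    """Stand-in for the piece's logic (hypothetical)."""
    return {"total": sum(input_data["values"])}


class DryRunHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON payload sent by the test side.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Run the piece logic and send the result back as JSON.
        body = json.dumps(piece_function(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def start_server() -> HTTPServer:
    # Port 0 lets the operating system pick a free port.
    server = HTTPServer(("127.0.0.1", 0), DryRunHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def dry_run(port: int, input_data: dict) -> dict:
    """Test-side helper: send the input over HTTP and return the result."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}",
        data=json.dumps(input_data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler bypasses any system proxy for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())


def run_example() -> dict:
    server = start_server()
    try:
        return dry_run(server.server_address[1], {"values": [1, 2, 3]})
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_example())  # {'total': 6}
```
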

This is the way we've found to run the pieces in their isolated environment, the problems with that are:

Additional information / Alternative Solution

    @skip_envs('github')
    def test_my_piece():
        ...


This might be useful for tests that you can't run yet in github actions but want to run locally, like [here](https://github.com/Tauffer-Consulting/openai_domino_pieces/blob/main/pieces/AudioTranscriptionLocalPiece/test_localtranscription_piece.py).
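To make the idea concrete, here is a rough sketch of how an environment-based skip decorator could work. It is not the Domino implementation: `skip_in_envs` and `current_env` are hypothetical names, and detecting GitHub through the `GITHUB_ACTIONS` variable (which GitHub Actions sets to `true`) is an assumption about how the environment might be identified.

```python
import functools
import os
import unittest


def current_env() -> str:
    """Return 'github' when running under GitHub Actions, else 'local'."""
    # Assumption: GitHub Actions is detected via its GITHUB_ACTIONS variable.
    return "github" if os.environ.get("GITHUB_ACTIONS") == "true" else "local"


def skip_in_envs(*envs: str):
    """Skip the decorated test when the current environment is in envs."""

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            env = current_env()
            if env in envs:
                # Test runners such as unittest and pytest report this as a skip.
                raise unittest.SkipTest(f"skipped in environment: {env}")
            return test_func(*args, **kwargs)

        return wrapper

    return decorator


@skip_in_envs("github")
def test_my_piece():
    return "ran"
```

With this sketch, the test body runs on a local machine and is reported as skipped inside a GitHub Actions job.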

I know this is a lot of information to take in, so if you have any questions, feel free to post them here.

lanzhixi commented 4 months ago

Hey @vinicvaz, thank you very much for your detailed answer; it's very useful to me! I successfully ran my piece after modifying the Dockerfile. [Screenshot 2024-04-10 172303] But I had to add the skip_envs decorator to the test file to skip the test; otherwise the test would still report an error. [image] That's all for now. I will continue to improve the LightGBM-related pieces in the future. If necessary, I will be happy to submit them to the ml_domino_pieces repository. Thanks!

luiztauffer commented 4 months ago

awesome job @lanzhixi! let us know how your ML pieces evolve, we would be happy to integrate them into existing repos or to just list them in the open source gallery!

I suggest you also take a look at how to display results from your piece in the GUI, which can be very useful for ML pieces such as LightGBM: https://docs.domino-workflows.io/pieces/create_pieces#piecepy

Example of a piece producing a Plotly image: https://github.com/Tauffer-Consulting/ml_domino_pieces/blob/main/pieces/PCAInferencePiece/piece.py