Samcoodess / reana-dms

Implementing REANA workflow for galaxy rotation-curve fitting analysis (RCFM) | Dark matter searches
MIT License

Create reana.yml file #6

Open Samcoodess opened 1 year ago

Samcoodess commented 1 year ago

Hello mentors, I read the "Structure your analysis" part of the REANA documentation. I have a reana.yaml file, but it throws an error when I create a new workflow. Error ====> Cannot create a workflow

matthewfeickert commented 1 year ago

have reana.yaml file but it's throwing an error while I create a new workflow. Error ====> Cannot create a workflow

@Samcoodess Please link to the commit of the reana.yaml that you're using, or paste it here with formatting. We need to see the code to be able to give useful input.

Samcoodess commented 1 year ago

Hello @matthewfeickert, I have linked the reana.yaml file, which is committed to another repo, "RCFM". I listed all the files in the format given in the documentation on REANA's website. To organize it better, I separated the sections with comments. It is still a demo, so I have commented out most parts. Q. Am I supposed to list every .dat, .pyc, .csv, .ipynb, and .py file like this?

REANA.YAML - reana.yaml

Samcoodess commented 1 year ago

Hello, I hope you are doing well. @matthewfeickert Could you review my reana.yaml file? The previous one had all the file paths copied in, but now I have tried storing everything in directories. However, when I try to validate it using reana-client validate, it throws an error: ==> ERROR: Something went wrong when trying to validate /Users/sambridhideo/dev/RCFM/reana.yaml

Could you connect me with someone who has worked with creating reana.yaml for a model containing many inputs and outputs? The examples and the documentation don't have an explicit guide for many files.

This is my current draft:

version: 0.9.1

inputs:
  files:
    - DataAid.py
    - DataImporter.py
    - Neros.py
    - Neros_test.py
    - rotCurve.py

  directories:
    - data_csv_files:  #contains all the .csv files
    - data_dat_files:  # Contains .dat Files used as input to model.ipynb
    - data_txt_files:  # .txt files
    - graphs:          # contains generated graphs output of model.ipynb
    - plots:           # contains few plots outputs of model.ipynb

environment:
  name: rcfm-env  

workflow:
  - name: data_import
    type: serial
    environment:
      name: rcfm-env
    commands:
      - python DataImporter.py
    inputs:
      - name: data_csv_files
      - name: data_dat_files
      - name: data_txt_files
    outputs:
      - name: imported_data

  - name: data_processing
    type: serial
    environment:
      name: rcfm-env
    commands:
      - jupyter nbconvert --execute --to notebook --inplace model.ipynb
    inputs:
      - name: imported_data
      - name: DataAid.py
      - name: Neros.py
      - name: Neros_test.py
      - name: rotCurve.py
    outputs:
      - name: analysis_results

  - name: generate_plots
    type: serial
    environment:
      name: rcfm-env
    commands:
      - jupyter nbconvert --execute --to notebook --inplace model.ipynb
    inputs:
      - name: analysis_results
    outputs:
      - name: plots
      - name: graphs
kratsg commented 1 year ago

The main issue is that workflow is a dictionary, not a list of dictionaries. The list of dictionaries is under workflow.specification.steps, I believe. There are also some other issues, which I went ahead and quickly fixed so that this passes validation:

version: 0.9.1

inputs:
  files:
    - DataAid.py
    - DataImporter.py
    - Neros.py
    - Neros_test.py
    - rotCurve.py

  directories:
    - data_csv_files  #contains all the .csv files
    - data_dat_files  # Contains .dat Files used as input to model.ipynb
    - data_txt_files  # .txt files
    - graphs          # contains generated graphs output of model.ipynb
    - plots           # contains few plots outputs of model.ipynb

environment:
  name: rcfm-env

workflow:
  type: serial
  specification:
    steps:
      - name: data_import
        environment: rcfm-env
        commands:
          - python DataImporter.py
        inputs:
          - data_csv_files
          - data_dat_files
          - data_txt_files
        outputs:
          - name: imported_data

      - name: data_processing
        environment: rcfm-env
        commands:
          - jupyter nbconvert --execute --to notebook --inplace model.ipynb
        inputs:
          - name: imported_data
          - name: DataAid.py
          - name: Neros.py
          - name: Neros_test.py
          - name: rotCurve.py
        outputs:
          - name: analysis_results

      - name: generate_plots
        environment: rcfm-env
        commands:
          - jupyter nbconvert --execute --to notebook --inplace model.ipynb
        inputs:
          - name: analysis_results
        outputs:
          - name: plots
          - name: graphs
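
To make the shape difference concrete, here is a minimal Python sketch that checks only the layout described above: workflow must be a mapping, and the list of steps must sit under workflow.specification.steps. The helper name workflow_shape_problems is hypothetical and is not part of reana-client. The dictionaries are written inline to keep the example self-contained; in practice they would presumably come from parsing reana.yaml (for example with PyYAML's yaml.safe_load).

```python
from collections.abc import Mapping, Sequence


def workflow_shape_problems(spec):
    """Return a list of human-readable problems with the `workflow` layout.

    Hypothetical helper for illustration only; it is not a replacement
    for `reana-client validate`.
    """
    problems = []
    workflow = spec.get("workflow")
    if not isinstance(workflow, Mapping):
        problems.append(
            "`workflow` should be a mapping, got " + type(workflow).__name__
        )
        return problems

    specification = workflow.get("specification")
    steps = specification.get("steps") if isinstance(specification, Mapping) else None
    # A string is technically a Sequence, so exclude it explicitly.
    if not isinstance(steps, Sequence) or isinstance(steps, (str, bytes)):
        problems.append("`workflow.specification.steps` should be a list of steps")
    return problems


# Layout from the original draft: `workflow` is a list of step dictionaries.
draft = {"workflow": [{"name": "data_import", "type": "serial"}]}

# Layout from the corrected file: steps nested under workflow.specification.
fixed = {
    "workflow": {
        "type": "serial",
        "specification": {"steps": [{"name": "data_import"}]},
    }
}

print(workflow_shape_problems(draft))
print(workflow_shape_problems(fixed))
```

The draft layout is reported as a problem, while the corrected layout returns an empty list. This only covers the one structural point; reana-client validate remains the real check.
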
Samcoodess commented 1 year ago

My reana.yaml file now validates, and I am currently debugging an error that occurs when I run the workflow.

reana-client create -w rcfm
export REANA_WORKON=rcfm
reana-client upload
reana-client start
reana-client status
reana-client logs
---------> GOT ERROR <----------

The error says,

Workflow exited unexpectedly 'environment'


matthewfeickert commented 1 year ago

@Samcoodess Please update the create_reana.yaml branch of your fork (https://github.com/Samcoodess/RCFM) with the code that you are actually using, instead of just copying and pasting it here.

---------> GOT ERROR <----------

The error says,

Workflow exited unexpectedly 'environment'

Is there any additional information in the logs? You should be able to view them on the REANA web portal.

Though

environment:
  name: rcfm-env

is not a top-level key, so it should be removed and the environment should be specified in the steps instead.

At the moment your file is using the serial workflow system, which is fine, but the serial system expects each step's environment to be a container image (that is, a Docker image that is publicly available somewhere).

So this Docker image would need to be created, contain the environment that you set up in Issue https://github.com/Samcoodess/reana-dms/issues/5, and then be published somewhere like Docker Hub so that it can be used here as an environment option.
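
As a rough pre-upload check for the two points above (no stray top-level environment key, and an environment on every step), here is a minimal Python sketch. Everything in it is an assumption for illustration: the set of allowed top-level keys reflects my reading of the REANA documentation and may be incomplete, the helper name environment_problems is hypothetical, and the image name docker.io/example-user/example-image:your-tag-goes-here is a placeholder. It cannot tell whether an image actually exists or is public; it only checks that each step names one.

```python
# Assumption: top-level keys that reana.yaml accepts, per my reading of the
# REANA documentation. Adjust if your REANA version differs.
ALLOWED_TOP_LEVEL_KEYS = {
    "version", "inputs", "workflow", "outputs", "workspace", "tests",
}


def environment_problems(spec):
    """Flag unexpected top-level keys and steps without an environment string.

    Hypothetical helper for illustration; not part of reana-client.
    """
    problems = []
    for key in spec:
        if key not in ALLOWED_TOP_LEVEL_KEYS:
            problems.append(f"unexpected top-level key: {key}")

    steps = spec.get("workflow", {}).get("specification", {}).get("steps", [])
    for step in steps:
        env = step.get("environment")
        # Only checks that a non-empty string is present, not that the
        # image is published anywhere.
        if not isinstance(env, str) or not env:
            problems.append(f"step {step.get('name')!r} has no environment image set")
    return problems


spec = {
    "version": "0.9.1",
    "environment": {"name": "rcfm-env"},  # the stray top-level key
    "workflow": {
        "type": "serial",
        "specification": {
            "steps": [
                {
                    "name": "data_import",
                    # Placeholder image name, not a real published image.
                    "environment": "docker.io/example-user/example-image:your-tag-goes-here",
                    "commands": ["python DataImporter.py"],
                },
                {
                    "name": "data_processing",
                    "commands": ["jupyter nbconvert --execute --to notebook --inplace model.ipynb"],
                },
            ]
        },
    },
}

for problem in environment_problems(spec):
    print(problem)
```

With the sample spec, the sketch reports the top-level environment key and the data_processing step that has no image. A name such as rcfm-env would pass this check even though it is not a published image, so the logs on the REANA web portal are still the place to confirm what went wrong.
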

An unoptimized Dockerfile (for a scenario in which the environment.yml is in the same directory as the Dockerfile

$ tree .
.
├── Dockerfile
└── environment.yml

0 directories, 2 files

) would be something like

FROM mambaorg/micromamba:1.4.9-bullseye-slim as base

COPY --chown=mambauser environment.yml /docker/

RUN micromamba env create --yes --file /docker/environment.yml

# The mambaorg/micromamba base image's entrypoint is
# /usr/local/bin/_entrypoint.sh which ensures the shell environment is
# correctly set for micromamba to be accessible by the given user.
# c.f. https://github.com/mamba-org/micromamba-docker/blob/604ebafb09543a3d852e437886f1c782f0367911/_entrypoint.sh
# so set ENV_NAME to be the same as the environment created from /docker/environment.yml
# so it will get activated on startup
ENV ENV_NAME=rcfm-analysis

which when built

docker build -f Dockerfile -t example-image:your-tag-goes-here .

has the environment activated and ready for use

$ docker run --rm -ti example-image:your-tag-goes-here
(rcfm-analysis) mambauser@29494250d3af:/tmp$ micromamba env list
  Name           Active  Path                         
────────────────────────────────────────────────────────
  base                   /opt/conda                   
  rcfm-analysis  *       /opt/conda/envs/rcfm-analysis
(rcfm-analysis) mambauser@29494250d3af:/tmp$

Could you connect me with someone who has worked with creating reana.yaml

In general, for questions about REANA the best place to ask for now is on the REANA Forum.