cms-dpoa / cms-dpoa-getting-started

https://cms-dpoa.github.io/cms-dpoa-getting-started/intro.html

Rou'a - implement a workflow #78

Open katilp opened 3 years ago

katilp commented 3 years ago

Implement a two-step workflow in argo with the following steps:

  1. use cernopendata-client to get a file that lists the files of one open data record together with their sizes
    • use container cernopendata/cernopendata-client
    • note that when the container is run, it executes the command that was defined as its "entrypoint" when the container was built; in this case, that command is cernopendata-client
      • if interested you can see it here
    • this container is normally run with docker run -it --rm cernopendata/cernopendata-client <command arguments>, but now you need to open a bash shell in the container so that you can save the file listing to a file
    • to open a shell, you will need to replace the entrypoint cernopendata-client defined in the container definition with /bin/bash; see an example of how to override the "entrypoint" command in an argo workflow definition in https://argoproj.github.io/argo-workflows/fields/#usercontainer
    • see https://cernopendata-client.readthedocs.io/en/latest/usage.html#listing-available-data-files for how to get the files and their sizes
    • use for example record http://opendata.cern.ch/record/6010
  2. read the file created in step 1 and divide the files into groups of roughly equal size.
    • use container python:3
    • give the size of groups as a parameter
    • write a python script to compute the total size and then divide the listing into groups (you can start simple and assume that the files are of roughly equal size)
    • these new listings are the output of this step
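
A minimal sketch of what the step 2 script could look like, following the "start simple" suggestion: it splits the listing into groups with a fixed number of files each and reports the total size. It assumes that each line of the listing from step 1 is whitespace-separated, with the file location in the first column and the size in bytes in the second; check this against the actual output of cernopendata-client. The function names, the command-line arguments and the output file naming are hypothetical, chosen only for illustration.

```python
import sys


def parse_listing(lines):
    """Turn listing lines into (location, size_in_bytes) pairs.

    Assumption: columns are whitespace-separated, location first, size second.
    Lines with fewer than two columns (for example blank lines) are skipped.
    """
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        entries.append((fields[0], int(fields[1])))
    return entries


def split_into_groups(entries, group_size):
    """Split entries into consecutive groups of at most group_size files."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [entries[i:i + group_size] for i in range(0, len(entries), group_size)]


def main(listing_path, group_size, out_prefix):
    with open(listing_path) as listing:
        entries = parse_listing(listing)

    total = sum(size for _, size in entries)
    groups = split_into_groups(entries, group_size)
    print(f"{len(entries)} files, {total} bytes in total, {len(groups)} groups")

    # Write one new listing per group; these files are the output of the step.
    for index, group in enumerate(groups):
        with open(f"{out_prefix}_{index}.txt", "w") as out:
            for location, _ in group:
                out.write(location + "\n")


# Hypothetical usage: python split_listing.py <listing file> <group size> <output prefix>
if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv[1], int(sys.argv[2]), sys.argv[3])
```

Grouping by file count only gives groups of roughly equal total size when the files themselves are of similar size; if they are not, the sizes already parsed here could be used to balance the groups instead.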

Check the argo examples for multi-step workflows in https://argoproj.github.io/argo-workflows/examples/#steps and for passing parameters in https://argoproj.github.io/argo-workflows/examples/#parameters

You can pass the file from one step to another through a mounted volume, as in #68

roaatamimi commented 3 years ago

For the record, the download gives this error:

 docker run -it --rm cernopendata/cernopendata-client download-files --recid 6010
==> Downloading file 1 of 1404
  -> File: ./6010/00992A80-DF70-E211-9872-0026189437FE.root
Traceback (most recent call last): (33%)
  File "/usr/local/bin/cernopendata-client", line 11, in <module>
    load_entry_point('cernopendata-client==0.2.0', 'console_scripts', 'cernopendata-client')()
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/cernopendata_client/cli.py", line 344, in download_files
    download_single_file(path=path, file_location=file_location, protocol=protocol)
  File "/usr/local/lib/python3.6/site-packages/cernopendata_client/downloader.py", line 109, in download_single_file
    c.perform()
pycurl.error: (18, 'transfer closed with 2184044197 bytes remaining to read')