fair-workflows / FAIRWorkflowsExtension

A JupyterLab extension for searching for nanopublished computational steps.
BSD 3-Clause "New" or "Revised" License

pip install FAIRWorkflowsExtension does not work #9

Closed vemonet closed 3 years ago

vemonet commented 4 years ago

The README instructs users to install the package using pip:

pip install FAIRWorkflowsExtension

But the package does not seem to have been published to PyPI:

pip install FAIRWorkflowsExtension
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement FAIRWorkflowsExtension (from versions: none)
ERROR: No matching distribution found for FAIRWorkflowsExtension

See also: https://pypi.org/search/?q=FAIRWorkflowsExtension

Note that I am using Python 3.6/3.7 on Ubuntu.

Will the package be published? Otherwise the README should provide instructions for building and installing the package locally (e.g. pip install .)

Ideally I would like to be able to install the extension using:

pip install FAIRWorkflowsExtension
jupyter labextension install FAIRWorkflowsExtension

I can give a hand with setting up GitHub CI/CD to automatically test, build, and publish at each new release (GitHub tag); let me know if you are already set up or if this could be helpful.

raar1 commented 4 years ago

Hi @vemonet, thank you for reporting this. Yes, apologies for the confusion: the extension is not yet on PyPI, as it was thought to be too early in its development. However, perhaps it is easier for everyone if we publish it there, as you suggest. The README should have said that pip install FAIRWorkflowsExtension was to be run from the root of the repo. I've updated the README to say this now.

Our intention is mainly to have the extension working from within a docker container. If you execute docker-compose up from the root of the repository then that should (hopefully) build and run the extension for you. Would you be able to try this and see if it works for you? I think it may be easier to ensure it is set up correctly that way.

vemonet commented 4 years ago

Hi @raar1, thanks, I will try your docker-compose setup tomorrow.

Usually the way I use JupyterLab extensions is to install the pip package and labextension in my image (so that I can install all the extensions I want in my JupyterLab). For example: https://github.com/vemonet/Jupyterlab/blob/master/Dockerfile#L36

Maybe I could install it in my Dockerfile, based on yours:

RUN pip install git+git://github.com/fair-workflows/FAIRWorkbench@master
RUN pip install -e .
RUN jupyter-serverextension enable --py FAIRWorkflowsExtension && \
    jlpm && jlpm build && jupyter-labextension link . && jlpm build && jupyter-lab build

Is the pip install -e . really required if you are already installing using pip install git+git://github.com/fair-workflows/FAIRWorkbench@master ?

If you want to publish to PyPI automatically without hassle, you can easily set up a workflow via GitHub Actions, especially since you already seem to have everything well set up!

This workflow, for example, automatically runs pytest on every push to master, and it builds + publishes only for tags (i.e. releases), and only if the tests pass: https://github.com/MaastrichtU-IDS/d2s-cli/blob/master/.github/workflows/python-publish.yml#L41

Then you just need to remember to increase the version number when you push a new tag/release.

If you don't want to push to the official repository yet, you can publish to https://test.pypi.org/ instead (same principle, with an extra argument).

vemonet commented 4 years ago

It worked with docker-compose. I fixed the nanopub and workflowhub imports; no reference issues when directly using the extension:

from fairworkflows import Nanopub, Workflowhub

But I still don't know how to resolve @FairStep. Running:

@FairStep(fw)
def mult(walrus, bird):
    """
        Multiply two integers together (walrus and bird).
    """
    result = walrus * bird
    return result

Getting:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-31-8c85294b569c> in <module>
----> 1 @FairStep(fw)
      2 def mult(walrus, bird):
      3     """
      4         Multiply two integers together (walrus and bird).
      5     """

NameError: name 'FairStep' is not defined
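For context, `@FairStep(fw)` implies a parameterized decorator, so the NameError just means FairStep was never imported into the notebook. A minimal sketch of a decorator of that shape (purely hypothetical, not the actual fairworkflows API) would be:

```python
# Hypothetical sketch only: illustrates the parameterized-decorator shape
# that @FairStep(fw) implies. The real fairworkflows API may differ.

def FairStep(workflow):
    """Return a decorator that registers the function on `workflow`."""
    def decorator(func):
        workflow.append(func)  # here `workflow` is just a plain list
        return func
    return decorator

fw = []  # stand-in for a real workflow object

@FairStep(fw)
def mult(walrus, bird):
    """Multiply two integers together (walrus and bird)."""
    return walrus * bird
```

If FairStep is defined (or imported) like this, the decorated function stays callable and is also recorded on the workflow object.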

vemonet commented 4 years ago

It can be installed from the GitHub master branch in any JupyterLab Dockerfile easily:

RUN pip install git+git://github.com/fair-workflows/FAIRWorkbench@master
RUN pip install git+git://github.com/fair-workflows/FAIRWorkflowsExtension@master
RUN git clone https://github.com/fair-workflows/FAIRWorkflowsExtension /root/FAIRWorkflowsExtension
WORKDIR /root/FAIRWorkflowsExtension
RUN jupyter-serverextension enable --py FAIRWorkflowsExtension 
RUN jlpm && jlpm build && jupyter-labextension link . && jlpm build && jupyter-lab build

Working with my build: https://github.com/vemonet/Jupyterlab/blob/master/Dockerfile#L47

raar1 commented 4 years ago

Ah, thanks a lot for trying all this stuff out. I'm glad there is a way to integrate it into your docker image without too much trouble, but I may look at the test.pypi.org solution too as you suggest.

With regards to the

@FairStep(fw)
def mult(walrus, bird):

etc., did you get this by using the 'Step' search functionality? These are actually old test 'steps' that were nanopublished using a very early version of the library, back when we had a FairStep decorator, so these steps will not execute any more. Hopefully there will soon be real, functional workflows uploaded. This feature is still quite new, so sorry about the confusion there.

Is there anything in particular you are interested in using the extension for that we could help with? It looks like you are collecting a few different data-science-related extensions into a single image? We would be more than happy to discuss any changes/features that could make integration with the rest of that system better.

vemonet commented 4 years ago

Yes, it is a JupyterLab image we use at our institute, with a root user and commonly useful packages/extensions for data science (we are up to JupyterLab 2.0 and have integrated linting and autocomplete tools, which makes JupyterLab a full-blown IDE).

At the moment I would say that I am interested in 2 parts:

* The nanopublishing part on its own is also interesting, even without FAIR workflows. How do you nanopublish from the Jupyter notebook? Did you implement a way to generate signed nanopubs using JavaScript/Python? (the original implementation is in Java)
* I could be interested in using this extension just to search for nanopublications, then explore/parse them using Python, and compose/publish new ones (without it necessarily being about FAIR workflows)

Thanks for the answers, we will keep an eye on the evolution :)

tkuhn commented 4 years ago

I can answer the nanopublication questions and let @raar1 answer the other points:

* The nanopublishing part on its own is also interesting, even without FAIR workflows

  * How do you nanopublish from the Jupyter notebook? Did you implement a way to generate signed nanopubs using JavaScript/Python? (the original implementation is in Java)

We are calling the nanopub-java library through a command-line call in Python. It would be very nice to have a native Python library for this, but it's not our highest priority: nanopub-java has a lot of features by now, and it would take quite an investment to replicate all of that in Python. It's a bit of a hack, but it works well enough for now (though @raar1 might disagree :) ).

  * I could be interested in using this extension just to search for nanopublications, then explore/parse them using Python, and compose/publish new ones (without it necessarily being about FAIR workflows).

Yes, that would indeed be nice, if this part could be reused in other contexts. It should be fairly straightforward (but again @raar1 might disagree).

raar1 commented 4 years ago

As @tkuhn mentions, we are actually using the nanopub-java code for the publishing functionality at present. However, it's wrapped by our Python library in such a way that it is 'waiting' for a Python version of nanopub that could be slotted in if one were ever available. At present our library ships with the np command-line tool, which downloads the necessary .jar file automatically, so we haven't had any issues so far. I would of course prefer a Python lib for nanopublishing too, but as Tobias notes, it could be a lot of extra work to duplicate the Java version at this stage.
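Shelling out to a Java CLI from Python, as described here, might look roughly like the following sketch. The np subcommand names and interface below are assumptions for illustration; check the nanopub-java documentation for the actual CLI, and note this is not the fairworkflows internals:

```python
import subprocess

def np_command(subcmd, *args, np_cli="np"):
    # Build the argument list for the (assumed) np command-line tool
    # shipped with nanopub-java; subcommand names here are illustrative.
    return [np_cli, subcmd, *args]

def sign_nanopub(trig_path, np_cli="np"):
    # Shell out to the Java tool to sign a nanopublication file.
    # check=True raises CalledProcessError if the tool exits non-zero.
    subprocess.run(np_command("sign", trig_path, np_cli=np_cli), check=True)
```

The advantage of this wrapping is the one noted above: the Python-facing function signature stays stable, so a native Python implementation could later be slotted in behind it without changing callers.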

With regards to the nanopublication real-time search/publish functionality, I think it would be pretty simple to separate that out into its own self-contained 'nanopub JupyterLab extension'. Then that interactive widget could be used in other setups too (without necessarily needing the fairworkflows stuff there as well). I will work on that this week or next and let you know the repo of the new extension as soon as it's up.

With regards to the 'type' of FAIR workflow: at present we are just describing a general Python notebook using the Plex ontology (https://github.com/perma-id/w3id.org/tree/master/fair/plex) that Tobias et al. have been developing, storing steps and workflows as nanopublications. This is partly driven by the need to include 'manual steps' in some fashion, which is hard to find support for in other workflow formats, including CWL (the creators are not keen to have anything like that in the spec at present).

That said, I feel that using CWL to describe a workflow is the closest to a 'FAIR' workflow format that the community would use. I intend to find a way to integrate/interoperate with a CWL Jupyter kernel that is currently in early development: https://github.com/giannisdoukas/CWLJNIKernel

The ideal situation would be to somehow marry the Plex ontology description, the nanopublishing model, the mixture of manual and computational steps, and CWL description of the computational workflows. The best way to do this is not yet completely clear.

We do intend to add some sort of 'annotation helper' functionality to the FAIRWorkflows widget, to aid in adding appropriate extra RDF to the workflow. @vemonet, do you happen to know if there is already such an extension? It would be best if the user did not really need to know much about RDF, or have to add the triples themselves (that is unlikely to encourage much annotation).

vemonet commented 4 years ago

Thanks for the details @raar1 !

To be honest, I would have taken the same approach: start with the jar, then evaluate the effort required for a fully Python package.

Not sure if this is what you want to do, but you can technically add any RDF property you want to a CWL workflow (imported from an existing ontology).

For example, I am using additional properties recommended by CWL, Dockstore and ELIXIR bio.tools, which use schema.org, EDAM and FOAF. CWL validates the properties you provide to make sure they are present in the imported ontology, so you should just need to import the Plex ontology (in OWL format) under $schemas: and start using your predicates to annotate different parts of the workflow.

Thanks for the CWL Jupyter kernel link! I did not know about it. There are also integrations in development for Apache Airflow, Galaxy, and Kubernetes (CWL Calrissian).

I don't know of a specific extension to help with RDF annotation. Are you looking for some kind of GUI form that will generate triples? An option would be to build an RML mapping from a GUI form and run the rmlmapper (Java) to generate the RDF, but the fact that it is Java might make it heavy to run.
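A lighter alternative to going through Java would be to map form fields to predicates directly in Python. A minimal sketch, with triples as plain tuples and illustrative vocabulary choices (a real helper would use rdflib and real Plex or schema.org terms):

```python
# Minimal sketch: turn GUI form values into RDF triples as plain tuples.
# The predicate URIs below are illustrative; a real annotation helper would
# use rdflib and the actual vocabularies (Plex, schema.org, EDAM, ...).

def form_to_triples(subject, form_values, predicate_map):
    """Map a dict of form field -> value into (subject, predicate, object) triples."""
    return [(subject, predicate_map[field], value)
            for field, value in form_values.items()]

triples = form_to_triples(
    "http://example.org/step1",  # hypothetical step URI
    {"label": "Multiply two integers"},
    {"label": "http://www.w3.org/2000/01/rdf-schema#label"},
)
```

The point of the predicate_map is that the user only ever sees friendly field names in the form; the RDF vocabulary stays hidden behind it, which matches the goal of not requiring users to know much about RDF.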

raar1 commented 4 years ago

Hi @vemonet,

I've now separated out the simple Nanopub search/publishing functionality and put it in its own repo here: https://github.com/fair-workflows/NanopubJL

It's quite basic right now, but I'll be adding features to it (different kinds of searches, etc., based on the grlc API). If you want a particular feature, or see a way it could be more useful, then please feel free to open issues. I intend to work on it a little more so that it is completely standalone from any FAIR workflows work, although it will continue to be used in this project too.

Yes, adding the RDF straight into the CWL is what I would intend to do, if I can find a way to make the CWL more palatable in notebooks. Experimentalists who compose largely manual workflows might (generally speaking) be expected to accept some amount of coding, e.g. in Python, for analysis purposes, but they will most likely be put off by writing CWL YAML directly into notebook cells. With a bit more friendly UI it could work, so I'm still thinking about it.

vemonet commented 4 years ago

Hi @raar1, thanks a lot! I will give it a try as soon as possible.

vemonet commented 4 years ago

I agree about writing bare CWL; the CWL language was originally built to represent workflows designed with the Rabix UI.

You might want to take a look at it: https://rabix.io/ (but it would take the workflow design out of the Jupyter notebook)

There is also the Rabix Benten extension for Visual Studio Code: https://github.com/rabix/benten/. It is built in Python, so you might be able to reuse its linting and autocomplete tooling in JupyterLab.

raar1 commented 4 years ago

Yes, I agree completely. We have been looking into using Rabix too, as it is the easiest way to define and represent the dependencies in the workflow.

However, I think we have a somewhat different focus than you usually find with purely computational workflows, since we are trying to describe (and 'execute') workflows that have a lot of entirely manual steps. These are described purely by the RDF description of the step and any output data such a manual step needs to have produced. Being able to compose these manual steps and then 'execute' them (this necessarily has an interactive component) is what makes it difficult to pick from existing CWL tools, which generally assume purely computational steps.
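The mixed manual/computational execution described here could be sketched, very schematically, like this (none of the structure below reflects the actual fairworkflows data model; it only illustrates the interactive component that manual steps introduce):

```python
# Schematic sketch only: a workflow mixing computational and manual steps.
# A manual step has no code; it is described by text, and its "output" is
# supplied interactively by the human performing it.

def run_workflow(steps, prompt=input):
    """Execute steps in order; manual steps pause for a human-provided result."""
    results = []
    for step in steps:
        if step["kind"] == "computational":
            results.append(step["func"](*step.get("args", ())))
        else:  # manual: ask the user to report the outcome of the step
            results.append(prompt(step["description"] + " -> "))
    return results

steps = [
    {"kind": "manual",
     "description": "Weigh the sample and enter the mass (g)"},
    {"kind": "computational",
     "func": lambda m: float(m) * 2, "args": ("21",)},
]
```

Making `prompt` injectable is the key design point: in a notebook it could be an interactive widget, while a batch runner or a test could supply recorded answers, which is exactly the interactivity that off-the-shelf CWL runners do not model.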

Perhaps if it is not difficult to create plugins for Rabix then that could be one way to solve this?

Thanks for the Rabix Benten extension suggestion, by the way: I didn't know it was in Python, so we could look into that. It's also something we might point the developers of the CWL kernel to.

raar1 commented 4 years ago

Just to further the point above on the mix of manual/computational steps: I suppose our ideal solution would be one where experimentalists (particularly those with minimal coding skills) could compose entirely manual workflows, 'execute' them, and publish them (especially as CWL, extended or otherwise).

I'm thinking of this project as 'mostly manual workflows with a few computational steps' rather than the other way around. This is why I want to avoid users needing to type out raw CWL YAML, as they would most likely simply not do it. CWL can be the storage format and what is used for execution, but ideally the user does not need to see too much of it. Rabix would obviously help a lot with that, but as mentioned above, it's not obvious whether manual steps fit in there well at present.

raar1 commented 4 years ago

@vemonet One last thing you might also be interested in (related to the CWL kernel devs): this tool converts a Python notebook to CWL, provided the right type hints are given: https://github.com/giannisdoukas/ipython2cwl. I think it's quite recent though.