DiSSCo / SDR

Specimen Data Refinery
Apache License 2.0

Generate issues from SDR PoC release #47

Closed benscott closed 2 years ago

benscott commented 2 years ago

Thoughts after SDR release - for discussion.

Should image download be a separate tool? As it stands, it relies on users to add it, and subsequent steps won't function without a downloaded image.

json-schema is included in the config directory (and set as an environment variable). Another version of json-schema is in validate-opends, and code.py also has code to validate the schema. We need to consolidate these and use just one programming language. Ruby? The code.py & utils.py could be released on pip - is there an equivalent for Ruby?

I have added environment variables in paul's ~/.bashrc. These need to be documented, and the site moved outside of paul's personal account. I keep forgetting to switch user before running scripts, which breaks everything.
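One way to make the "forgot to switch user" failure obvious is a small pre-flight check that each script runs before doing any work. The following is only a sketch: the variable names `SDR_SCHEMA_PATH` and `SDR_DATA_DIR` are hypothetical placeholders, not the names actually set in ~/.bashrc.

```python
import os

# Hypothetical names for illustration; replace with the variables actually
# defined in ~/.bashrc once they are documented.
REQUIRED_VARS = ["SDR_SCHEMA_PATH", "SDR_DATA_DIR"]


def missing_vars(required, environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def check_environment(required, environ=None):
    """Raise a clear error listing every missing variable."""
    missing = missing_vars(required, environ)
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + " (are you running as the right user?)"
        )


# Example with an explicit mapping instead of the real environment.
example_env = {"SDR_SCHEMA_PATH": "/path/to/schema.json"}
print(missing_vars(REQUIRED_VARS, example_env))
```

The list of required names would double as the documentation of what needs to be set when the site is moved to a shared account.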

benscott commented 2 years ago

Tool code block is deprecated - https://docs.galaxyproject.org/en/latest/dev/schema.html#tool-code.

So code.py / utils.py can be migrated to Ruby & implemented in the new tool model.

benscott commented 2 years ago

Ruby dependencies (e.g. fastimage): need to be recorded and included in the installation/readme.

llivermore commented 2 years ago

We need to discuss a few issues:

10

PaulBrack commented 2 years ago

I think this should be a separate tool, but only available to the user through a sub-workflow that links both together. That way we get the advantages of modularity from having them as separate tools, but without the potential problems of them not being linked together.

benscott commented 2 years ago

Another point for discussion: at the moment, every tool has openDS as input and code.py maps these to opends_properties. The drawback of this is that every input variable needs to be munged into the openDS structure.
Take GEORG, for example: it has a locality text string as input. For this to be run as a standalone tool on a spreadsheet of data, the spreadsheet would need to contain (or be converted to) openDS objects.

Would it be better for tools to accept the inputs they actually require? Instead of each tool converting the openDS into its inputs, we would have an openDS mapper tool, which would take the openDS input, define the expected outputs (plucked from the openDS JSON), and then feed these into the subsequent tool.

So rather than openDS => tool we would have openDS => openDS mapper => tool.

Advantages: more in line with the way Galaxy is designed to be used. Also, there is a lot of code redundancy - each tool needs code.py to extract the openDS properties. Instead, there would be one tool that did this.
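To make the mapper idea concrete, here is a rough sketch of what the plucking step could look like. It is not based on the existing code.py: the function names `pluck` and `map_opends`, the dot-separated path convention, and the field names in the example object are all assumptions made for illustration.

```python
import json
from typing import Any, Dict


def pluck(document: Dict[str, Any], path: str) -> Any:
    """Follow a dot-separated path through nested dicts.

    Returns None if any key along the path is missing.
    """
    current: Any = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def map_opends(document: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Build the plain inputs a downstream tool expects.

    `mapping` maps each tool input name to a path inside the openDS JSON.
    """
    return {name: pluck(document, path) for name, path in mapping.items()}


# Hypothetical openDS-like object; these field names are invented for the example.
example = {
    "authoritative": {"locality": "Kew, London"},
    "images": {"url": "https://example.org/specimen.jpg"},
}

# A GEORG-style tool would then only need to receive a locality string.
georg_inputs = map_opends(example, {"locality_text": "authoritative.locality"})
print(json.dumps(georg_inputs))
```

With something like this, a tool such as GEORG could be run directly on a spreadsheet column of locality strings, and only workflows that start from openDS would need the mapper step in front of it.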

PaulBrack commented 2 years ago

Closing - have created issues 54 to 59