cancerDHC / operations

for operational functions

1b4: Investigate tool and data workflow landscape #8

Closed jmcmurry closed 3 years ago

jmcmurry commented 4 years ago

Investigate existing tools and strategies for achieving annotation, validation, transformation, and provenance services.

Due month 6

Description: The goal of this task is to understand and document, for each node, what tools and methods are being used for data ingestion, annotation, validation, and transformation, and for recording provenance. We are asking this question in each of the interviews. In addition, it would be helpful to understand any community resources that could be leveraged either directly by the node or as a component within the CCDH tools workstream, should we find significant gaps or opportunities for harmonization within the nodes. The goal is NOT to perform any sort of community-wide exhaustive inventory of metadata authoring tools.

jmcmurry commented 4 years ago

The HOT group has developed the matrix here and will continue to evaluate. Aside from declaring this done at some point, there is no additional work needed for terminologies. There may be higher-level tooling (e.g., for richer annotation for submission); however, this is unlikely to be in place yet at the node level.

nicolevasilevsky commented 4 years ago

What data manipulation, annotation, validation, transformation, and provenance tools is each node using, if any? We have been asking about this in the node interviews, so this information should be in the notes (or, if there is no info, we can interpret that as the node not currently using any such tools).

mellybelly commented 4 years ago

Note that the HOT landscape analysis of tools/resources is a CD2H/HOT deliverable. What is relevant here is whether there are any functions or tools/resources that can be repurposed within the context of CCDH terminology services and content development needed by the CRDC. Also, I think the HOT landscape analysis is not part of this ticket, which is about the data landscape in each node.

decorons commented 4 years ago

Also, part of the Landscape analysis - upcoming meeting with CBIIT Semantic Infrastructure group on Tuesday, to learn about Ptolemy and CEDAR work which could be relevant.

balhoff commented 4 years ago

@gaurav how should we start capturing this? Incorporate into the requirements synthesis spreadsheet, or separately?

nicolevasilevsky commented 4 years ago

I'm in favor of a new tab in the Requirements Synthesis spreadsheet - fewer google docs!

gaurav commented 4 years ago

It might make sense to just add additional rows to the requirements synthesis spreadsheet for our needs, such as:

Given #12 and #22, we might want to have a separate tab on that spreadsheet that proposes tools that could be added to the portal, for example:

| Tool | Meets use-cases | Requested by node | Computing needs needed |
| --- | --- | --- | --- |
| k-BOOM | Harmonizing ontologies | None as yet | Can be run locally or on a server |
| ... | ... | ... | ... |

If you like this idea, please go ahead and add the new rows to that table. We can edit them by consensus there.

balhoff commented 4 years ago

We've started a tools landscape doc here: https://docs.google.com/document/d/1g1b3jLQcyrJFT9NyKj0L9rTL6L3DLhHr_lw8Da2UrNs/edit#

This includes tools that have come up in node interviews as well as some others we are familiar with. CCDH folks, please add to this.

gaurav commented 3 years ago

We haven't added anything to this list for a while, and -- between Tools and Terminology -- we seem to be working or have worked on all the tools on that list except for Pentaho, Talend, REDCap, Karma and NCI Form Builder, none of which are immediately useful for CCDH. So should we close this issue, or is there a particular type of output that would be useful for this task?