CLARIAH / clariah-plus

This is the project planning repository for the CLARIAH-PLUS project. It groups all technical documents and discussions pertaining to CLARIAH-PLUS in a central place and should facilitate findability, transparency and project planning, for the project as a whole.

Collaboration between CLARIAH Tool discovery and related initiatives #128

Open · proycon opened 2 years ago

proycon commented 2 years ago

In this issue I'd like to track potential and ongoing collaborations related to our CLARIAH Tool Discovery track.

Codemeta Community

First of all, I'm seeking collaboration with the wider codemeta community and have raised various issues on their issue tracker about software metadata aspects that could not previously be expressed properly in the existing codemeta/schema.org vocabulary, mostly related to software interface types and data input/output types. This has led to an extension on top of codemeta, in collaboration with @dgarijo: https://github.com/SoftwareUnderstanding/software_types. Codemeta and schema.org are fairly slow-moving targets, so an independent extension was the quickest way to get results. It could eventually be merged into codemeta and/or schema.org if it proves interesting or popular enough.
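To make the "interface type" idea concrete, here is a minimal sketch of a codemeta record that uses the extension. It assumes the extension is loaded as an extra JSON-LD context at `https://w3id.org/software-types` and that it provides interface classes such as `CommandLineApplication` (`WebApplication` already exists in schema.org); both are assumptions to check against the software_types repository. The interface type is attached via schema.org's `targetProduct` property. The helper `make_codemeta` and the tool `mytool` are hypothetical.

```python
import json

# Contexts: codemeta 2.0 plus the software_types extension.
# ASSUMPTION: the extension's context lives at this w3id.org URL;
# verify against https://github.com/SoftwareUnderstanding/software_types.
CONTEXT = [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "https://w3id.org/software-types",
]


def make_codemeta(name, description, interface_types):
    """Build a minimal codemeta record (hypothetical helper).

    Each entry in interface_types becomes one targetProduct whose @type
    says how the software is used, e.g. "CommandLineApplication".
    """
    return {
        "@context": CONTEXT,
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "targetProduct": [
            {"@type": t, "name": f"{name} ({t})"} for t in interface_types
        ],
    }


record = make_codemeta(
    "mytool",
    "A hypothetical tool offering a command-line and a web interface.",
    ["CommandLineApplication", "WebApplication"],
)
print(json.dumps(record, indent=2))
```

A harvester can then filter tools by the `@type` of their `targetProduct` entries instead of guessing interface types from free-text descriptions.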

Research Software Directory (eScience)

I met with @maaikedj and @jmaassen from the eScience Center last week. We discussed possible collaboration between their Research Software Directory (source, instance) and our work. Whereas we focus entirely on automatic data harvesting and a unified data representation, they have built a fully-fledged CMS in which users can enter their software metadata. They do use automatic harvesting to help the content editor auto-fill various properties and to connect with various standards (including codemeta and citation.cff). They are currently working on a new version of the RSD (due in September), which they will also offer "as a service" for institutions that can't or don't want to self-host.

We agreed that collaboration between our two projects could benefit both parties. The focus would be on finding common ground on the data representation and the vocabularies used (LoD, codemeta, schema.org). I invited them to contribute to any of our vocabulary discussions (e.g. #32). Given sufficient agreement on data representation and vocabulary, we can envision an import/export function in the Research Software Directory that connects it with our work:

  1. their import could allow them to hook up to our harvesting pipeline
  2. their export could give our developers a means to interactively compose a codemeta.json file in their CMS (for inclusion in the source code), as opposed to having to edit JSON-LD by hand as is the case now.
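As an illustration of what such an import/export layer might do, the sketch below maps flat CMS form fields to codemeta properties and back. The form field names and `FIELD_MAP` are hypothetical (the RSD's actual data model is not described here); only the target properties (`name`, `description`, `codeRepository`, `license`) and the codemeta 2.0 context URL come from the codemeta/schema.org vocabulary.

```python
import json
from pathlib import Path

# HYPOTHETICAL mapping from flat CMS form fields to codemeta properties.
# The form field names on the left are invented for illustration.
FIELD_MAP = {
    "title": "name",
    "summary": "description",
    "repository": "codeRepository",
    "license": "license",
}

CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def form_to_codemeta(fields):
    """Export: turn filled-in form fields into a codemeta record."""
    record = {"@context": CODEMETA_CONTEXT, "@type": "SoftwareSourceCode"}
    for field, prop in FIELD_MAP.items():
        value = fields.get(field)
        if value:  # skip missing or empty fields
            record[prop] = value
    return record


def codemeta_to_form(record):
    """Import: pre-fill form fields from a harvested codemeta record."""
    return {field: record[prop] for field, prop in FIELD_MAP.items() if prop in record}


def write_codemeta(fields, path):
    """Write a codemeta.json file for inclusion in the source repository."""
    text = json.dumps(form_to_codemeta(fields), indent=2, ensure_ascii=False)
    Path(path).write_text(text + "\n", encoding="utf-8")


def read_codemeta(path):
    """Read a codemeta.json file back into a Python dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

When every mapped field is filled in, `codemeta_to_form(form_to_codemeta(fields))` returns the original fields, which is the round-trip property an import/export pair would need. A real mapping would also have to handle nested structures such as authors, which this sketch leaves out.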

ODISSEI has also expressed an interest in the Research Software Directory. As ODISSEI and CLARIAH are set to merge in the proposed continuation project, this may be an extra incentive for cooperation.

KNAW DANS

In April, I spoke with Wim Hugo (DANS KNAW), who also expressed a need for describing tool metadata and data types. They are looking into the possibilities for an operational tool and type registry that would be compatible with the work CLARIAH has done to date. We might accommodate part of this with our software types extension, but their requirements probably go further.

DANS also works on Dataverse (@4tikhonov), where they are adding support for codemeta.

CLARIN Switchboard

I have worked with the CLARIN Switchboard in the past and have it on my radar to make our software metadata descriptions convertible, to some degree, to the format their tool registry requires. This is tracked separately in issue #36.

@antalvdb suggested arranging a meeting with @dietervu to discuss further collaboration here.

CLARIN VLO

Inclusion of CLARIAH software in the CLARIN VLO would be facilitated by export to CMDI (@JanOdijk). This is tracked separately in issue #37.

Ineo

Progress on connecting with Ineo (CLARIAH-internal) is tracked in issue #35 (@Seb-CLARIAH).

Conclusion

In all these collaborations, I would like to emphasise that the most important goal is to reach agreement on common data representations and vocabularies, facilitating exchange between different tools (and providing the necessary implementations).

Let's track any progress on these and other collaborations in this issue. Feel free to comment on anything or suggest additional possibilities!

proycon commented 1 year ago

Another initiative to keep an eye on is science-on-schema.org. I'll check to what extent we already comply with their recommendations and perhaps make some adjustments where useful.