ResearchObject / ro2019

Workshop on Research Objects 2019
http://www.researchobject.org/ro2019/
Apache License 2.0

RO-Crate, a lightweight approach to Research Object data packaging #3

Closed stain closed 5 years ago

stain commented 5 years ago

Authors

Name: Eoghan Ó Carragáin Affiliation: University College Cork ORCID: https://orcid.org/0000-0001-8131-2150

Name: Carole Goble Affiliation: The University of Manchester ORCID: https://orcid.org/0000-0003-1219-2137

Name: Peter Sefton Affiliation: University of Technology Sydney ORCID: https://orcid.org/0000-0002-3545-944X

Name: Stian Soiland-Reyes (intended speaker) Affiliation: The University of Manchester ORCID: https://orcid.org/0000-0001-9842-9718

Keywords

Preprint

https://doi.org/10.5281/zenodo.3337883

stain commented 5 years ago

Thank you for submitting to RO2019's open peer review process. We will shortly be assigning members from the Programme Committee to review.

Feel free to respond to reviewers' comments and to update the submission if needed.

Tip: Anyone is welcome to add an informal review below using GitHub comments; as an author perhaps you would volunteer to review one of the other open submissions?

Reviewers, please copy this review form and add as a comment. You don't need to use this form if you are not assigned from the PC.

## Quality of Writing
_Is the text easy to follow? Are core concepts defined or referenced? 
Is it clear what the author's contribution is?_

(delete as appropriate)
* excellent / good / fair / poor

## Research Object / Zenodo

_URL for a Research Object or Zenodo record provided?
   Guidelines <http://researchobject.org/ro2019/submitting> followed?
   Open format (e.g. HTML)?
   Sufficient metadata, e.g. links to software?
   Some form of Data Package provided?
   Add text below if you need to clarify your score._

(delete as appropriate)
* none (e.g. only abstract in easychair/github)
* basic (e.g. Zenodo with PDF and minimal metadata)
* sufficient (e.g. HTML, detailed Zenodo metadata)
* good (followed guidelines, demonstrating own format, related resources included, but some details missing)
* excellent (e.g. followed all guidelines, complete metadata or RO-like research data package, linked data, provenance)

## Overall evaluation
_Please provide a brief review, including a justification for your scores. 
Both score and review text are required._

(delete as appropriate)
* strong reject
* reject
* weak reject
* borderline 
* weak accept
* accept
* strong accept

For confidential remarks or questions about the peer-review process, contact ro2019@easychair.org

parfenov commented 5 years ago

I will review

stain commented 5 years ago

Reminder that the review of this abstract is due tomorrow, Friday 2019-07-26.

We're looking for one more volunteer! Perhaps @rapw3k, @josemanuelgp or @ocorcho could have capacity..?

stain commented 5 years ago

Review deadline is today - any other volunteers?

parfenov commented 5 years ago

(Disclosure: I've been collaborating with Eoghan Ó Carragáin on the concept of DaMaHub since 2017)

Data generated through publicly funded scientific studies must be preserved in persistent, discoverable, verifiable and re-usable formats in order to facilitate reproducible scientific research across disciplines, institutions and continents. To ensure that data remains usable for a long time (and by anyone in the world), the community of academic librarians and data archive and repository maintainers has to agree on what kind of metadata must be captured and how it must be permanently linked to the valuable research data.

RO-Crate proposes to combine currently available openly licensed technologies, tools and linked-data formats that can be used independent of infrastructure to facilitate FAIR sharing of scientific datasets and employment of computational analytical methods.

The proposal is well written and the existing challenges of reaching consensus among stakeholders are well defined. However, it is not clear how community efforts around this initiative are to be expanded and sustained over the long term.

rapw3k commented 5 years ago

Quality of Writing

Research Object / Zenodo

(delete as appropriate)

Overall evaluation

This abstract introduces RO-Crate, a lightweight approach to Research Object (RO) data packaging. RO-Crate started in response to discussions and feedback on the existing Research Object packaging alternatives and recommendations. In fact, the RO model and its formalisation as a suite of ontologies do not specify how ROs are saved or transmitted. Additionally, many domain experts, as well as a great number of developers, already use similar principles for data packaging (e.g. JSON-LD manifests, schema.org annotations, BagIt), and in some cases they find the full specification of an RO and its manifest slightly complicated. RO-Crate takes these shared principles and applies them to RO packaging, simplifying the creation and maintenance of metadata (both manual and programmatic) while still allowing full ROs to be created if needed. Furthermore, RO-Crate allows for domain extensions where needed, while using schema.org for annotations whenever possible.

RO-Crate is quite relevant for the workshop, and even though it is still at an early stage, the workshop will be a great opportunity to discuss with other participants the challenges they face when dealing with ROs, and the ideas they could bring to the RO-Crate specification, which is open for contributions. It will be particularly important to discuss the challenges of arriving at the “minimal” and “sufficient” set of metadata elements an RO-Crate should include, how profiling for different domains should be made or guided, guides for generating full ROs, possibilities for reusing existing tooling (both RO-aware and non-RO-aware), etc. Additionally, it would be good for the authors to clarify the distinction, if any, between full ROs and RO-Crates, along with the possibilities each offers, and to provide some use cases for that.

dgarijo commented 5 years ago

Quality of Writing

Research Object / Zenodo

Overall evaluation

This abstract describes RO-Crate, a lightweight data packaging approach for Research Objects.

The topic of the talk is highly relevant to the RO community, the workshop and potentially to eScience participants. I especially like the idea of relying on existing standards or commonly used vocabularies to define the packaging initiative, and look forward to discussing aspects like metadata inference in an RO-Crate.

I am curious about other challenges that an RO-Crate could face but that are not mentioned in the abstract. For example, many data products now include metadata properties that describe important parts of the provenance of the object. How should these metadata be propagated to the RO-Crate itself? What if the RO-Crate infers metadata that are inconsistent with them? Another example is the need for a resource to be physically present in an RO-Crate. Can the RO-Crate be virtual, e.g. if the datasets are big? How would provenance and outside changes be handled then?

Finally, I think that the need for Linked Data in this context needs a better motivation. A packaging approach by design aims to create a silo of self-contained information. Where is the need to connect it to outside resources if I just want to reproduce the experiment?

Minor comments: CodeMeta is described as a data packaging initiative for software, but it's not. It's a metadata vocabulary for software.

I also don't know what a filename glob pattern is.

stain commented 5 years ago

Thanks for great reviews, @parfenov @rapw3k @dgarijo !

Agreed that while our governance reflects the current agile "bootstrap" phase, a more long-term governance plan for RO-Crate should be established and documented.

We have started some work on defining the minimal information needed, although it could be more systematic, perhaps by observing existing RO usage and related approaches for packaging, data and software metadata. I am starting research in this direction later in the year.

Indeed, inferred/extracted metadata can be more challenging in RO-Crate than in the traditional RO model (which uses the concept of OA annotations), as all statements are now in the same file, and it is harder to separate out "replaceable" annotations that may have their own provenance. Our motivation in RO-Crate was that, most of the time, this separation of annotations is a secondary concern compared to having a more straightforward way to consume and profile research objects.

Our use of JSON-LD means we could permit @graph (named graphs), which could perhaps be a more convenient way to indicate potentially conflicting statements and their provenance, without requiring recursive RDF parsing.
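
To make the named-graph idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the identifiers (`data.csv`, `#extracted-metadata`, `#extraction-run`), the helper `split_statements` and the choice of `prov:wasGeneratedBy` are hypothetical and are not defined by the RO-Crate draft. It relies only on the JSON-LD convention that a node object carrying both `@id` and `@graph` denotes a named graph.

```python
import json

# All identifiers and values below are hypothetical examples,
# not taken from the RO-Crate specification.
crate = {
    "@context": {
        "@vocab": "http://schema.org/",
        "prov": "http://www.w3.org/ns/prov#",
    },
    "@graph": [
        # Statement asserted directly by the crate author (default graph).
        {"@id": "data.csv", "@type": "MediaObject", "name": "Survey results"},
        # Statements extracted by a tool, kept apart in a named graph.
        # Properties next to "@graph" describe the named graph itself.
        {
            "@id": "#extracted-metadata",
            "prov:wasGeneratedBy": {"@id": "#extraction-run"},
            "@graph": [
                {"@id": "data.csv", "encodingFormat": "text/csv"},
            ],
        },
    ],
}


def split_statements(doc):
    """Separate default-graph nodes from the contents of named graphs."""
    asserted, named = [], {}
    for node in doc.get("@graph", []):
        if "@graph" in node:
            # Node with @id and @graph: treat as a named graph.
            named[node["@id"]] = node["@graph"]
        else:
            asserted.append(node)
    return asserted, named


asserted, named = split_statements(crate)
print(json.dumps(named, indent=2))
```

A consumer that only cares about the author's own statements could read `asserted` with plain JSON handling, while a tool that re-runs metadata extraction could replace the contents of `#extracted-metadata` without touching the rest of the file.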

We could try to cover these new aspects in an updated abstract before the workshop.

Thank you!

rapw3k commented 5 years ago

Dear @stain:

I am pleased to say that, given the reviewer comments, your submission has been accepted for oral presentation at RO2019.

You are invited to present a 10 minute talk, followed by joint Q&A at the end of the session.

You are also welcome to bring along a poster for the informal RO2019 poster session (which may cover work not reflected in this submission).

Note: All RO2019 presenters must register and attend the workshop at eScience 2019 - please take care that you register with the Conference & Workshop option to join the RO2019 workshop on Tue 24 Sept 2019.

Registration fees apply. Note that the IEEE eScience early-bird registration discount expires this Monday, although we are negotiating with the eScience organizers to possibly extend this for presenters. A PhD and Early career grant is also available.

stain commented 4 years ago

Dear @stain (yes that is me)

We are looking forward to hearing your submission presented at the RO2019 workshop in San Diego next week!

You will find the slot for your talk RO-15 in the workshop schedule: http://researchobject.org/ro2019/schedule

Please let us know if you require a different time slot, or the indicated speaker has changed.

Tweets

Feel free to tweet before/during/after the workshop!

We will use the hashtags #ResearchObjects #eScience2019 (plural s)

Slides

If you are presenting slides, we would appreciate it if you could upload them here: https://www.dropbox.com/request/B1EDF6bNOULm9wS5S7Z7

Format can be html, pdf, pptx, odp (we can convert to PDF)

I understand you may not have your slides ready yet, but we would appreciate it if they are uploaded before your session starts.

If your slides are web-based or already on Zenodo, submit their URL here: https://github.com/ResearchObject/ro2019/issues/new/choose

The resolution of the projector is 1920x1080 (16:9 widescreen).

Tip: If you use Twitter, include @YourUsername in the footer!

After the workshop we will upload the slides to Zenodo as CC-BY; please let us know if you require that your slides not be made public.

Zenodo

You may also want to update your Zenodo record to augment your paper/abstract with the slides as a PDF. The DOI links from the Schedule page always go to the latest Zenodo update.

We have noticed some uploads have a mismatch of the author lists. This might be a good time to check the metadata: http://researchobject.org/ro2019/proceedings

Posters

As noted earlier, all accepted speakers are welcome to bring along posters for the "informal" poster session, even if the poster is not directly related to the accepted talk.

The poster should, however, be relevant to the aims of the RO2019 workshop and not be purely commercial.

If you bring a poster, please use the 10:05-10:30 coffee break to hang it in the room.

Posters should be no larger than 35” (0.89 m) wide by 48” (1.2 m) tall, in portrait orientation.

Poster presenters are welcome to give a 2 minute lightning talk during the "unconference" session.

Unconference and collaborative notes

We are welcoming suggestions for activities during the afternoon unconference session, e.g.: lightning talks, posters, demos, breakouts, discussions

Please help add ideas, comments and suggestions in the Collective notes for the RO2019 workshop: https://s.apache.org/ro2019

You may also use this document for adding URLs and questions during the presentation sessions.