MongoDB models

NicolasKieffer commented 2 years ago

This issue will contain all the information about the data to be stored in mongoDB.

Feel free to add any documentation to enrich the models (structure of the data we store) here.

NicolasKieffer commented 2 years ago

Datasets structure

Currently, a "dataset", a "code", a "lab material" or a "protocol" is stored as a "dataset" in our database, so they all share the same properties.

I will add all properties from the new GUI forms. Just tell me if any properties are missing (there is no limit, so we can add an "RRID" property instead of reusing the existing "DOI" property).

Perhaps we should create a specific structure for each of them (I will at least integrate all possible properties into the existing model).

List of currently available properties of "data objects" (datasets, codes, materials or protocols): here

property name	type	description
id	String	id property of the data object
dataInstanceId	String	dataInstance id of the data object (id used in the TEI file)
sentences	List of Sentence*	list of sentences
reuse	Boolean	"reuse" property of the data object
qc	Boolean	"quality control" property
representativeImage	Boolean	"representative image" property
issue	Boolean	"issue" property
notification	String	notification property (store the notification text content)
highlight	Boolean	"highlight" property
cert	String	cert value (between 0 and 1)
dataType	String	datatype of the data object
subType	String	subtype of the data object
description	String	description of the datatype/subtype
bestDataFormatForSharing	String	best data format for sharing of the datatype/subtype
bestPracticeForIndicatingReUseOfExistingData	String	best practice for indicating re-use of existing data of the datatype/subtype
mostSuitableRepositories	String	most suitable repositories of the datatype/subtype
DOI	String	DOI of the data object
name	String	name of the data object
comments	String	comments of the data object
status	String	status of the data object: saved (not valid but saved) OR valid (valid) OR modified (a user modifies it)

* Note: a sentence has just two properties: an id and the text.
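
To make the table above easier to discuss, here is a minimal Python sketch of the shared "data object" structure. It is only an illustration, not the actual MongoDB/Mongoose schema: the class names `Sentence` and `DataObject`, every default value, and the `validate` helper are assumptions. The property names, types, the three status values and the 0 to 1 range for `cert` come from the table.

```python
from dataclasses import dataclass, field
from typing import List

# Status values listed in the table above.
VALID_STATUSES = {"saved", "valid", "modified"}


@dataclass
class Sentence:
    """A sentence has just two properties: an id and the text."""
    id: str
    text: str


@dataclass
class DataObject:
    """Hypothetical mirror of the shared "dataset" document (defaults are assumptions)."""
    id: str
    dataInstanceId: str  # id used in the TEI file
    sentences: List[Sentence] = field(default_factory=list)
    reuse: bool = False
    qc: bool = False
    representativeImage: bool = False
    issue: bool = False
    notification: str = ""
    highlight: bool = False
    cert: str = "0"  # stored as a String holding a value between 0 and 1
    dataType: str = ""
    subType: str = ""
    description: str = ""
    bestDataFormatForSharing: str = ""
    bestPracticeForIndicatingReUseOfExistingData: str = ""
    mostSuitableRepositories: str = ""
    DOI: str = ""
    name: str = ""
    comments: str = ""
    status: str = "saved"

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means nothing was flagged."""
        problems = []
        if self.status not in VALID_STATUSES:
            problems.append(f"unknown status: {self.status!r}")
        try:
            cert_value = float(self.cert)
        except ValueError:
            problems.append(f"cert is not numeric: {self.cert!r}")
        else:
            if not 0.0 <= cert_value <= 1.0:
                problems.append(f"cert out of range: {self.cert!r}")
        return problems
```

A new property such as "RRID" would simply be one more field in a structure like this, which is why there is no practical limit on adding properties.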

Here are the new forms for "data objects":

Datasets form

Codes form

Materials form

Protocols form

Souad314 commented 2 years ago

This Excel sheet also describes the inputs in detail on the last tab (Tab 7; the other tabs are notes on the current HTML screens).

All Objects

Suggested Documentation for each object contains repeated fields (Name, etc), [associated excel sheet has coloured cells indicating changes to make it easier]

Properties are in order of appearance in the UI. Every property after the "issue" property was left in the table under the assumption that it is required in backend storage or other processes.

Removed some properties which have no space (highlight, best practice, etc.).

Note that alignment is slightly off, everything is kind of blurry - (apologies) - all added fields should have the same fonts/colours/alignments as existing screens

Found icon images (stolen from quick Google search) to go with new fields in UI:

-Key Icon

-Catalog Icon

-Microscope Icon

-Citation Icon

Datasets

Suggested Documentation

Code and Software

Code: New Custom Script:

Re-Use Custom Script:

Software:

Suggested Documentation

Lab Materials

New Material

Material Re-Use

Suggested Documentation

Protocols

New Protocol

Re-Use - Another Protocol (is the "Private" source)

Re-Use - Manufacturer's Instructions (is the "Public" source)

Suggested Documentation

Souad314 commented 2 years ago

@NicolasKieffer Here are some proposed changes to Protocols. During the 11-Mar Tech Meeting (of epic length), we talked about adding Protocol as a Datatype so that Published Protocol and Manufacturer Instructions can be its Subtypes. After much consideration, here are some notes for implementation:

1. Wiki

I have added these pathways to the Wiki:
- Protocol
- Protocol: Published Protocol
- Protocol: Manufacturer Instructions

Question: Datatypes are ready to be refreshed. Does this affect already curated documents? Do their tokens need to be refreshed, and does that affect OCR docs?

2. Report

Changed this report, ASAP Template 18 Feb (with all possible pathways - 2nd Tab, Protocol Section only), to include the output pathways associated with this change.
- Although re-use is still an input on the report tab (at the moment), it has no effect once Protocol subtypes exist. To explain further:
  - When no subtype is selected (Subtype=none), the Protocol is "New"
  - If any other subtype is selected (Subtype=Published Protocol; Subtype=Manufacturer Instructions), the Protocol is considered re-use
  - If Manufacturer Instructions is selected, the Dataset is considered Done (Shared/Cited)
- Summary Tab numbers to reflect new inputs
- DS UI version of article not yet up to date
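
The subtype rules above can be sketched as two small Python functions. This is a hedged illustration rather than the report's actual logic: the function names are hypothetical, and treating `None` or an empty string as "Subtype=none" is an assumption about how the missing subtype is stored. Only the Manufacturer Instructions case is described as Done in the notes, so the second function says nothing about how other protocols become Done.

```python
from typing import Optional

# Subtype labels taken from the notes above.
PUBLISHED_PROTOCOL = "Published Protocol"
MANUFACTURER_INSTRUCTIONS = "Manufacturer Instructions"


def protocol_is_reuse(sub_type: Optional[str]) -> bool:
    """No subtype means a "New" Protocol; any selected subtype means re-use.

    Assumption: None or "" stands for "Subtype=none".
    """
    return bool(sub_type)


def protocol_done_by_subtype(sub_type: Optional[str]) -> bool:
    """True only when the subtype alone makes the Dataset Done (Shared/Cited)."""
    return sub_type == MANUFACTURER_INSTRUCTIONS
```

With this shape, the separate re-use input is never consulted for Protocols, which matches the note that its functionality is erased when Protocol subtypes exist.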

3. Burger Screens

Changes made to Protocol Screens
Changes to suggested Documentation: (formatted for better visualization in doc)
No longer need "ProtocolSource" input
Re-Use not available for this datatype

New Protocol (Datatype=Protocol, Subtype=None)

Published Protocol (Datatype=Protocol, Subtype=Published Protocol)

Manufacturer Instructions (Datatype=Protocol, Subtype=Manufacturer Instructions)

NicolasKieffer commented 2 years ago

Datatypes are ready to be refreshed. Does this affect already curated documents? Do their tokens need to be refreshed, and does that affect OCR docs?

This will not affect the documents, only the "select" elements (selectable lists of datatypes/subtypes). Datasets using an "old" datatype/subtype will keep their old datatype/subtype values.
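
A small Python sketch of that behaviour, under stated assumptions: the dictionaries below are made-up stand-ins for the selectable lists and for one stored dataset, the "old datatype" and "old subtype" values are placeholders, and both function names are hypothetical. It only illustrates that refreshing the lists leaves stored values untouched.

```python
from copy import deepcopy
from typing import Dict, List

# Placeholder option lists (datatype -> subtypes); the "old" names are invented.
old_options: Dict[str, List[str]] = {"old datatype": ["old subtype"]}
refreshed_options: Dict[str, List[str]] = {
    "Protocol": ["Published Protocol", "Manufacturer Instructions"],
}

# A stored dataset curated before the refresh (placeholder values).
stored_dataset = {"id": "dataset-1", "dataType": "old datatype", "subType": "old subtype"}


def refresh_select_options(refreshed: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build the new selectable lists; stored datasets are never read or written."""
    return deepcopy(refreshed)


def is_still_selectable(dataset: dict, options: Dict[str, List[str]]) -> bool:
    """Check whether a stored datatype/subtype pair still appears in the lists."""
    return dataset["subType"] in options.get(dataset["dataType"], [])


before = dict(stored_dataset)
select_options = refresh_select_options(refreshed_options)
# The stored dataset keeps its old values even though they are no longer selectable.
unchanged = stored_dataset == before
selectable_now = is_still_selectable(stored_dataset, select_options)
```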

DataSeer / dataseer-web

MongoDB models #615
