DataSeer / dataseer-web

DataSeer web application
GNU General Public License v3.0

MongoDB models #615

Open NicolasKieffer opened 2 years ago

NicolasKieffer commented 2 years ago

This issue will contain all the information about the data to be stored in MongoDB.

Feel free to add any documentation to enrich the models (structure of the data we store) here.

NicolasKieffer commented 2 years ago

Datasets structure

Currently, a "dataset", a "code", a "lab material" or a "protocol" is stored as a "dataset" in our database. As a result, they all share the same properties.

I will add all properties from new GUI forms, just tell me if any properties are missing (there is no limit, so we can add an "RRID" property instead of using the existing "DOI" property)

Perhaps we should create a specific structure for each of them (I will at least integrate all possible properties into the existing model).

List of currently available properties of "data objects" (datasets, codes, materials or protocols): here

| property name | type | description |
| -- | -- | -- |
| id | String | id property of the data object |
| dataInstanceId | String | dataInstance id of the data object (id used in the TEI file) |
| sentences | List of Sentence* | list of sentences |
| reuse | Boolean | "reuse" property of the data object |
| qc | Boolean | "quality control" property |
| representativeImage | Boolean | "representative image" property |
| issue | Boolean | "issue" property |
| notification | String | notification property (store the notification text content) |
| highlight | Boolean | "highlight" property |
| cert | String | cert value (between 0 and 1) |
| dataType | String | datatype of the data object |
| subType | String | subtype of the data object |
| description | String | description of the datatype/subtype |
| bestDataFormatForSharing | String | best data format for sharing of the datatype/subtype |
| bestPracticeForIndicatingReUseOfExistingData | String | best practice for indicating re-use of existing data of the datatype/subtype |
| mostSuitableRepositories | String | most suitable repositories of the datatype/subtype |
| DOI | String | DOI of the data object |
| name | String | name of the data object |
| comments | String | comments of the data object |
| status | String | status of the data object: saved (not valid but saved) OR valid (valid) OR modified (a user modifies it) |

*Note: a sentence has just two properties: an id and the text.
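For reference, the property list above could be written down as a typed structure. This is only a minimal sketch, not the actual model from the repository: the type names (`Sentence`, `DataObject`, `DataObjectStatus`), the `createDataObject` helper and all default values are assumptions made for illustration. Only the property names, their types and the three status values come from the list.

```typescript
// Hypothetical type names; property names and types follow the list above.
interface Sentence {
  id: string;
  text: string;
}

// The three status values named in the list.
type DataObjectStatus = "saved" | "valid" | "modified";

interface DataObject {
  id: string;
  dataInstanceId: string; // id used in the TEI file
  sentences: Sentence[];
  reuse: boolean;
  qc: boolean;
  representativeImage: boolean;
  issue: boolean;
  notification: string;
  highlight: boolean;
  cert: string; // a String holding a value between 0 and 1
  dataType: string;
  subType: string;
  description: string;
  bestDataFormatForSharing: string;
  bestPracticeForIndicatingReUseOfExistingData: string;
  mostSuitableRepositories: string;
  DOI: string;
  name: string;
  comments: string;
  status: DataObjectStatus;
}

// Hypothetical helper: every default below is an assumption, not documented behaviour.
function createDataObject(id: string, dataInstanceId: string, sentences: Sentence[]): DataObject {
  return {
    id: id,
    dataInstanceId: dataInstanceId,
    sentences: sentences,
    reuse: false,
    qc: false,
    representativeImage: false,
    issue: false,
    notification: "",
    highlight: false,
    cert: "0",
    dataType: "",
    subType: "",
    description: "",
    bestDataFormatForSharing: "",
    bestPracticeForIndicatingReUseOfExistingData: "",
    mostSuitableRepositories: "",
    DOI: "",
    name: "",
    comments: "",
    status: "saved",
  };
}

const draft = createDataObject("dataset-1", "dataInstance-1", [
  { id: "sentence-1", text: "Example sentence text." },
]);
```

If we later split the model into one structure per kind of object, each of them could extend a shared base like this one.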

Here are the new forms for "data objects":

Datasets form: image

Codes form: image

Materials form: image

Protocols form: image

Souad314 commented 2 years ago

This Excel sheet also describes the inputs in detail on its last tab (Tab 7; the other tabs are notes on the current HTML screens).

All Objects

  • Suggested Documentation for each object contains repeated fields (Name, etc.); the associated Excel sheet has coloured cells indicating changes to make it easier
  • Properties are in order of appearance in the UI; each property after the "issue" property was left in the table under the assumption that it is required in backend storage or other processes
  • Removed some properties which have no space (highlight, best practice, etc.)
  • Note that alignment is slightly off and everything is kind of blurry (apologies); all added fields should have the same fonts/colours/alignments as existing screens
  • Found icon images (stolen from a quick Google search) to go with new fields in the UI:
      - Key Icon
      - Catalog Icon
      - Microscope Icon
      - Citation Icon

Datasets

image

Suggested Documentation

| property name | type | description | Associated text in UI | Notes |
| -- | -- | -- | -- | -- |
| name | String | name of the data object | "Dataset-x" | |
| dataType | String | datatype of the data object | Datatype | |
| subType | String | subtype of the data object | Subtype | |
| reuse | Boolean | reuse property of the data object | This dataset is re-used from another public or private source | |
| representativeImage | Boolean | representative image property | This dataset is a Representative Image (or another type of media) | |
| qc | Boolean | quality control property | This dataset was created for Quality Control (QC) or confirmatory purposes | |
| DOI | String | DOI of the data object | Stable URL, DOI, or other link to this object | |
| PID | String | Accession Number or PID of the data object (not an URL), can be "not applicable" | Accession Number/Permanent Identifier (PID) | |
| comments | String | comments of the data object | | |
| issue | Boolean | issue property | There is an issue with the information provided in the manuscript text | |
| id | String | id property of the data object | | |
| dataInstanceId | String | dataInstance id of the data object (id used in the TEI file) | | |
| sentences | List of Sentence* | list of sentences | | |
| status | String | status of the data object: saved (not valid but saved) OR valid (valid) OR modified (a user modifies it) | | |
| notification | String | notification property (store the notification text content) | | |
| cert | String | cert value (between 0 and 1) | | |

Note: Input fields for datatypes do not change depending on other inputs, i.e. Re-Use datasets have the same language and requirements as New datasets in the UI, Rep images can be re-use, QC, and any other combination
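The PID row says the value is not a URL and can be "not applicable". A check along those lines could look like the sketch below. The function name `isPlausiblePid` is hypothetical, and the rule used to recognise a URL (an `http://`, `https://` or `www.` prefix) is an assumption for illustration, not something specified in the sheet.

```typescript
// Hypothetical helper for the PID field described in the Datasets table.
function isPlausiblePid(pid: string): boolean {
  const value = pid.trim();
  // An empty input is treated as "nothing provided".
  if (value === "") {
    return false;
  }
  // The table explicitly allows "not applicable".
  if (value.toLowerCase() === "not applicable") {
    return true;
  }
  // Assumption: anything starting with a URL scheme or "www." counts as a URL
  // and belongs in the DOI/stable URL field instead.
  return !/^(https?:\/\/|www\.)/i.test(value);
}

const pidChecks = [
  isPlausiblePid("GSE12345"),
  isPlausiblePid("not applicable"),
  isPlausiblePid("https://doi.org/10.1234/example"),
  isPlausiblePid("   "),
];
```

The four sample inputs are made up; the third one is the kind of value the form expects in the "Stable URL, DOI, or other link" field rather than in PID.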

Code and Software

Code: New Custom Script: image

Re-Use Custom Script: image

Software: image

Suggested Documentation

| property name | type | description | Associated text in UI | Notes |
| -- | -- | -- | -- | -- |
| name | String | name of the code object | | |
| subType | String | subtype of the code object | Subtype | |
| reuse | Boolean | reuse property of the code object | This item is re-used from a public or private source | |
| version | String | the version number for the associated software or package | Version | |
| DOI | String | DOI of the code object | Stable URL, DOI, or other link to this object | |
| RRID | String | Research Resource Identifier (RRID) associated with software and lab materials, not always available, not an URL | Research Resource Identifier (RRID) | |
| comments | String | comments of the code object | | |
| issue | Boolean | issue property | There is an issue with the information provided in the manuscript text | |
| id | String | id property of the code object | | |
| dataInstanceId | String | dataInstance id of the code object (id used in the TEI file) | | |
| sentences | List of Sentence* | list of sentences | | |
| status | String | status of the code object: saved (not valid but saved) OR valid (valid) OR modified (a user modifies it) | | |
| notification | String | notification property (store the notification text content) | | |
| cert | String | cert value (between 0 and 1) | | |
| DataType | [always Code and Software on this part of screen] | not a choice for users, when on Code screen, all new added datasets are DataType=Code and Software | | if Subtype=custom script and Re-use = No, then different inputs displayed, if Subtype=custom script and Re-Use = Yes, then Version removed |
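The note on the DataType row describes three screen variants (new custom script, re-use custom script, software). A rough sketch of that selection follows. The names `CodeScreen`, `codeScreenFor` and `showsVersion` are hypothetical, the literal subtype label "custom script" is an assumption, and treating every other subtype as the software screen is a simplification. Whether Version is shown for new custom scripts is not stated in the notes, so this sketch only hides it in the one case they name.

```typescript
// Hypothetical names for the three screenshots of the Code and Software form.
type CodeScreen = "new custom script" | "re-use custom script" | "software";

function codeScreenFor(subType: string, reuse: boolean): CodeScreen {
  // Assumption: the subtype label for custom scripts is "custom script".
  if (subType.trim().toLowerCase() === "custom script") {
    return reuse ? "re-use custom script" : "new custom script";
  }
  // Assumption: every other subtype uses the software screen.
  return "software";
}

// The notes say Version is removed when Subtype=custom script and Re-Use=Yes.
function showsVersion(subType: string, reuse: boolean): boolean {
  return codeScreenFor(subType, reuse) !== "re-use custom script";
}

const reusedScriptScreen = codeScreenFor("custom script", true);
const reusedScriptHasVersion = showsVersion("custom script", true);
```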

Lab Materials

New Material image

Material Re-Use image

Suggested Documentation

| property name | type | description | Associated text in UI | Notes |
| -- | -- | -- | -- | -- |
| name | String | name of the material | | |
| subType | String | subtype of the material | Subtype | |
| reuse | Boolean | reuse property of the material | This item is re-used from a public or private source | |
| labSource | String | name for source of the material (if re-use) | Source (Name of Lab or Commercial Source) | |
| Catalog | String | Catalog Number associated with material (asks for Lab Name if Source is Private) | Catalog Number | |
| RRID | String | Research Resource Identifier (RRID) associated with software and lab materials, not always available, not an URL | Research Resource Identifier (RRID) | |
| comments | String | comments of the data object | | |
| issue | Boolean | issue property | There is an issue with the information provided in the manuscript text | |
| id | String | id property of the code object | | |
| dataInstanceId | String | dataInstance id of the code object (id used in the TEI file) | | |
| sentences | List of Sentence* | list of sentences | | |
| status | String | status of the code object: saved (not valid but saved) OR valid (valid) OR modified (a user modifies it) | | |
| notification | String | notification property (store the notification text content) | | |
| cert | String | cert value (between 0 and 1) | | |
| DataType | [always Lab Materials on this part of the screen] | not a choice for users, when on Materials screen, all new added datasets are DataType=Lab Materials | | if re-use is no, there are less inputs, if yes then two source inputs (2nd screenshot) |
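The re-use rule for Lab Materials could be expressed as a small lookup. This is a sketch under assumptions: the function name `materialInputs` and the two constant names are hypothetical, it is assumed that the "two source inputs" are `labSource` and `Catalog`, and the base list of inputs shown for a new material is a guess from the table order, not something confirmed by the screenshots.

```typescript
// Assumption: inputs shown for every material, re-used or not.
const BASE_MATERIAL_INPUTS: string[] = ["name", "subType", "reuse", "RRID", "comments", "issue"];

// Assumption: the "two source inputs" that appear only when re-use is yes.
const REUSE_MATERIAL_INPUTS: string[] = ["labSource", "Catalog"];

function materialInputs(reuse: boolean): string[] {
  // Return a fresh array so callers cannot modify the shared lists.
  return reuse
    ? BASE_MATERIAL_INPUTS.concat(REUSE_MATERIAL_INPUTS)
    : BASE_MATERIAL_INPUTS.slice();
}

const newMaterialInputs = materialInputs(false);
const reusedMaterialInputs = materialInputs(true);
```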

Protocols

New Protocol image

Re-Use - Another Protocol (is the "Private" source) image

Re-Use - Manufacturer's Instructions (is the "Public" source) image

Suggested Documentation

| property name | type | description | Associated text in UI | Notes |
| -- | -- | -- | -- | -- |
| name | String | name of the protocol object | | |
| reuse | Boolean | reuse property of the protocol object | This item is re-used from a public or private source | |
| protocolSource | Boolean | choice for if yes object is a re-use protocol (methods section) is Manufacturer's Instructions or another protocol | Protocol Type [Choices: 1-"Another Protocol"; 2-"Manufacturer's Instructions"] | See Screenshot 2 and 3 |
| DOI | String | DOI of the data object | Stable URL, DOI, or other link to this object | |
| comments | String | comments of the data object | | |
| issue | Boolean | issue property | There is an issue with the information provided in the manuscript text | |
| id | String | id property of the code object | | |
| dataInstanceId | String | dataInstance id of the code object (id used in the TEI file) | | |
| sentences | List of Sentence* | list of sentences | | |
| status | String | status of the code object: saved (not valid but saved) OR valid (valid) OR modified (a user modifies it) | | |
| notification | String | notification property (store the notification text content) | | |
| cert | String | cert value (between 0 and 1) | | |
| DataType | [always Other on this part of the screen] | not a choice for users, when on Protocols screen, all new added datasets are DataType=Other. Note: if re-use is yes and type is Manufacturer's Instructions, no further inputs required for object (2nd screenshot) | | |
| subType | [always Protocol on this part of the screen] | not a choice for users, when on Protocols screen, all new added datasets are subType=Protocol | | |

Souad314 commented 2 years ago

@NicolasKieffer Here are some proposed changes to Protocols. During the 11-Mar Tech Meeting (of epic length), we talked about adding Protocol as a Datatype, so that Published Protocol and Manufacturer Instructions can be its Subtypes. So, after much consideration, here are some notes for implementation:

1. Wiki

Question: Datatypes are ready to be refreshed, does this affect already curated documents - do their tokens need to be refreshed, and does that affect OCR docs?

2. Report

3. Burger Screens

| property name | type | description | Associated text in UI | Notes |
| -- | -- | -- | -- | -- |
| name | String | name of the protocol object | | |
| subType | String | "Protocol Type" in Desired Screens. Subtype = None: New Protocol (Desired Screen 1); Subtype = Published Protocol (Desired Screen 2); Subtype = Manufacturer Instructions (Desired Screen 3) | | |
| reuse | Boolean | reuse property of the protocol object | This item is re-used from a public or private source | No Re-Use available because the Subtypes (Published Protocol and Manufacturer Instructions) already imply re-use, so this field should not be changeable under these subtype conditions |
| protocolSource | Boolean | choice for if yes object is a re-use protocol (methods section) is Manufacturer's Instructions or another protocol | Protocol Type [Choices: 1-"Another Protocol"; 2-"Manufacturer's Instructions"] | |
| DOI | String | DOI of the data object | Stable URL, DOI, or other link to this object | |
| comments | String | comments of the data object | | |
| issue | Boolean | issue property | There is an issue with the information provided in the manuscript text | |
| id | String | id property of the code object | | |
| dataInstanceId | String | dataInstance id of the code object (id used in the TEI file) | | |
| sentences | List of Sentence* | list of sentences | | |
| status | String | status of the code object: saved (not valid but saved) OR valid (valid) OR modified (a user modifies it) | | |
| notification | String | notification property (store the notification text content) | | |
| cert | String | cert value (between 0 and 1) | | |
| DataType | [always Protocol on this part of the screen] | not a choice for users, when on Protocols screen, all new added datasets are DataType=Protocol (or DataType Other) | | |

New Protocol (Datatype=Protocol, Subtype=None) image

Published Protocol (Datatype=Protocol, Subtype=Published Protocol) image

Manufacturer Instructions (Datatype=Protocol, Subtype=Manufacturer Instructions) image
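To make the three desired screens concrete, here is a minimal sketch of how the subtype could drive the re-use field. The names `ProtocolSubType`, `ProtocolState` and `protocolState` are hypothetical, and representing "Subtype = None" as an empty string is an assumption. The only rule taken from the notes is that Published Protocol and Manufacturer Instructions already imply re-use, so the field is not changeable for them; what happens to re-use for a new protocol is left to the caller here.

```typescript
// Assumption: "Subtype = None" is represented as an empty string.
type ProtocolSubType = "" | "Published Protocol" | "Manufacturer Instructions";

interface ProtocolState {
  dataType: "Protocol";
  subType: ProtocolSubType;
  reuse: boolean;
  reuseEditable: boolean;
}

function protocolState(subType: ProtocolSubType, requestedReuse: boolean): ProtocolState {
  // Both named subtypes imply re-use, so the requested value is ignored for them.
  const impliesReuse = subType !== "";
  return {
    dataType: "Protocol",
    subType: subType,
    reuse: impliesReuse ? true : requestedReuse,
    reuseEditable: !impliesReuse,
  };
}

const publishedProtocol = protocolState("Published Protocol", false);
const newProtocol = protocolState("", false);
```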

NicolasKieffer commented 2 years ago

> Datatypes are ready to be refreshed, does this affect already curated documents - do their tokens need to be refreshed, and does that affect OCR docs?

This will not affect the documents, only the "select" elements (the selectable lists of datatypes/subtypes). Datasets that use an "old" datatype/subtype will keep their old datatype/subtype values.
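If we ever need to list the datasets that still carry an old datatype/subtype after a refresh, a check along these lines could work. It is only a sketch: the `TypeOption` and `StoredDataset` shapes, the function names and the sample values are all hypothetical, and nothing here reflects how the select lists are actually built in the application. It also shows the point above, that the stored values themselves are left untouched.

```typescript
// Hypothetical shape of one entry in the refreshed select list.
interface TypeOption {
  dataType: string;
  subTypes: string[];
}

// Hypothetical, reduced view of a stored dataset.
interface StoredDataset {
  id: string;
  dataType: string;
  subType: string;
}

function isStillSelectable(dataset: StoredDataset, options: TypeOption[]): boolean {
  return options.some(function (option) {
    return (
      option.dataType === dataset.dataType &&
      (dataset.subType === "" || option.subTypes.indexOf(dataset.subType) !== -1)
    );
  });
}

// Returns the datasets whose datatype/subtype is no longer in the list; it modifies nothing.
function findDatasetsWithOldTypes(datasets: StoredDataset[], options: TypeOption[]): StoredDataset[] {
  return datasets.filter(function (dataset) {
    return !isStillSelectable(dataset, options);
  });
}

// Made-up sample data for illustration only.
const refreshedOptions: TypeOption[] = [
  { dataType: "Protocol", subTypes: ["Published Protocol", "Manufacturer Instructions"] },
];
const storedDatasets: StoredDataset[] = [
  { id: "dataset-1", dataType: "Other", subType: "Protocol" },
  { id: "dataset-2", dataType: "Protocol", subType: "Published Protocol" },
];
const datasetsWithOldTypes = findDatasetsWithOldTypes(storedDatasets, refreshedOptions);
```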