This repository corresponds to the DataSeer web application, which aims to guide the authors of scientific articles and manuscripts toward best practices for sharing research data, i.e. to ensure that the datasets accompanying an article are associated with a data availability statement, permanent identifiers and, more generally, the requirements of Open Science and reproducibility.
Machine learning techniques are used to extract and structure the information in the scientific article, to identify the contexts introducing datasets, and finally to classify these contexts into predicted data types and subtypes. The web application uses these ML predictions to help authors describe, in an efficient and assisted manner, the datasets used in the article and how these data are shared with the scientific community.
See the dataseer-ml repository for the machine learning services used by DataSeer web.
Supported article formats are PDF, docx, TEI, JATS/NLM, ScholarOne, and a large variety of additional publisher native XML formats: BMJ, Elsevier staging format, OUP, PNAS, RSC, Sage, Wiley, etc. (see Pub2TEI for the list of native publisher XML formats covered).
Main authors and contact: Nicolas Kieffer, Patrice Lopez (patrice.lopez@science-miner.com).
The development of dataseer-ml is supported by a Sloan Foundation grant.
dataseer-web is distributed under the Apache 2.0 license.
This application is composed of:
Documents, Organizations and Accounts data are stored in MongoDB. Files (PDF, XML and TEI) uploaded to dataseer-web are stored on the server file system.
npm i
// NodeJS V16.0
npm run // Display list of available options
npm start // Start headless process with forever (production)
npm run start-dev // Start process (development)
npm stop // Stop headless process
Application requires:
MongoDB (listening on port 27017, with an app database)
You must create some configuration files (based on the *.default files) and fill them with your data:
conf/conf.json: global app configuration
conf/crisp.json: crisp configuration
conf/recaptcha.json: recaptcha configuration
conf/smtp.json: smtp configuration
conf/userflow.json: userflow configuration
conf/services/dataseer-ml.json: dataseer-ml configuration
conf/services/dataseer-wiki.json: dataseer-wiki configuration
conf/services/repoRecommender.json: repoRecommender configuration
conf/services/softcite.json: softcite configuration
This application requires a private key to create JSON Web Tokens. You must create the file conf/private.key and fill it with a random string (a long random string is strongly recommended).
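One possible way to produce such a random string is sketched below, using only the Node.js standard library. The helper names (generatePrivateKey, writePrivateKey) and the default of 64 random bytes are assumptions made for illustration; they are not part of dataseer-web, and any other source of a long random string would do.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns a hex string built from `bytes` random bytes
// (64 bytes gives a 128-character string).
function generatePrivateKey(bytes = 64) {
  return crypto.randomBytes(bytes).toString('hex');
}

// Hypothetical helper: writes a fresh key to `filePath`.
// The 'wx' flag makes the call fail instead of overwriting an existing key,
// and mode 0o600 restricts the file to its owner on POSIX systems.
function writePrivateKey(filePath, bytes = 64) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, generatePrivateKey(bytes), { mode: 0o600, flag: 'wx' });
}

// Example call, run once from the repository root:
// writePrivateKey(path.join('conf', 'private.key'));
```

Refusing to overwrite is a deliberate choice in this sketch: replacing the key would presumably invalidate every token signed with the previous one.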
All files related to emails are in the conf/mails directory.
Your role defines which data you can access.