caseykneale / ChemometricsData.jl

Chemometrics Data Repository, Scraper, and Fetcher.
MIT License

ChemometricsData.jl


Overview

The purpose of this package is to give users easy access to a collection of publicly available chemometrics datasets.

Chemometrics data is widely dispersed and stored in a variety of file formats with arbitrary conventions. For people who simply want to load some data, learn the field, try new techniques, run a study, or even do a meta-analysis, this can complicate matters. To ameliorate this, the package attempts to bundle everything into an intuitive, Julia-ready format suitable for investigation or personal exploration.

Data that is obtained or made accessible via this API is either "curated" or "scraped" from public domain resources. As a user, it is your duty to investigate the use, utility, and integrity of this data for your purposes or end goals. This package does its best to inform you of these nuances, but the information available may have flaws, and sometimes that information changes over time. See the Liability section for more information.

Example Usage

Inspection

The code base provides a few basic commands for browsing the available data. Maybe you don't know which dataset you want, but you're interested in trying your hand at some chemometrics tasks using mid-infrared spectroscopy. You could do the following:

[Figure: terminal session]
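The kind of keyword lookup described above can be illustrated with a small, self-contained sketch. It does not use or reproduce this package's commands: the `catalog` dictionary, its entries, and the `find_datasets` function are all hypothetical names made up for illustration.

```julia
# Hypothetical catalog: dataset name => free-text description.
# These entries are invented for illustration and are not the package's metadata.
catalog = Dict(
    "Example_MIR"   => "Mid-infrared spectra of an example material",
    "Example_NIR"   => "Near-infrared spectra of an example material",
    "Example_Raman" => "Raman spectra of an example material",
)

# Hypothetical helper: return the sorted names of all datasets whose name or
# description contains the keyword, ignoring case.
function find_datasets(catalog::AbstractDict, keyword::AbstractString)
    needle = lowercase(keyword)
    hits = [name for (name, desc) in catalog
            if occursin(needle, lowercase(name)) || occursin(needle, lowercase(desc))]
    return sort(hits)
end

println(find_datasets(catalog, "mid-infrared"))
```

A case-insensitive substring match is the simplest choice here; a real catalog might instead match on structured tags.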

Loading

Of course you can easily crack open the datasets and have them ready to use in a few short lines of code. Below is an example workflow of loading in some data, using some metadata, and generating a plot:

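using ChemometricsData
using Plots

# Load the dataset and its accompanying metadata
quadrum_data = load("Fresh_Meats")
meta_data = meta("Fresh_Meats")

# Keep only the numeric (spectral) columns; their names are the wavenumber bins
spectra = numeric_columns( quadrum_data )
bins = parse.( Float64, names(spectra) )
# Label every 100th bin on the x-axis
bin_display_range = 1:100:length(bins)

# Transpose so that each sample is drawn as one line
plot(   spectra |> Matrix |> transpose,
        legend = false,
        title = "Fresh Meats \n (" * meta_data["URL"] * ")",
        xlabel = "Wavenumber", ylabel = "Absorbance",
        xticks = ( bin_display_range, bins[bin_display_range] )
)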

[Figure: meat data, plotted as absorbance versus wavenumber]

Attribution

When a dataset is loaded using this package, a reminder is displayed asking the user to honor its provenance in derivative works and publications. Authors and owners of datasets can also request that custom messages be displayed. For example, the penicillin dataset will emit the following message on loading: "Please acknowledge the following paper if utilizing the spectral data which can be freely downloaded at www.industrialpenicillinsimulation.com. Goldrick S., Duran-Villalobos C., K. Jankauskas, Lovett D., Farid S. S, Lennox B., (2019) Modern day control challenges for industrial-scale fermentation processes. Computers and Chemical Engineering."

Submissions

How to Submit

There are several ways to contribute to this effort:

Restrictions for submissions

Liability

The creators and contributors to this package are not responsible for the outcomes of the use of any data or code in this repository. Use it at your own risk. Changes to the end user's file system may occur with remotely accessed datasets (obtained with the fetchdata() command), because this package unpacks them after checking their MD5 checksum for authenticity. The data that is available offline should present minimal risk.
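The verify-before-unpacking step described above can be sketched as follows. This is not the package's implementation: `checksum_matches` is a hypothetical helper, and because Julia's standard library ships SHA but not MD5, the sketch swaps in SHA-256 to stay dependency-free. The same pattern would apply with an MD5 implementation.

```julia
using SHA  # standard library; provides sha256

# Hypothetical helper: true when the file's SHA-256 digest matches the
# expected hexadecimal string (SHA-256 stands in for MD5 in this sketch).
function checksum_matches(path::AbstractString, expected_hex::AbstractString)
    actual_hex = open(path) do io
        bytes2hex(sha256(io))
    end
    return actual_hex == lowercase(strip(expected_hex))
end

# Demonstration on a temporary file standing in for a downloaded archive
path, io = mktemp()
write(io, "example payload")
close(io)

expected = bytes2hex(sha256("example payload"))
if checksum_matches(path, expected)
    println("checksum ok, safe to unpack")
else
    println("checksum mismatch, refusing to unpack")
end
```

Comparing digests before touching the archive means a corrupted or tampered download is rejected before anything is written to the file system.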

Before redistributing any data accessed via this package (either stored in the git repository or obtained through the package's use), ensure that the permissions you have obtained for this data allow for this. The contributors to this package can only offer permissions for data they themselves own. Just because something is "public domain" does not imply it may be used commercially or redistributed without the author's approval.

It should also be noted that errors may exist in these datasets because they were manipulated (transposed, converted between file formats, etc.) to get them into a common form. Please report any and all issues; it's greatly appreciated.

Roadmap