DB plan and execute - Githubissues

Gondee commented 8 years ago

As we discussed, the DB will have two components.

The first is a file that keeps track of the labels and accessory information needed to store training data and models.
The second is the actual CSV file returned by the sensor. I'm not sure whether we want to convert it to something more easily readable by everything else (to avoid doing the conversion on every load).

We need to decide on the columns in the accessory file.

Ian, how are the labels best stored? Does each scan need an array of separate labels identifying the concentrations, plus an additional label identifying the whole compound (for training data)? Does every scan need to compute through the global dictionary of individual materials/chemicals?

Haven't slept in a while, so forgive me if I'm not making sense, haha.

IannothSlurgh commented 8 years ago

I already have some DB code up in the chemometrics link, using the ngCordova plugin. The plugin is not perfect, but it saves a lot of writing (it lacks a file append, which is sad).

Description of development:

-Three file systems, one each for: PLS models, PCA models, and scan data.

-The file systems have a management file with names of files currently stored for the particular file system. This allows GUI to ask the appropriate file system for names of files that can be loaded.

-All non-management files are JSON, which means that Cordova can just read the entire file and use angular.fromJson to convert it to an object.

-In the case of scan data, I am storing a JSON object with the absorptions, concentrations, and name of each chemical in the sample. What I figured is that we'd take the CSVs read from the scanner, parse them, extract the relevant information, place that data into an object, angular.toJson it, and then write it to a file.
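
A minimal sketch of the scan-file round trip described above, not the code from the chemometrics link. The CSV layout (a header line followed by `wavelength,absorbance` rows), the field names `absorptions`, `concentrations`, and `names`, the helper names, and the sample values are all assumptions made for illustration, since the sensor's actual output format isn't given in this thread. Plain `JSON.stringify`/`JSON.parse` stand in for `angular.toJson`/`angular.fromJson` so the snippet runs outside Angular.

```javascript
// Hypothetical CSV layout: one header line, then "wavelength,absorbance" rows.
function parseScanCsv(csvText) {
  const rows = csvText.trim().split(/\r?\n/).slice(1); // drop the header line
  return rows
    .filter((line) => line.trim() !== "")
    .map((line) => Number(line.split(",")[1])); // keep only the absorbance column
}

// Build the object that would be serialized into a scan file.
function buildScanRecord(csvText, names, concentrations) {
  if (names.length !== concentrations.length) {
    throw new Error("names and concentrations must have the same length");
  }
  return { absorptions: parseScanCsv(csvText), concentrations, names };
}

// Illustrative values only.
const csv = "wavelength,absorbance\n900,0.12\n905,0.15\n910,0.11\n";
const record = buildScanRecord(csv, ["water", "ethanol"], [0.7, 0.3]);

const fileContents = JSON.stringify(record); // what would be written to the file
const loaded = JSON.parse(fileContents);     // what a later load would return
```

Because the whole record is one JSON document, the missing file-append in the plugin doesn't matter here: each scan is written once as a complete file.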

IannothSlurgh commented 8 years ago

To answer your questions:

-All scan files at the end of processing will need concentrations, names for those concentrations, and absorbances. Where they differ is whether they were generated in training mode or in inference mode. If made in training mode, we must rely on the user to input the concentrations and the names for those concentrations. If made in inference mode, we rely on the chemometrics engine to generate those values.

-We don't need to add names for the full sample; adding them is for thoroughness and reflects better for the PCA. If added, they would need to be input by the user from the GUI and would then be an additional variable in the JSON objects stored in files.

-Compute through global dictionary of individual materials? Global dictionary is based off of the individual files you chose for the model. All other files containing materials not in training set don't exist for our purposes.
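
As a rough illustration of the last point, the "global dictionary" could be built as the set of distinct chemical names across whichever scan records were chosen for the model. The function name `buildChemicalDictionary`, the `names` field, and the sample records are assumptions for this sketch, not existing code.

```javascript
// Collect the distinct chemical names across the scan records chosen for a model.
function buildChemicalDictionary(scanRecords) {
  const seen = new Set();
  for (const record of scanRecords) {
    for (const name of record.names) {
      seen.add(name);
    }
  }
  return Array.from(seen); // ordered by first appearance
}

// Illustrative records only; files that were not chosen never enter the dictionary.
const chosenScans = [
  { names: ["water", "ethanol"], concentrations: [0.7, 0.3] },
  { names: ["water", "glucose"], concentrations: [0.9, 0.1] },
];
const dictionary = buildChemicalDictionary(chosenScans);
```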

In training mode, we do a lot of file work- we need to read existing (or just newly made) scan files which contain training data, but as we go, we store that data in local memory. In inference mode, we already have what chemicals exist in local memory, and use the chemometrics engine to determine which of these chemicals is present in a new sample. We then store only the chemicals that do exist in the newly created scan file. Ho

Gondee commented 8 years ago

Ian can you push what you have so far for the DB?

IannothSlurgh commented 8 years ago

To github? If you want to just get at what is written, it is here:

https://drive.google.com/a/tamu.edu/folderview?id=0B_2fUAHLcAnGT2JyQkVpbmRzMFE&usp=sharing

I don't want to ruin the repository state.

IannothSlurgh commented 8 years ago

Every time a sample is scanned while train is toggled on, chemotrain is called, but it needs not only the data from the most recent scan, but all other data generated by other training scans.

The result? Somewhere outside the chemometrics engine, the data should be stored and appended to as necessary. A file is not necessary, since this data does not need to survive past each session of the app.

When chemoinfer is called, we don't need to use files either. At this point in the project, minimalism is the name of the game.

However- there is one database feature we DO need. The ability to write PLS and PCA models to files and load them when we use appropriate GUI stuffs. PLS and PCA models are just simple objects with data in them. So what is needed is to write and read models synchronously to the database.
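
A minimal sketch of that one feature, assuming models really are plain JSON-serializable objects. An in-memory `Map` stands in for the file system so the example is self-contained; the name `createModelStore`, the `"management"` file name, and the model fields are hypothetical. The real version would go through the Cordova file plugin instead.

```javascript
// In-memory stand-in for one file system (e.g. the one holding PLS models).
function createModelStore() {
  const MANAGEMENT = "management"; // hypothetical name of the management file
  const files = new Map();         // file name -> JSON text
  files.set(MANAGEMENT, JSON.stringify([]));

  // Names of the models currently stored, as recorded in the management file.
  function listNames() {
    return JSON.parse(files.get(MANAGEMENT));
  }

  function save(name, model) {
    if (name === MANAGEMENT) {
      throw new Error("model name collides with the management file");
    }
    files.set(name, JSON.stringify(model));
    const names = listNames();
    if (!names.includes(name)) {
      names.push(name);
      files.set(MANAGEMENT, JSON.stringify(names));
    }
  }

  function load(name) {
    if (!listNames().includes(name)) {
      throw new Error("no stored model named " + name);
    }
    return JSON.parse(files.get(name));
  }

  return { listNames, save, load };
}

// Illustrative model object; the real PLS/PCA model fields are not specified here.
const plsStore = createModelStore();
plsStore.save("demo-model", { components: 2, coefficients: [0.5, -0.25] });
const restoredModel = plsStore.load("demo-model");
```

The GUI would call `listNames()` to populate a load dialog and `load(name)` once the user picks an entry.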

So what I really need is actually largely not DB functionality. What I need is for someone outside of the chemometrics section to keep track of all that data read so far for training data and to append to it as necessary and keep passing it to chemotrain after a new training scan has been added.

chemotrain wants: [[concentrations for sample 1], [concentrations for sample 2], etc], [[absorptions for sample 1], [absorptions for sample 2], etc], [names of each kind of chemical concentration]
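
One way the caller outside the chemometrics section could do this is sketched below. It keeps every training scan in memory for the session and rebuilds the three arguments on each new scan. The argument order follows the list above, but the exact `chemotrain` signature is an assumption, so a stub is passed in. Filling a concentration of 0 for a chemical that a sample does not list is also an assumption, as are the `createTrainingSession` and `addTrainingScan` names and the sample values.

```javascript
// Keeps training scans in memory only; nothing here survives the app session.
function createTrainingSession(chemoTrain) {
  const scans = [];

  function addTrainingScan(scan) {
    scans.push(scan);

    // Flat list of every chemical name seen so far, in order of first appearance.
    const names = [];
    for (const s of scans) {
      for (const n of s.names) {
        if (!names.includes(n)) names.push(n);
      }
    }

    // One concentration row per sample, aligned to `names`.
    // Assumption: a chemical a sample does not list has concentration 0.
    const concentrations = scans.map((s) =>
      names.map((n) => {
        const i = s.names.indexOf(n);
        return i === -1 ? 0 : s.concentrations[i];
      })
    );
    const absorptions = scans.map((s) => s.absorptions);

    return chemoTrain(concentrations, absorptions, names);
  }

  return { addTrainingScan };
}

// Stub standing in for the real chemotrain; it only records what it was given.
const trainCalls = [];
const session = createTrainingSession((concentrations, absorptions, names) => {
  trainCalls.push({ concentrations, absorptions, names });
});

session.addTrainingScan({ names: ["water"], concentrations: [1], absorptions: [0.1, 0.2] });
session.addTrainingScan({
  names: ["water", "ethanol"],
  concentrations: [0.6, 0.4],
  absorptions: [0.3, 0.4],
});
```

Rebuilding the rows on every call keeps earlier samples aligned when a later scan introduces a new chemical name.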

Gondee / pMIR

DB plan and execute #18