OmicsDI / gpmdb-ddi-reader

GPMDB reader and parser for DDI Project
1 stars 0 forks source link

DDI-GPMDB-Reader

The README is used to introduce the module and provide instructions on how to install the module, any machine dependencies it may have (for example C compilers and installed libraries) and any other information that should be provided before the module is installed.

INSTALLATION

This module does not need to be installed system wide, the only requesite is to install the dependency modules found in the requirements file. You can install the modules by using a local::lib or the system lib folder.

DEPENDENCIES

Perl Modules

The programs depends on a few modules that can be easily installed using cpan. Also, the program depends on a local MongoDB server running.

MongoDB

A Mongodb server is necessary as the models will be stored at a local database and collection.

Local GPMDB folder structure

It is necessary that you have in place the same folder achitecture found in GPMDB (003, 066, ...). Inside each folder the program will store the models. It is strongly advided that you already have a almost-up-to-date version of the GPMDB models, as the program will not downlaod them in parallalel mode. Getting the models, checking for sanity and saving them is a "slow" process not suitable for large numebrs of files. Consider this as a synchronizer program, not a mass downloader.

PARAMETERS

The run.pl scripts has a set of parameters to be set before running the program.

HOW TO USE

Before running the program, check if you have all the dependencies listed above, including the mongodb server running.

Open the run.pl script and set the parameters for the program to run, including folder and file locations. Also you need to set the name of the mongodb and collection.

Start the pipeline by executing the run.pl script. The program will start checking your current list of models. Having all models listed, the program checks the ftp server for the current file listing, any new file that is not listed on the ignore file will be downloded, checked and stored in gz format.

After all files are downlaoded the program will transform each model on a mongodb record, (only a selected number of fields are processed), and finally all records are grouped and uploaded to the database.

KNOWN BUGS

Some times, during the data processing for loading in MongoDB, the program emmits a Warning message indicating a float casting error. This is only a warning message that indicates failure during numeric scope enforcing. This does not cause the program to stop nor alters the variable content. Variable scoping is made automatically by the interpreter and some times it is necessary to enforce specific values to certain scopes.

LICENSE AND COPYRIGHT

Copyright (C) 2015 Felipe da Veiga Leprevost

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See Lhttp://dev.perl.org/licenses/ for more information.