davidpng / FCS_Database

Program to scrape an FCS directory of metadata
GNU General Public License v3.0

Build feature extraction [binning] and feature data I/O [HDF5] #28

Closed hermands closed 9 years ago

hermands commented 9 years ago

Proposal is to:

davidpng commented 9 years ago

HDF5 does not appear to support relational databases very well. We can try packing the individual tables into the container, but the database will then need to be reconstituted through a SQL declaration. Please see http://www.pytables.org/moin/FAQ#IsPyTablesareplacementforarelationaldatabase.3F and, for h5py (which I am using), http://docs.h5py.org/en/2.3/faq.html#what-s-the-difference-between-h5py-and-pytables
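
To make the "pack the tables, then reconstitute through SQL" idea concrete, here is a rough sketch only. The table name `tube_stats`, its columns, and the `/tables/...` group layout are made up for illustration and are not the project's schema. Each column is stored as an h5py dataset, and the `CREATE TABLE` statement is kept as a group attribute so the table can be rebuilt later.

```python
import sqlite3

import h5py
import numpy as np

# Hypothetical table and columns, for illustration only.
SCHEMA = ("CREATE TABLE tube_stats "
          "(case_tube_idx INTEGER, event_count INTEGER, viability REAL)")
COLUMNS = ["case_tube_idx", "event_count", "viability"]

# Build a small source database in memory.
src = sqlite3.connect(":memory:")
src.execute(SCHEMA)
src.executemany("INSERT INTO tube_stats VALUES (?, ?, ?)",
                [(1, 50000, 0.97), (2, 42000, 0.88)])
rows = src.execute(
    "SELECT * FROM tube_stats ORDER BY case_tube_idx").fetchall()

# The in-memory "core" driver keeps the sketch from writing to disk.
with h5py.File("tables_sketch.h5", "w", driver="core",
               backing_store=False) as fh:
    # Pack: one dataset per column, plus the SQL declaration as an attribute.
    grp = fh.create_group("tables/tube_stats")
    grp.attrs["sql"] = SCHEMA
    for i, name in enumerate(COLUMNS):
        grp.create_dataset(name, data=np.array([r[i] for r in rows]))

    # Reconstitute: replay the stored declaration, then re-insert the rows.
    sql = grp.attrs["sql"]
    if isinstance(sql, bytes):  # some h5py versions return bytes
        sql = sql.decode()
    dst = sqlite3.connect(":memory:")
    dst.execute(sql)
    # .tolist() converts NumPy scalars to plain Python values for sqlite3.
    cols = [grp[name][...].tolist() for name in COLUMNS]
    dst.executemany("INSERT INTO tube_stats VALUES (?, ?, ?)", zip(*cols))
    restored = dst.execute(
        "SELECT * FROM tube_stats ORDER BY case_tube_idx").fetchall()
```

This only covers numeric columns. Text columns, keys, and indexes would need extra handling, which is part of why HDF5 is an awkward fit for the relational side.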

davidpng commented 9 years ago

In response to your proposal: FCS subroutines

  1. Add FCS_subroutines/[FCS_feature_extraction].py that bins processed data and stores it in an FCS attribute.
     This has been implemented via FCS.feature_extraction(), which creates an instance of a new class called ND_Feature_Extraction. The sparse array holding the binned data can be accessed via ND_FeatureExtraction.histogram. Additional functionality for decoding an index back to a specific bin has been built in. Would you prefer this 'histogram' to be a variable within FCS instead?
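
For anyone following along, the general idea of sparse multi-dimensional binning with index decoding can be sketched as below. This is not the ND_Feature_Extraction code itself: the function names `nd_histogram` and `decode_bin`, the uniform bin count, and the assumption that events are already scaled to [0, 1) are all illustrative.

```python
import numpy as np


def nd_histogram(events, bins_per_dim):
    """Bin events (n_events x n_dims, scaled to [0, 1)) sparsely.

    Returns the flat indices of occupied bins, their counts, and the
    full histogram shape. Empty bins are never stored.
    """
    events = np.asarray(events, dtype=float)
    shape = (bins_per_dim,) * events.shape[1]
    # Integer bin coordinate per dimension, clipped to the valid range.
    coords = np.clip((events * bins_per_dim).astype(int), 0, bins_per_dim - 1)
    # Collapse per-dimension coordinates into one flat index per event.
    flat = np.ravel_multi_index(tuple(coords.T), shape)
    # Count events per occupied bin.
    idx, counts = np.unique(flat, return_counts=True)
    return idx, counts, shape


def decode_bin(flat_index, shape):
    """Decode a flat bin index back to its per-dimension bin coordinates."""
    return tuple(int(i) for i in np.unravel_index(flat_index, shape))


# Three events in two dimensions; the first two fall in the same bin.
events = np.array([[0.05, 0.95], [0.07, 0.91], [0.55, 0.15]])
idx, counts, shape = nd_histogram(events, bins_per_dim=10)
```

Here `decode_bin(51, shape)` maps the flat index 51 back to bin (5, 1), which is the kind of lookup the decoding functionality is meant to provide.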
hermands commented 9 years ago

Sounds good. The histogram does not need to live in FCS if it is always going to be pushed to HDF5 and it pushes itself. It sounds like we now have two histogram functions (1D and multi-D), but that is probably fine as long as that is all we need.

davidpng commented 9 years ago

HDF5 hierarchy:

/database_info_version
/query
/case_list
/data/"case_tube_idx"/data_array
/data/"case_tube_idx"/indices
/data/"case_tube_idx"/indptr
/data/"case_tube_idx"/shape
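
The data_array / indices / indptr / shape entries look like the components of a compressed sparse row (CSR) matrix. Assuming that is the intent, a minimal sketch of writing and reading one case/tube with h5py and scipy could look like this. The helper names `push_sparse` and `pull_sparse` and the key "12345_1" are hypothetical, not the project's API.

```python
import h5py
import numpy as np
from scipy.sparse import csr_matrix


def push_sparse(h5file, case_tube_idx, matrix):
    """Store the CSR components of `matrix` under /data/<case_tube_idx>/."""
    csr = csr_matrix(matrix)
    grp = h5file.require_group("data/%s" % case_tube_idx)
    grp.create_dataset("data_array", data=csr.data)
    grp.create_dataset("indices", data=csr.indices)
    grp.create_dataset("indptr", data=csr.indptr)
    grp.create_dataset("shape", data=np.array(csr.shape))


def pull_sparse(h5file, case_tube_idx):
    """Rebuild the CSR matrix stored under /data/<case_tube_idx>/."""
    grp = h5file["data/%s" % case_tube_idx]
    return csr_matrix(
        (grp["data_array"][...], grp["indices"][...], grp["indptr"][...]),
        shape=tuple(int(n) for n in grp["shape"][...]),
    )


dense = np.array([[0, 2, 0, 0],
                  [0, 0, 0, 5]])

# The in-memory "core" driver keeps the sketch from writing to disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as fh:
    push_sparse(fh, "12345_1", dense)
    restored = pull_sparse(fh, "12345_1").toarray()
```

Storing the four components separately keeps the file small for mostly empty histograms, at the cost of needing the shape to rebuild the matrix.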

davidpng commented 9 years ago

HDF5_IO has been renamed to Feature_IO. A new class, MergedData_IO, inherits shared functionality such as pushing and pulling dataframes. The IO code now handles two separate HDF5 files with different schemas and properties.