Open jairideout opened 10 years ago
Related to #1322.
Regarding the ObservationMap item: designing the in-memory and on-disk data structures at the same time can reduce performance issues. As I pointed out in previous discussions, I think the ObservationMap should be a new standard format, like biom. This way, new clusterers can adopt it as their default output, reducing our overhead to include them in QIIME. Furthermore, we can also provide C/C++ and Python parsers, making it easier for the community to use. For example, we can design it to support parallel access, most importantly on disk, but also in memory.
What do others think? I'd like to work on this and start meeting with some people in order to design it.
Also related to #1163
We need classes for many of the core types of data dealt with in QIIME. A lot of these classes may end up in scikit-bio or biom-format, but it'll be useful to get a list started here.
Once implemented, these new classes should first be used in the core QIIME scripts that are becoming pyqi-ized (see #1327) for the 1.9.0 release.
This issue takes precedence over #1327.
Initial list:
- DistanceMatrix (in bipy, see https://github.com/biocore/bipy/pull/42)
- MetadataMap (in biom-format and qiime.util; needs overhaul, could possibly be based on a pandas DataFrame. Final home may be biom-format?)
- ObservationMap (e.g., OTU map): not implemented AFAIK
- CoordinateMatrix? (e.g., PCoA coordinates file; name is probably off)
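
To make the ObservationMap item a bit more concrete, here is a rough sketch of what a minimal in-memory class might look like. It assumes the usual tab-separated OTU map layout (one observation ID per line, followed by the IDs of its member sequences). The class name comes from the list above, but every method name here (`from_file`, `to_file`, `observation_for`) is hypothetical and not an existing QIIME, scikit-bio, or biom-format API. It is only a starting point for discussion, not a proposed final design.

```python
import io


class ObservationMap:
    """Hypothetical sketch: maps observation (e.g., OTU) IDs to sequence IDs."""

    def __init__(self, mapping=None):
        # Copy the input so later changes by the caller do not leak in.
        self._map = {}
        for obs_id, seq_ids in (mapping or {}).items():
            self._map[obs_id] = list(seq_ids)

    @classmethod
    def from_file(cls, fh):
        """Parse tab-separated lines: observation ID, then its sequence IDs.

        Assumed layout only; a real standard format would need a spec.
        """
        mapping = {}
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            fields = line.split('\t')
            obs_id, seq_ids = fields[0], fields[1:]
            if obs_id in mapping:
                raise ValueError("Duplicate observation ID: %r" % obs_id)
            mapping[obs_id] = seq_ids
        return cls(mapping)

    def to_file(self, fh):
        """Write the map back out in the same tab-separated layout."""
        for obs_id, seq_ids in self._map.items():
            fh.write('\t'.join([obs_id] + seq_ids) + '\n')

    def __getitem__(self, obs_id):
        return self._map[obs_id]

    def __len__(self):
        return len(self._map)

    def observation_for(self, seq_id):
        """Return the observation ID containing seq_id (linear scan)."""
        for obs_id, seq_ids in self._map.items():
            if seq_id in seq_ids:
                return obs_id
        raise KeyError(seq_id)


# Example usage with an in-memory file.
example = "OTU0\tseqA\tseqB\nOTU1\tseqC\n"
otu_map = ObservationMap.from_file(io.StringIO(example))
print(len(otu_map), otu_map["OTU0"], otu_map.observation_for("seqC"))
```

The reverse lookup is a linear scan to keep the sketch short. A real implementation would probably want an index for that, and the on-disk layout would need more thought if parallel access is a goal.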