Tile_DB & Neo4j? - Githubissues

olszewskip commented 4 years ago

Hi! Sorry if this is not the right place to ask, or if my question is not specific enough. I'm looking for a way to work with genomic data, primarily VCFs, and TileDB struck me as a really cool solution because of its emphasis on exploiting sparsity and modelling data as multidimensional arrays. But I would also like to integrate the data encoded in a VCF with inherently graph- or tree-like data, e.g. biological ontologies or protein-interaction data. An example of the latter is https://het.io/about/, which is a Neo4j database. There are apparently even applications where it makes sense to import the whole VCF into Neo4j as well: https://github.com/phenopolis/pheno4j. More generally, I think it makes sense to equate a matrix that is sparse in its first two dimensions with a graph (the sparse matrix being the adjacency matrix of the graph). My questions are:

1. Is there some (present or planned) communication mechanism that would allow me to map between a graph framework and a TileDB sparse matrix? Could I, e.g., export/expose a VCF imported into TileDB as a Neo4j graph, or vice versa, export/expose a Neo4j graph to TileDB as an array?
2. Conversely, would it make sense to work with graph data directly in TileDB, to implement and use, say, the PageRank algorithm?
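
To make the graph/sparse-matrix correspondence concrete, here is a minimal sketch in plain Python. It does not use the TileDB or Neo4j APIs; the gene names, the weights, and the helpers `edges_to_coo` and `coo_to_edges` are hypothetical and only illustrate that a weighted edge list and the non-empty cells of a sparse adjacency matrix carry the same information.

```python
def edges_to_coo(edges):
    """Turn labelled, weighted edges into sparse-matrix coordinates.

    Returns the sorted node labels and a list of (row, col, value)
    triples, i.e. the non-empty cells of the adjacency matrix.
    """
    labels = sorted({node for src, dst, _ in edges for node in (src, dst)})
    index = {label: i for i, label in enumerate(labels)}
    coo = [(index[src], index[dst], weight) for src, dst, weight in edges]
    return labels, coo


def coo_to_edges(labels, coo):
    """Inverse mapping: sparse-matrix cells back to labelled edges."""
    return [(labels[row], labels[col], weight) for row, col, weight in coo]


# Hypothetical interaction graph: (source, target, weight).
edges = [
    ("geneA", "geneB", 1.0),
    ("geneB", "geneC", 0.5),
    ("geneC", "geneA", 2.0),
]

labels, coo = edges_to_coo(edges)
print(coo)                        # [(0, 1, 1.0), (1, 2, 0.5), (2, 0, 2.0)]
print(coo_to_edges(labels, coo))  # the original edge list
```

The `(row, col, value)` triples are the kind of data a 2D sparse array stores, while the edge list is the kind of data a graph database stores as relationships, so an export in either direction is essentially this relabelling plus whatever node and edge attributes need to travel along.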

stavrospapadopoulos commented 4 years ago

Hi @olszewskip, thanks for reaching out!

Regarding VCF data, have you checked https://github.com/TileDB-Inc/TileDB-VCF? We are quite actively developing it.

Regarding representing graph data as sparse adjacency matrices, this is of great interest to me (and was the original motivation behind TileDB). Both exporting a TileDB adjacency matrix to Neo4j and implementing graph algorithms (via sparse linear algebra) are very interesting. We have been having such discussions internally for some time, and we'll add some initial implementations to our roadmap. Of course, we always welcome contributions and are open to feature design discussions. You could also add feature requests here: https://feedback.tiledb.com/
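
As a rough illustration of what "graph algorithms via sparse linear algebra" can look like, the sketch below runs PageRank by power iteration over `(row, col, weight)` triples, so each iteration is a sparse matrix-vector product that touches only the non-empty cells. This is plain Python under stated assumptions, not a TileDB implementation: the function name `pagerank`, the damping factor 0.85, the tolerance, and the example graph are all illustrative choices.

```python
def pagerank(n, coo, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank over a sparse adjacency matrix given as (row, col, weight).

    A cell (r, c, w) means an edge r -> c with weight w. Rank held by
    nodes without outgoing edges is redistributed uniformly.
    """
    out_weight = [0.0] * n
    for row, _, weight in coo:
        out_weight[row] += weight

    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank currently sitting on dangling nodes (no outgoing edges).
        dangling = sum(rank[i] for i in range(n) if out_weight[i] == 0.0)
        new_rank = [(1.0 - damping) / n + damping * dangling / n] * n
        # Sparse matrix-vector product: visit only the non-empty cells.
        for row, col, weight in coo:
            new_rank[col] += damping * rank[row] * weight / out_weight[row]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical 3-node cycle 0 -> 1 -> 2 -> 0: every node ranks equally.
cycle = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
cycle_rank = pagerank(3, cycle)

# Hypothetical star: nodes 1 and 2 both point at node 0, which is dangling.
star = [(1, 0, 1.0), (2, 0, 1.0)]
star_rank = pagerank(3, star)
print(cycle_rank, star_rank)
```

Because the loop only iterates over stored cells, the cost per iteration scales with the number of edges rather than with the square of the number of nodes, which is the property that makes a sparse array layout attractive for this kind of computation.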

Thanks!

olszewskip commented 4 years ago

Awesome! Thank you for the references (I had read some of the documentation before; the discussion in https://docs.tiledb.com/genomics/storing-variants-as-arrays is particularly nice) and for the sanity check. Glad to know the TileDB team has this direction in mind. Being able to integrate heterogeneous sparse data and programmatically construct and run efficient queries on it from Python (or Julia?) would be just mind-blowing :)

TileDB-Inc / TileDB

Tile_DB & Neo4j? #1820