MatthewRalston / kmerdb

Python bioinformatics CLI for k-mer counts and de Bruijn graphs
https://matthewralston.github.io/kmerdb
Apache License 2.0

Interface updates planned, logging rework, and bugfixes. ouch. #126

Closed: MatthewRalston closed this 8 months ago

MatthewRalston commented 8 months ago

Minor updates to graphpy_refactor_towards_fileutil.

The neighbor structure still isn't working. At one point I knew what to call this:

edges, adjacencies, neighbors.

adjacency duple (2-tuple): (node1, node2)

[subject <-> neighbor] (... sort all nodes by subject only)

raah
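A minimal sketch of the "sort all nodes by subject only" idea, assuming each adjacency is a plain (subject, neighbor) 2-tuple of integer k-mer IDs. The function name `neighbor_lists` is hypothetical and not part of kmerdb:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical: each adjacency is a 2-tuple (subject, neighbor) of k-mer IDs.
Edge = Tuple[int, int]


def neighbor_lists(edges: List[Edge]) -> Dict[int, List[int]]:
    """Group adjacencies into per-subject neighbor lists.

    Edges are sorted on the subject only (key=edge[0]). Python's sort is
    stable, so neighbors keep their input order within each subject.
    """
    grouped: Dict[int, List[int]] = defaultdict(list)
    for subject, neighbor in sorted(edges, key=lambda edge: edge[0]):
        grouped[subject].append(neighbor)
    return dict(grouped)


print(neighbor_lists([(5, 2), (1, 7), (5, 0), (1, 3)]))
# {1: [7, 3], 5: [2, 0]}
```

If the [subject <-> neighbor] relation is meant to be symmetric, the reverse tuple (neighbor, subject) would also need to be added before grouping.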

MatthewRalston commented 8 months ago

Still hung up on the neighbor structure and the edge-list data structure requirements.

((n1, n2), weight, adjacency: bool = True): a 3-tuple whose first element is the 2-tuple edge (n1, n2), followed by the edge weight. The adjacency flag may just be a placeholder for later expansion.

7c387e6f91650b
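One way to give that 3-tuple named fields while keeping it a plain tuple is a `typing.NamedTuple` with a default for the placeholder flag. This is only a sketch: `WeightedEdge` and its field names are hypothetical, and the integer weight is an assumption:

```python
from typing import NamedTuple, Tuple


class WeightedEdge(NamedTuple):
    """Hypothetical record for ((n1, n2), weight, adjacency: bool = True)."""

    nodes: Tuple[int, int]  # (n1, n2): k-mer IDs of the two nodes
    weight: int             # edge weight (assumed to be an integer count)
    adjacency: bool = True  # placeholder flag reserved for later expansion


edge = WeightedEdge((3, 12), 4)
print(edge)  # WeightedEdge(nodes=(3, 12), weight=4, adjacency=True)

# Still a plain 3-tuple, so positional unpacking keeps working.
(n1, n2), weight, adjacency = edge
```

Because a NamedTuple is still a tuple, code that indexes `edge[0]` or unpacks positionally would not have to change if named fields were adopted later.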

MatthewRalston commented 8 months ago

In particular, it depends on what kind of data structure (a matrix or np.array) we want to use in the primary "traversal" process, and what those other values might be. Specifically, node1 and node2 are the k-mer IDs of nodes in the de Bruijn graph. The third item of the tuple could be the weight, and the fourth could be a Boolean indicating whether the edge (adjacency) is kmer-neighbor (retrospective) or neighbor-kmer (prospective).
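To make the retrospective/prospective distinction concrete, here is a sketch that builds (node1, node2, weight, prospective) 4-tuples from k-mer IDs. Several things here are assumptions, not kmerdb's actual scheme: a 2-bit encoding (A=0, C=1, G=2, T=3, first base most significant), node1 always being the subject k-mer, the weight being the neighbor's count, and "prospective" meaning "the neighbor follows the subject". `kmer_id` and `neighbor_edges` are hypothetical names:

```python
from typing import Dict, List, Tuple

# Assumed 2-bit encoding; kmerdb's own k-mer ID scheme may differ.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_id(kmer: str) -> int:
    """Hypothetical k-mer ID: base-4 integer, first base most significant."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def neighbor_edges(node: int, k: int, counts: Dict[int, int]) -> List[Tuple[int, int, int, bool]]:
    """Emit (node1, node2, weight, prospective) for observed neighbors of node.

    prospective=True : successor (drop the first base, append a base).
    prospective=False: predecessor (prepend a base, drop the last base).
    Candidate neighbors absent from counts are skipped; the weight is
    assumed to be the neighbor's count.
    """
    mask = (1 << (2 * k)) - 1
    edges = []
    for b in range(4):
        successor = ((node << 2) | b) & mask
        if successor in counts:
            edges.append((node, successor, counts[successor], True))
    for b in range(4):
        predecessor = (b << (2 * (k - 1))) | (node >> 2)
        if predecessor in counts:
            edges.append((node, predecessor, counts[predecessor], False))
    return edges


# k=3 k-mers from the sequence "ACGTA": ACG, CGT, GTA.
counts = {kmer_id("ACG"): 1, kmer_id("CGT"): 1, kmer_id("GTA"): 1}
print(neighbor_edges(kmer_id("CGT"), 3, counts))
# [(27, 44, 1, True), (27, 6, 1, False)]
```

Filtering candidates against the observed counts is one way to keep only adjacencies that actually come from the sequences' k-mers.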

I think that's all I need right now to hand the data to either a CPU-based or a GPU-based subroutine. I don't know yet whether we'll properly use the cuGraph library, write a NetworkX-adjacent implementation, or just implement our algorithm as a set of CUDA operations.
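Whichever backend ends up doing the traversal, a flat layout sorted by subject is easy to hand off. Below is a sketch of compressed sparse row (CSR) arrays built from (node1, node2, weight, prospective) tuples. It assumes node IDs have already been re-indexed densely to 0..num_nodes-1 (the raw 4^k ID space would make the offsets array far too large); `to_csr` is a hypothetical helper, and the prospective flag would simply become one more parallel array:

```python
from typing import List, Tuple

Edge4 = Tuple[int, int, int, bool]  # (node1, node2, weight, prospective)


def to_csr(edges: List[Edge4], num_nodes: int) -> Tuple[List[int], List[int], List[int]]:
    """Build CSR arrays: the edges of node i are targets[offsets[i]:offsets[i + 1]].

    Assumes node IDs are dense integers in range(num_nodes).
    """
    ordered = sorted(edges, key=lambda e: e[0])  # sort by subject only
    offsets = [0] * (num_nodes + 1)
    for node1, _, _, _ in ordered:
        offsets[node1 + 1] += 1  # count out-edges per subject
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]  # prefix sum turns counts into offsets
    targets = [e[1] for e in ordered]
    weights = [e[2] for e in ordered]
    return offsets, targets, weights


offsets, targets, weights = to_csr([(2, 0, 5, True), (0, 1, 3, True), (0, 2, 1, False)], 3)
print(offsets, targets, weights)
# [0, 2, 2, 3] [1, 2, 0] [3, 1, 5]
```

The three lists could be wrapped with `np.asarray` for a CPU routine; (data, indices, indptr) is also what `scipy.sparse.csr_matrix` takes. GPU graph libraries generally support CSR-style input too, though the exact cuGraph call would need checking.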

I love this project! 🥰🤩😗🥳🙂‍↕️🤫🥱👻🙈

MatthewRalston commented 8 months ago

Re: placeholder

I'm not sure which parts should be NumPy arrays and which should be unstructured metadata of a consistent length N, such as a "prospective" bool that determines whether the adjacency is bona fide (i.e., comes from the sequences' k-mer IDs).
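One possible answer, sketched under assumptions: store every per-edge field as its own typed column of the same length N (a struct-of-arrays), so numeric fields and flags like "prospective" stay array-friendly. `EdgeColumns` is hypothetical and uses only the standard-library `array` module; a NumPy structured array with fields (node1, node2, weight, prospective) would be the other obvious layout:

```python
from array import array
from typing import Iterable, Tuple


class EdgeColumns:
    """Hypothetical struct-of-arrays edge store: one typed column per field.

    All columns are kept at the same length N, so index i across the
    columns is one edge. array.array supports the buffer protocol, so a
    column could later be viewed with numpy.frombuffer without copying.
    """

    def __init__(self) -> None:
        self.node1 = array("Q")        # unsigned 64-bit k-mer IDs
        self.node2 = array("Q")
        self.weight = array("Q")       # assumed integer counts
        self.prospective = array("B")  # 0/1 flag, one byte per edge

    def extend(self, edges: Iterable[Tuple[int, int, int, bool]]) -> None:
        # Append one value to every column per edge, keeping lengths equal.
        for n1, n2, w, prospective in edges:
            self.node1.append(n1)
            self.node2.append(n2)
            self.weight.append(w)
            self.prospective.append(1 if prospective else 0)

    def __len__(self) -> int:
        return len(self.node1)

    def row(self, i: int) -> Tuple[int, int, int, bool]:
        return (self.node1[i], self.node2[i], self.weight[i], bool(self.prospective[i]))


cols = EdgeColumns()
cols.extend([(27, 44, 1, True), (27, 6, 1, False)])
print(len(cols), cols.row(1))  # 2 (27, 6, 1, False)
```

Extra metadata would then just be another column appended in `extend`, which keeps the "consistently N-length" invariant in one place.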

Or whatever else might be needed.