MatthewRalston / kmerdb

Python bioinformatics CLI for k-mer counts and de Bruijn graphs
https://matthewralston.github.io/kmerdb
Apache License 2.0

.kdbg format and variants #143

Open MatthewRalston opened 3 months ago

MatthewRalston commented 3 months ago

[Update 4 to the Assembly Algorithm project](https://github.com/users/MatthewRalston/projects/4)

The goal of the alternate format (an adjacency list structure), as opposed to the current 0.7.7 vanilla .kdbg spec, would be to abbreviate known paths, potentially storing them in an additional output file (.kdbg.stats.paths).

- [ ] .kdbg
- [ ] .kdb.gi
- [ ] .kdbg.stats.paths.tsv (--sparse adjacency list structure: new spec for tsv [ .paths vs .edges ])
- [ ] .kdbg.stats.edges.tsv (currently identical to the 0.7.7 spec for .kdbg)
- [ ] .kdbg.log
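As a rough illustration of what "abbreviating known paths" could look like, here is a minimal sketch, not the kmerdb implementation. It assumes edges are available as plain (source k-mer, target k-mer) pairs; the function name `compact_paths` and the paths/edges split are hypothetical and not part of any existing spec. Non-branching chains of edges are collapsed into single paths, while single-edge entries stay as ordinary edges.

```python
from collections import defaultdict


def compact_paths(edges):
    """Collapse non-branching chains of (src, dst) edges into node paths.

    Hypothetical helper for illustration only. Cycles made up entirely of
    non-branching nodes have no start point and are not reported.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    for src, dst in edges:
        succ[src].append(dst)
        pred[dst].append(src)

    def is_internal(node):
        # Exactly one incoming and one outgoing edge: safe to walk through.
        return len(pred.get(node, [])) == 1 and len(succ.get(node, [])) == 1

    nodes = sorted(set(succ) | set(pred))
    paths = []
    for node in nodes:
        if is_internal(node):
            continue  # paths only start at branching or terminal nodes
        for nxt in succ.get(node, []):
            path = [node, nxt]
            while is_internal(path[-1]):
                path.append(succ[path[-1]][0])
            paths.append(path)
    return paths


# Toy example with k = 3 (made-up k-mers, not real kmerdb output).
example_edges = [
    ("ACG", "CGT"),
    ("CGT", "GTA"),
    ("GTA", "TAC"),
    ("GTA", "TAG"),
]
all_paths = compact_paths(example_edges)

# Assumed split: chains of two or more edges would go to a paths file,
# single edges would stay in an edges file.
long_paths = [p for p in all_paths if len(p) > 2]
single_edges = [p for p in all_paths if len(p) == 2]
```

In this toy case the chain ACG, CGT, GTA becomes one path entry, and the two edges leaving the branching node GTA remain individual edges.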

MatthewRalston commented 3 months ago

v0.7 was deprecated... some usage changes happened, and now we're at a different version. Version 0.12.x will be the next pre-alpha revelation.

MatthewRalston commented 3 months ago

Currently just thinking about index files, to get the different side formats and exploratory combinations of data shapes out of the way.

MatthewRalston commented 3 months ago

Still thinking at this point that the system should use intermediary formats.

MatthewRalston commented 2 months ago

Just checking in, nothing to report, still working on a presentation, some system config, and the white paper template. Also, idk