dkoslicki/CMash
Fast and accurate set similarity estimation via containment min hash
BSD 3-Clause "New" or "Revised" License
42 stars
9 forks
issues
Is CMash a metric?
#32
jianshu93
opened
1 year ago
4
Add KMC as requirement
#31
dkoslicki
opened
4 years ago
0
Ground truth computation is too slow for realistic data set sizes
#30
dkoslicki
closed
4 years ago
1
For very large databases, creation of TST is slow and memory intensive
#29
dkoslicki
opened
4 years ago
6
Gzipping all training files results in a nice reduction: add feature that allows scripts/modules to handle this
#28
dkoslicki
opened
4 years ago
0
In MakeStreamingDNADatabase.py, don't require output directory to exist
#27
dkoslicki
opened
4 years ago
0
Create script to demonstrate how to re-train CMash
#26
dkoslicki
opened
4 years ago
0
Minor refactor of Query.py
#25
dkoslicki
closed
4 years ago
1
Modularize StreamingQueryDNADatabase.py
#24
dkoslicki
closed
4 years ago
3
In post-processing, find correct denominator
#23
dkoslicki
opened
4 years ago
0
StreamingQueryDNA... may be case sensitive to sequences
#22
dkoslicki
closed
4 years ago
1
Test k-mer frequency distribution idea
#21
dkoslicki
opened
4 years ago
2
Multiple k-mer sizes confirmation and testing
#20
dkoslicki
opened
4 years ago
2
Multiple k-mer sizes bug
#19
dkoslicki
closed
4 years ago
2
Restructure repo so it's clear which approach is using the streaming, and which is using Bloom filters
#18
dkoslicki
opened
4 years ago
0
Update PyPI release of CMash
#17
dkoslicki
opened
4 years ago
0
Make a Conda/bioconda release of CMash
#16
dkoslicki
opened
4 years ago
3
Improved classification time with KMC
#15
dkoslicki
opened
4 years ago
16
Testing environment
#14
dkoslicki
opened
4 years ago
9
Make Python 3 compliant
#13
dkoslicki
closed
4 years ago
0
Python3 hanging
#12
dkoslicki
closed
4 years ago
2
Require pandas version>=0.21.1
#11
dkoslicki
closed
5 years ago
0
Make sure Jaccard isn't counting blanks/Inf as matches
#10
dkoslicki
closed
4 years ago
1
k-merization makes no attempt to avoid Ns
#9
rsharris
closed
4 years ago
3
Nondeterministic behavior of output queue
#8
dkoslicki
closed
6 years ago
0
Deal with the duplicate small k-mers in the sketches
#7
dkoslicki
closed
6 years ago
0
StreamingQuery isn't sharing the set
#6
dkoslicki
closed
6 years ago
0
Parallelize the "collect the intersection counts"
#5
dkoslicki
closed
6 years ago
1
Bloom filter for pre-filter of k-mers
#4
dkoslicki
closed
6 years ago
0
Installation woes -- TLSV1_ALERT_PROTOCOL_VERSION and pytest-runner
#3
rsharris
closed
6 years ago
8
Figure out what to do with the reverse complements
#2
dkoslicki
closed
4 years ago
2
Create LICENSE
#1
dkoslicki
closed
7 years ago
0