NickCrews/mismo
The SQL/Ibis-powered sklearn of record linkage
https://nickcrews.github.io/mismo/
GNU Lesser General Public License v3.0
12 stars
3 forks
issues
Add Levenshtein ratio to mismo.text
#46
jstammers
closed
1 day ago
1
DuckDB ConversionException when running MinHashLSH example
#45
lmores
opened
1 week ago
2
Add ability to sample from blocked pairs when training an FS model
#44
jstammers
opened
2 weeks ago
3
[ImgBot] Optimize images
#43
imgbot[bot]
opened
1 month ago
0
Poor scaling of add_tfidf to larger datasets
#42
jstammers
closed
1 month ago
7
expose Leipzig affiliations dataset
#41
NickCrews
opened
1 month ago
0
Add Address Parsing and Comparison with Postal
#40
jstammers
opened
1 month ago
13
Fix typo in doc
#39
lmores
closed
1 month ago
1
Incremental clustering
#36
lmores
closed
2 months ago
1
Inefficient Sampling From Known Labels
#35
jstammers
closed
1 month ago
17
Add Leipzig affiliations raw dataset
#34
OlivierBinette
closed
1 month ago
1
Add RLData datasets
#33
OlivierBinette
closed
1 month ago
1
benchmarks for array.filter(x -> x.isin(<column from other relation>))
#32
NickCrews
closed
3 months ago
1
Add TF-IDF comparer based on sklearn
#31
NickCrews
opened
3 months ago
0
chore(deps): bump the github-actions group with 1 update
#30
dependabot[bot]
closed
3 months ago
0
joining on arrays is slow
#29
NickCrews
closed
4 months ago
3
explore ipydatagrid for showing data
#28
NickCrews
opened
4 months ago
0
feat: test on spark using docker
#27
NickCrews
opened
4 months ago
1
Consider supporting latent-entity based algorithms
#26
NickCrews
opened
4 months ago
0
Add RLdata and Union Army datasets
#25
OlivierBinette
closed
3 months ago
8
Add datasets
#24
OlivierBinette
closed
5 months ago
2
Minor changes to documentation and contribution guide
#23
OlivierBinette
closed
5 months ago
3
feat: plot clusters
#22
NickCrews
closed
5 months ago
1
Testing: test_fs is too computationally and memory intensive
#21
OlivierBinette
closed
5 months ago
8
Why deal with left and right tables?
#20
OlivierBinette
closed
6 months ago
4
Testing: Standardized workflow and datasets for speed and performance benchmarking
#19
OlivierBinette
opened
6 months ago
2
FEAT: Implement Pipelines?
#18
OlivierBinette
closed
6 months ago
2
Testing: Refactor some of the tests to facilitate test-driven development and modularity
#17
OlivierBinette
closed
6 months ago
2
Design: Should type aliases be used for Ibis types?
#16
OlivierBinette
opened
6 months ago
2
Block using KDTree
#15
NickCrews
opened
7 months ago
0
Eval: look into new cluster eval metric
#14
NickCrews
opened
7 months ago
0
Add more example datasets
#13
NickCrews
opened
7 months ago
4
Add usage note on metrics
#12
NickCrews
opened
8 months ago
1
Assess link quality via comparison of links vs non-links
#11
NickCrews
opened
8 months ago
0
Assess link quality via sensitivity analysis
#10
NickCrews
opened
8 months ago
0
EM algorithm of FS model for unlabeled data
#9
NickCrews
opened
8 months ago
0
FEAT: wizard that checks things for you
#8
NickCrews
opened
8 months ago
0
viz: plot blocking with https://upset.app/
#7
NickCrews
closed
8 months ago
1
clustering: silhouette, Rand index, adjusted Rand index, mutual information, etc. reference
#6
NickCrews
closed
5 months ago
1
better connected_components() API
#5
NickCrews
closed
5 months ago
1
ROADMAP
#4
NickCrews
opened
11 months ago
0
viz: Plot pair scores using MDS
#3
NickCrews
opened
1 year ago
0
Consider using DuckDB for SQL operations
#2
NickCrews
closed
11 months ago
1
Support set-wise comparison and pooling
#1
NickCrews
closed
5 months ago
1