-
```# ~/src/map4/map4/map4.py -i Ephrin.smi -o Ephrin.tsv
Traceback (most recent call last):
  File "/home/berenger/src/map4/map4/map4.py", line 182, in <module>
    main()
  File "/home/berenger/src/map4/…
-
I am trying to install tmap. After installing through pip, I get the error
```AttributeError: module 'tmap' has no attribute 'Minhash'```. I see that this issue was previously reported at https://github.com/…
-
Currently my goal is to deduplicate **~750GB text (around 750 jsonl files, each is 1GB)**. My machine has **1TB RAM, 256 CPU cores**. I used the following config to run Minhash Deduplication but then …
-
https://blog.nelhage.com/post/fuzzy-dedup/
-
Current Situation: The nodes of the mapper graph currently store the IDs of the points contained within them. However, this information is not always necessary.
Proposed Improvement:
- Optional …
-
### Description
Currently, the `WindowedWords` iterator doesn't properly handle text with multiple consecutive spaces between words. The functionality is intentionally disabled (commented out) in the …
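For illustration, here is a minimal Python sketch of a word-window iterator that tolerates runs of consecutive spaces. It is not the project's `WindowedWords` implementation: the name `windowed_words` and its `window` parameter are hypothetical, and the sketch assumes it is acceptable to normalize whitespace rather than return slices of the original text.

```python
from typing import Iterator, List


def windowed_words(text: str, window: int) -> Iterator[str]:
    """Yield each run of `window` consecutive words, joined by one space.

    Hypothetical helper for illustration only.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # str.split() with no argument treats any run of whitespace as a
    # single separator, so "a  b" and "a b" give the same word list.
    words: List[str] = text.split()
    # If there are fewer words than `window`, nothing is yielded.
    for i in range(max(len(words) - window + 1, 0)):
        yield " ".join(words[i:i + window])


shingles = list(windowed_words("the  quick   brown fox", 2))
```

Normalizing up front keeps the windowing loop trivial, at the cost of losing the original spacing inside each window.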
-
Hi team!
It seems Exact Dedup, MinHash, Suffix Array, and Bloom Filter are studied in the dedup ablation experiments. I have a couple of questions related to these:
* Where can I find the code fo…
-
Dear DotHash team,
Thanks for a nice paper, I have 2 questions related to both MinHash and link prediction:
1. What MinHash was used, one permutation hashing with optimal densification (http://pr…
-
Dear Camille,
Many thanks for making the "Sketching in sequence bioinformatics: methods and applications" slides open source. Several questions related to bottom-s MinHash, One Permutation MinHash …
-
Seeing as this repo inherits lots of code from https://github.com/ekzhu/datasketch, it should be noted that the implementation of Mersenne prime hashing used in both repos causes overflows, and poten…
Apsod updated
5 months ago
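The overflow described in the last issue can be reproduced with a small Python sketch. It assumes the permutation has the common form `(a * hv + b) mod (2**61 - 1)` evaluated on `uint64` NumPy arrays. The function names and the sample values of `a`, `b`, and `hv` are made up for illustration and are not copied from either repository.

```python
import numpy as np

MERSENNE_PRIME = (1 << 61) - 1  # 2**61 - 1


def permute_exact(a: int, b: int, hv: int) -> int:
    """Reference result using Python's arbitrary-precision integers."""
    return (a * hv + b) % MERSENNE_PRIME


def permute_uint64(a: int, b: int, hv: int) -> int:
    """Same formula on uint64 arrays; the product wraps modulo 2**64."""
    a64 = np.array([a], dtype=np.uint64)
    b64 = np.array([b], dtype=np.uint64)
    hv64 = np.array([hv], dtype=np.uint64)
    # NumPy integer arrays overflow silently, so a64 * hv64 is already
    # reduced modulo 2**64 before the Mersenne modulus is applied.
    return int(((a64 * hv64 + b64) % np.uint64(MERSENNE_PRIME))[0])


# Illustrative values: a is about 2**60 and hv is a full 32-bit hash,
# so a * hv needs roughly 92 bits.
a, b, hv = (1 << 60) + 12345, 67890, 0xFFFFFFFF
exact = permute_exact(a, b, hv)
wrapped = permute_uint64(a, b, hv)
print(exact == wrapped)  # False: the 64-bit product lost its high bits
```

With small inputs the two functions agree. They diverge once `a * hv + b` no longer fits in 64 bits.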