-
### What happened + What you expected to happen
1. I use `add_column` to add an empty dict `meta` to a dataset and then use `map` to update the `meta` dict, but the resulting dataset is wrong.
```python
d…
```
-
Hi, I'm setting up a local Spark cluster, but my data is too large to be stored on a single machine in the cluster for PySpark to load and process later.
My data consis…
-
## Description
There are incorrect and confusing function names and parameter names in the deserializers for the datasketch components:
see: https://github.com/whylabs/whylogs/blob/mainline/python…
-
Hi Team,
I tried testing the **datasketch** package on Arm64 architectures, but it produces **13 errors out of 104 tests** with the `nosetests --exclude-dir=test/aio` command. The error basically is re…
-
Hey, how do the following arguments for MinHash:
```
python -m text_dedup.minhash \
--ngram 1 \
--num_perm 128 \
--threshold 0.8 \
```
relate to the parameters of the Lee et al. pap…
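For context, MinHash LSH is commonly described with a band/row split. The `num_perm` signature values are divided into `b` bands of `r` rows each, and two documents with Jaccard similarity `s` become a candidate pair with probability `1 - (1 - s**r)**b`. The sketch below shows one way a `threshold` can be turned into `(b, r)` for a given `num_perm`: it minimizes the weighted area of false positives below the threshold plus false negatives above it, which is the approach datasketch's `MinHashLSH` takes when it picks its parameters. Whether `text_dedup` does exactly this, and how it maps onto the notation in Lee et al., is an assumption here. `choose_bands` and `candidate_probability` are hypothetical helper names.

```python
def candidate_probability(s: float, b: int, r: int) -> float:
    """Probability that a pair with Jaccard similarity s collides in at least one band."""
    return 1.0 - (1.0 - s ** r) ** b


def _integrate(f, lo: float, hi: float, steps: int = 200) -> float:
    # Midpoint rule, so the sketch needs no SciPy dependency.
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


def choose_bands(num_perm: int, threshold: float,
                 fp_weight: float = 0.5, fn_weight: float = 0.5) -> tuple[int, int]:
    """Return (bands, rows) with bands * rows <= num_perm minimizing weighted error."""
    best, best_err = (1, 1), float("inf")
    for b in range(1, num_perm + 1):
        for r in range(1, num_perm // b + 1):
            # Candidate probability integrated below the threshold: unwanted matches.
            fp = _integrate(lambda s: candidate_probability(s, b, r), 0.0, threshold)
            # Miss probability integrated above the threshold: wanted matches that are lost.
            fn = _integrate(lambda s: 1.0 - candidate_probability(s, b, r), threshold, 1.0)
            err = fp_weight * fp + fn_weight * fn
            if err < best_err:
                best, best_err = (b, r), err
    return best


if __name__ == "__main__":
    b, r = choose_bands(num_perm=128, threshold=0.8)
    print(f"bands={b}, rows={r}")
    for s in (0.5, 0.7, 0.8, 0.9):
        print(f"s={s:.1f}: P(candidate)={candidate_probability(s, b, r):.3f}")
```

With `--num_perm 128 --threshold 0.8`, the chosen `(b, r)` determines how sharply the candidate probability rises around 0.8. Printing `candidate_probability` at a few similarities, as in the `__main__` block, makes that cutoff visible.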
-
Thanks for your work!
I have a question about Table 3 ("Overall EA results on DBP1M"). In this table, you report the running time of your model. I want to know what the time includes and the actual unit of…
-
**Description of the problem:**
This just popped up in the librosa test suite after the environments upgraded from pooch 1.6 to 1.7.
We have a registry of data files that are fetched by pooch, a…
-
Creating an issue to track the known timeout-related upload failure bugs, as they can be difficult to diagnose.
If anyone thinks they are affected by this bug, please add a comment, as most users are …
-
Multiple pull requests have now hit the following issue in the tests:
```
************* Module timesketch_cli_client.commands.search
cli_client/python/timesketch_cli_client/commands/search.py:22:0: E0…
```
-
I ran your MinHash code on your "Lorem ipsum..." doc examples, but bumped the value on the minhashSigner up to 1200 for better accuracy. It consistently returns values centered around 0.70 rat…
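As background, the MinHash estimate is the fraction of hash functions whose minimum agrees across two documents, and its expected value is the Jaccard similarity of the documents' shingle sets. Raising the number of hash functions (the 1200 above) shrinks the spread of the estimate, roughly as `sqrt(J * (1 - J) / k)`, but does not move its center. The standalone sketch below is not the project's `minhashSigner`. The word 3-gram shingling, the SHA-1 base hash, the helper names, and the two sample strings are all assumptions made for illustration. It compares the estimate with the exact Jaccard similarity, so you can check which quantity a stable 0.70 corresponds to.

```python
import hashlib
import random
import re

# Mersenne prime 2**61 - 1, a common modulus for universal hashing.
PRIME = (1 << 61) - 1


def shingles(text: str, k: int = 3) -> set[str]:
    """Lowercase word k-grams (k=3 is an arbitrary choice for this sketch)."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def _stable_hash(item: str) -> int:
    # SHA-1 truncated to 64 bits; unlike built-in hash(), it is stable across runs.
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


def minhash_signature(items: set[str], num_hashes: int, seed: int = 1) -> list[int]:
    """One minimum per random hash function h(x) = (a*x + b) mod PRIME."""
    if not items:
        raise ValueError("cannot build a MinHash signature for an empty set")
    rng = random.Random(seed)  # same seed -> same hash functions for every document
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_hashes)]
    base = [_stable_hash(x) for x in items]
    return [min((a * h + b) % PRIME for h in base) for a, b in params]


def estimate_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of hash functions whose minima agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b)


# Made-up sample texts, not the documents from the original example.
doc1 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
doc2 = "Lorem ipsum dolor sit amet, sed do eiusmod tempor"
s1, s2 = shingles(doc1), shingles(doc2)
for n in (100, 1200):
    est = estimate_jaccard(minhash_signature(s1, n), minhash_signature(s2, n))
    print(f"k={n}: exact={exact_jaccard(s1, s2):.3f} estimate={est:.3f}")
```

If the estimate at 1200 hash functions sits close to the exact value, a consistent 0.70 most likely reflects the shingle sets themselves rather than too few hash functions. The shingling choice (word vs. character n-grams, and `k`) is then the next thing to compare.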