We are getting errors for some HuggingFace datasets in the connector src/connectors/huggingface/huggingface_dataset_connector.py. As a result, the affected datasets are not added to the database.
Specific issues identified so far:
[x] creating distributions fails when content_size_kb exceeds the integer limit
[x] creating distributions fails when content_url exceeds field_length.NORMAL (set to 256)
[ ] duplicate dataset tags/keywords arise when accented characters are omitted, e.g. málrómur and malromur
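
To make the three cases above easier to reproduce, here is a minimal Python sketch of the checks involved. It is not taken from the connector: every function name is hypothetical, the 32-bit bound is an assumption (the issue only says "integer limit"), and accent stripping is only one possible explanation for the duplicate keywords.

```python
import unicodedata
from typing import Dict, List

# Assumption: content_size_kb is stored in a signed 32-bit integer column.
INT32_MAX = 2_147_483_647
# field_length.NORMAL as stated in this issue.
NORMAL_LENGTH = 256


def fits_integer_column(content_size_kb: int) -> bool:
    """Return True if the size fits the assumed 32-bit integer column."""
    return 0 <= content_size_kb <= INT32_MAX


def fits_normal_field(content_url: str) -> bool:
    """Return True if the URL fits within field_length.NORMAL characters."""
    return len(content_url) <= NORMAL_LENGTH


def strip_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def colliding_keywords(keywords: List[str]) -> Dict[str, List[str]]:
    """Group keywords that become identical once accents are removed."""
    groups: Dict[str, List[str]] = {}
    for keyword in keywords:
        groups.setdefault(strip_accents(keyword).casefold(), []).append(keyword)
    # Keep only the groups that contain more than one original spelling.
    return {key: names for key, names in groups.items() if len(names) > 1}
```

Running colliding_keywords over the tags of a failing dataset shows which spellings collapse to the same value, and the two fits_* helpers flag distributions that would violate the column constraints before they reach the database.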