moonvalley-matt opened this issue 3 weeks ago
Thanks @moonvalley-matt. I believe this has also been reported by other users. If you change the column to htype="text", loading should be much faster.
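A minimal sketch of that workaround, assuming the Deep Lake v3-style API (`deeplake.empty`, `create_tensor` with an `htype` argument); the dataset path and tensor name here are hypothetical, and the exact calls may differ across deeplake versions:

```python
import deeplake  # assumes a deeplake 3.x-style API

# Create an in-memory dataset whose string column is declared as
# htype="text" instead of storing generic np.str_ metadata values.
ds = deeplake.empty("mem://text-htype-demo")
with ds:
    ds.create_tensor("meta_str", htype="text")
    for i in range(1000):
        ds.meta_str.append(f"label_{i}")

# Variable-length strings are typically read back as a list of arrays.
values = ds.meta_str[:100].numpy(aslist=True)
```

If the slow path is specific to non-text string htypes, redeclaring the column this way (or copying it into a new text tensor) may be a practical interim fix while the underlying performance issue is investigated.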
@levonohanyan is looking into making the performance uniformly fast across all string types.
Hi @moonvalley-matt,
The issue doesn't seem to be generally reproducible; it depends on the specific versions of deeplake, Python, and NumPy. Could you please provide more details about the versions you used? A reproducible script would be even better.
Regards, Levon
Severity
P1 - Urgent, but non-breaking
Current Behavior
I have a dataset of ~1M rows with a column of np.str_ values in the metadata. Loading this column takes about 4 seconds per 1,000 records, while integer columns load all 1,000,000 records in a few seconds.
Steps to Reproduce
Create a dataset of 1,000,000 rows whose metadata mixes string and integer columns, then compare how long each column type takes to load.
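The steps above could be sketched roughly as follows. This assumes a Deep Lake v3-style API; the tensor names, the exact `create_tensor` arguments, and how untyped string appends are handled are all assumptions, so treat this as an illustration of the benchmark shape rather than a verified reproduction script:

```python
import time
import numpy as np
import deeplake  # assumes a deeplake 3.x-style API; calls may differ by version

N = 1_000_000  # the issue uses ~1M rows; reduce for a quick local check

ds = deeplake.empty("mem://string-speed-repro")
with ds:
    # String column holding np.str_ values (the reported slow path).
    ds.create_tensor("meta_str", htype="text")
    # Plain integer column (the reported fast path).
    ds.create_tensor("meta_int", dtype="int64")
    for i in range(N):
        ds.meta_str.append(np.str_(f"row_{i}"))
        ds.meta_int.append(i)

# Time loading the full integer column.
t0 = time.perf_counter()
_ = ds.meta_int[:N].numpy()
print(f"int column:    {time.perf_counter() - t0:.2f}s for {N} rows")

# Time loading only 1,000 string records for comparison.
t0 = time.perf_counter()
_ = ds.meta_str[:1000].numpy(aslist=True)
print(f"string column: {time.perf_counter() - t0:.2f}s for 1,000 rows")
```

Per the report, the second timing is on the order of seconds for just 1,000 rows, while the first covers the full million rows in comparable time.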
Expected/Desired Behavior
Strings should load approximately as fast as integers. If that isn't feasible, are there other recommended approaches? I'm trying to understand the nature of the problem.
Python Version
No response
OS
No response
IDE
No response
Packages
No response
Additional Context
No response
Possible Solution
No response
Are you willing to submit a PR?