dataware-tools / pydtk

A Python toolkit for managing, retrieving and processing data.
https://dataware-tools.github.io/pydtk/
Apache License 2.0

Listing a large amount of metadata is slow #56

Closed d-hayashi closed 3 years ago

d-hayashi commented 3 years ago

Purpose

Feature request

Description

Listing the metadata of 3k files with pydtk db list files is very slow, whereas pydtk db list files --parsable finishes within a reasonable time. Creating the DataFrame in DBHandler seems to be the bottleneck.

System info [Optional]

yadach commented 3 years ago

This function takes a long time to return the DataFrame: https://github.com/dataware-tools/pydtk/blob/5a639aae3aa07d65ed6a27f578e8736effc133be/pydtk/db/v4/handlers/meta.py#L277
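
One way to confirm which step dominates is to profile the call and sort by cumulative time. The sketch below is only an illustration using the standard library: `list_files` and `build_table` are hypothetical stand-ins, not pydtk functions, and the records are synthetic.

```python
import cProfile
import io
import pstats


def build_table(records):
    """Hypothetical stand-in for the step that turns records into a table."""
    columns = sorted({key for record in records for key in record})
    return [[record.get(col) for col in columns] for record in records]


def list_files(n):
    """Hypothetical stand-in for a listing call: make records, then build a table."""
    records = [{"path": f"/data/{i}.csv", "size": i} for i in range(n)]
    return build_table(records)


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


table, report = profile_call(list_files, 3000)
print(report)
```

Replacing `list_files` with the real call under investigation would show whether the DataFrame construction accounts for most of the cumulative time.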

yadach commented 3 years ago

The default is --limit=20, so the command finishes within a reasonable time when no limit is specified.
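
The effect of a default limit can be sketched as follows. This assumes the limit is applied to the records before any table is built, which may not match how pydtk handles --limit internally; `iter_records` and `list_records` are hypothetical names used only for illustration.

```python
from itertools import islice


def iter_records(total):
    """Hypothetical record source that yields one metadata dict at a time."""
    for i in range(total):
        yield {"path": f"/data/{i}.csv", "size": i}


def list_records(total, limit=20):
    """Collect at most `limit` records; limit=None collects everything."""
    source = iter_records(total)
    if limit is not None:
        # islice stops pulling from the generator after `limit` items,
        # so the remaining records are never materialized.
        source = islice(source, limit)
    return list(source)


print(len(list_records(3000)))              # default limit
print(len(list_records(3000, limit=None)))  # no limit
```

With a small default limit, the cost of building the output is bounded regardless of how many files are stored, which is consistent with the slowdown appearing only when all 3k entries are listed.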