allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

tqdm units #92

Closed by cmacdonald 3 years ago

cmacdonald commented 3 years ago

Is your feature request related to a problem? Please describe.

tqdm progress bars report their rate in the generic default unit, "it/s" (iterations per second). Can the actual units be given?

Describe the solution you'd like

Change log.py to pass units to tqdm where appropriate.
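
One way this could look, as a minimal sketch only: the `Logger` class below is a hypothetical stand-in, not the project's actual log.py. It assumes `pbar` and `pbar_raw` are thin wrappers that forward keyword arguments to tqdm, which is consistent with the existing call sites that already pass `unit` and `unit_scale`. The `bar_factory` parameter is an assumption added so the sketch can be exercised without a terminal.

```python
class Logger:
    """Hypothetical stand-in for the logger in log.py (not the project's code)."""

    def __init__(self, bar_factory=None):
        # bar_factory is an illustrative hook; when left as None it resolves
        # to tqdm.tqdm the first time a progress bar is requested.
        self._bar_factory = bar_factory

    def _factory(self):
        if self._bar_factory is None:
            from tqdm import tqdm  # imported lazily, only when actually needed
            self._bar_factory = tqdm
        return self._bar_factory

    def pbar(self, it, desc=None, total=None, unit='it', **kwargs):
        # Wrap an iterable; forwarding `unit` makes the rate read e.g. "doc/s"
        # rather than tqdm's default "it/s".
        return self._factory()(it, desc=desc, total=total, unit=unit, **kwargs)

    def pbar_raw(self, desc=None, total=None, unit='it', **kwargs):
        # No iterable: the caller advances the bar manually with pbar.update(n).
        return self._factory()(desc=desc, total=total, unit=unit, **kwargs)
```

With a wrapper of this shape, a call site would only need an extra keyword, for example `pbar(docs_iter, desc='build doc lookup', unit='doc')`.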

seanmacavaney commented 3 years ago

There are 29 invocations of .pbar and .pbar_raw. A few already pass unit, and some use the unit as the description. Units should be easy to add to the rest.

./ir_datasets/ir_datasets/commands/build_clueweb_warc_indexes.py:
   33:     with _logger.pbar_raw(total=len(process_args)) as pbar:

./ir_datasets/ir_datasets/commands/doc_fifos.py:
   64:         docs_iter = _logger.pbar(docs_iter, total=dataset.docs_count())

./ir_datasets/ir_datasets/datasets/clueweb12.py:
  237:         with contextlib.ExitStack() as stack, _logger.pbar_raw(desc='building b13 document count cache') as pbar:

./ir_datasets/ir_datasets/datasets/dpr_w100.py:
   52:             for record in _logger.pbar(ijson.items(stream, 'item'), 'building dpr-w100'):

./ir_datasets/ir_datasets/datasets/gov2.py:
  275:             with _logger.pbar_raw(desc='building doccounts file', total=25205179) as pbar:

./ir_datasets/ir_datasets/datasets/msmarco_passage.py:
   50:             for line in _logger.pbar(stream, desc='extracting QID/PID pairs'):
   79:              _logger.pbar_raw(desc='fixing encoding', unit='B', unit_scale=True) as pbar:

./ir_datasets/ir_datasets/datasets/msmarco_qna.py:
  103:         for doc in _logger.pbar(ir_datasets.load('msmarco-passage').docs_iter(), desc='building msmarco-passage lookup', total=ir_datasets.load('msmarco-passage').docs_count()):
  108:         for doc in _logger.pbar(ir_datasets.load('msmarco-document').docs_iter(), desc='building msmarco-document lookup', total=ir_datasets.load('msmarco-document').docs_count()):
  124:             pbar = outer_stack.enter_context(_logger.pbar_raw(desc='processing qna', postfix=pbar_postfix))
  238:                     for columns in _logger.pbar(zip(*in_files), desc=f'merging {file_str} files'):

./ir_datasets/ir_datasets/datasets/natural_questions.py:
   58:             pbar = stack.enter_context(_logger.pbar_raw(desc='processing nq', postfix=pbar_postfix))

./ir_datasets/ir_datasets/datasets/tripclick.py:
  101:             for doc in _logger.pbar(self._collection.docs_iter(), desc='build doc lookup'):
  107:             for query in _logger.pbar(self._queries.queries_iter(), desc='build query lookup'):
  112:                  _logger.pbar_raw(desc='building docpairs', total=23_222_038) as pbar:

./ir_datasets/ir_datasets/datasets/tweets2013_ia.py:
  309:             with _logger.pbar_raw(desc='tweets') as pbar, contextlib.ExitStack() as stack:

./ir_datasets/ir_datasets/indices/lz4_pickle.py:
  272:                 for doc in _logger.pbar(self.init_iter_fn(), 'docs_iter'):

./ir_datasets/ir_datasets/util/download.py:
   89:                         pbar = stack.enter_context(_logger.pbar_raw(desc=self.url, total=dlen, unit='B', unit_scale=True, bar_format=fmt, file=pbar_f))
  128:                 skip_pbar = stack.enter_context(_logger.pbar_raw(desc=f'skipping ahead to {skip}', total=skip, unit='B', unit_scale=True, bar_format=fmt))

./ir_datasets/test/integration/base.py:
   20:             for i, doc in enumerate(_logger.pbar(dataset.docs_iter(), f'{dataset_name} docs')):
   58:             for i, query in enumerate(_logger.pbar(dataset.queries_iter(), f'{dataset_name} queries')):
   80:             for i, qrel in enumerate(_logger.pbar(dataset.qrels_iter(), f'{dataset_name} qrels')):
  102:             for i, scoreddoc in enumerate(_logger.pbar(dataset.scoreddocs_iter(), f'{dataset_name} scoreddocs')):
  124:             for i, docpair in enumerate(_logger.pbar(dataset.docpairs_iter(), f'{dataset_name} docpairs')):
  144:         for i, doc in enumerate(_logger.pbar(dataset.docs_iter(), f'{dataset_name} docs')):
  164:         for i, query in enumerate(_logger.pbar(dataset.queries_iter(), f'{dataset_name} queries')):
  180:         for i, qrel in enumerate(_logger.pbar(dataset.qrels_iter(), f'{dataset_name} qrels')):
  196:         for i, scoreddoc in enumerate(_logger.pbar(dataset.scoreddocs_iter(), f'{dataset_name} scoreddocs')):
  208:         for i, docpair in enumerate(_logger.pbar(ir_datasets.load(dataset_name).docpairs_iter(), f'{dataset_name} docpairs')):
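
To illustrate the effect on one of the call sites above, here is a small sketch that calls tqdm directly rather than the project's logger. The `render` helper is hypothetical and exists only to capture the bar text in memory; the unit name `'doc'` is an assumed choice for the "build doc lookup" bar, not something the listing specifies.

```python
import io

from tqdm import tqdm


def render(**tqdm_kwargs):
    """Run a three-step bar into an in-memory buffer and return its text."""
    buf = io.StringIO()
    with tqdm(total=3, file=buf, **tqdm_kwargs) as pbar:
        for _ in range(3):
            pbar.update(1)
    return buf.getvalue()


# Without a unit, the rate is reported in tqdm's default "it/s".
before = render(desc='build doc lookup')

# With unit='doc' (an assumed name), the rate is reported in "doc/s".
after = render(desc='build doc lookup', unit='doc')
```

Bars whose description is really the unit, such as `desc='tweets'`, could presumably be handled the same way by moving that word into `unit`.
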
cmacdonald commented 3 years ago

super!