When processing a list of bibcodes from the command line (run.py --ignore-json-fingerprints -b@/tmp/bibcodes.txt), the logger writes a message after each batch of 100 bibcodes is processed, indicating both how many remain to be processed and what percentage of bibcodes have been processed. However, the percentages are incorrect.
For example, when processing a list of 682 bibcodes, the generated logfiles contain the following:
"message": "There are 582 records left (5.8% completed)"
"message": "There are 482 records left (4.8% completed)"
"message": "There are 382 records left (3.8% completed)"
"message": "There are 282 records left (2.8% completed)"
"message": "There are 182 records left (1.8% completed)"
"message": "There are 82 records left (0.8% completed)"
The calculation is done at L87-91 of run.py:
if i / step > j:
    logger.info('There are %s records left (%0.1f%% completed)'
                % (len(records)-i, ((len(records)-i) / 100.0)))
    j = i / step
i += bpj
The result of this logic is that the code prints the number of remaining bibcodes divided by 100, which is neither a percentage nor the number of bibcodes completed. The percentage completed is i divided by the step size, not (len(records)-i) divided by 100. In this case, step is 6.82 (one percent of the 682 records), so after the first batch the message should report 14.7% rather than 5.8%. The logging statement should therefore use len(records)-i for the number remaining, and i / step (the value that is then assigned to j) for the percentage completed.
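A minimal sketch of the corrected calculation, separate from run.py itself. The helper name progress_message and its arguments (total, done) are hypothetical and used only for illustration. It computes the percentage as 100 * done / total, which is equivalent to i / step if step is one percent of the record count, as the 6.82 value above suggests.

```python
def progress_message(total, done):
    """Build the progress line for `done` of `total` records processed.

    Hypothetical helper for illustration only; it is not part of run.py.
    """
    remaining = total - done
    # Percentage completed is processed / total, scaled to 100.
    # An empty list is treated as fully processed to avoid dividing by zero.
    percent = 100.0 * done / total if total else 100.0
    return 'There are %s records left (%0.1f%% completed)' % (remaining, percent)


if __name__ == '__main__':
    # Same scenario as in the report: 682 bibcodes in batches of 100.
    for done in range(100, 682, 100):
        print(progress_message(682, done))
```

For the 682-bibcode run, the first line printed would be "There are 582 records left (14.7% completed)" instead of the 5.8% currently logged.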