adsabs / ADSImportPipeline

Data ingest pipeline for ADS classic->ADS+
GNU General Public License v3.0

Minor: logging ingest from files returns incorrect "percentage remaining" values #276

Open seasidesparrow opened 1 year ago

When processing a list of bibcodes from the command line (run.py --ignore-json-fingerprints -b@/tmp/bibcodes.txt), the logger writes a message after each batch of 100 bibcodes is processed, indicating both how many remain to be processed and what percentage of bibcodes have been processed. However, the percentages are incorrect.

For example, when processing a list of 682 bibcodes, the log files contain the following:

"message": "There are 582 records left (5.8% completed)"
"message": "There are 482 records left (4.8% completed)"
"message": "There are 382 records left (3.8% completed)"
"message": "There are 282 records left (2.8% completed)"
"message": "There are 182 records left (1.8% completed)"
"message": "There are 82 records left (0.8% completed)"

The calculation is done at L87-91 of run.py:

        if i / step > j:
            logger.info('There are %s records left (%0.1f%% completed)'
                        % (len(records)-i, ((len(records)-i) / 100.0)))
            j = i / step
        i += bpj

As a result, the code logs the number of remaining bibcodes divided by 100, which is neither a percentage nor the number of bibcodes completed. The percentage of bibcodes remaining is the total number minus i, divided by the step size, not by 100; in this case, step is 6.82 (682 / 100). The percentage completed is i / step. So the logging statement should use len(records)-i for the number remaining and i / step for the percentage completed. Note that j is only set to i / step after the message is logged, so at the time of the logging call it still holds the value from the previous batch.
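
To make the arithmetic concrete, here is a minimal sketch of the corrected message. The helper name progress_message is hypothetical and not part of run.py; it computes the percentage directly from the total rather than from step, which gives the same value as i / step when step is len(records) / 100.0.

```python
def progress_message(total, done):
    """Build the progress line for `done` of `total` records processed.

    Hypothetical helper for illustration only; it is not in run.py.
    """
    remaining = total - done
    # Percentage completed, equivalent to done / step with step = total / 100.0.
    # Guard against an empty list to avoid dividing by zero.
    percent_done = 100.0 * done / total if total else 100.0
    return 'There are %s records left (%0.1f%% completed)' % (remaining, percent_done)


# Reproduce the 682-bibcode case from this issue, in batches of 100.
for i in range(100, 682, 100):
    print(progress_message(682, i))
# First line printed: There are 582 records left (14.7% completed)
```

With this calculation the reported percentage rises as records are processed, instead of falling from 5.8 to 0.8 as in the log excerpt above.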