adsabs / ADSDocMatchPipeline

Pipeline to match publisher document with preprint counterpart and vice versa
MIT License

fixed to create a combined file, even if the classic output is missing… #20

Closed: golnazads closed this pull request 1 year ago

golnazads commented 1 year ago

…or empty; adjusted the unit tests. Also renamed status_code to status_flaw, since that field carried more than just a code.
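As a rough illustration of the behavior described above (still writing a combined file when the classic output is missing or empty), here is a minimal sketch. It assumes the classic output is a tab-separated file and that "combined" means appending its rows to the pipeline's own rows; the function names `read_classic_rows` and `combine_with_classic` and the file format are hypothetical, not taken from the repository.

```python
import csv
import os


def read_classic_rows(classic_path):
    """Return rows from the classic output, or [] if the file is missing or empty."""
    # Hypothetical: a missing or zero-byte classic file means "no rows", not an error.
    if not os.path.exists(classic_path) or os.path.getsize(classic_path) == 0:
        return []
    with open(classic_path, newline="") as fp:
        return [row for row in csv.reader(fp, delimiter="\t") if row]


def combine_with_classic(pipeline_rows, classic_path, combined_path):
    """Write pipeline rows plus any classic rows to combined_path; return the row count."""
    combined = list(pipeline_rows) + read_classic_rows(classic_path)
    with open(combined_path, "w", newline="") as fp:
        writer = csv.writer(fp, delimiter="\t")
        writer.writerows(combined)
    return len(combined)
```

With a nonexistent or zero-byte classic path, the combined file is still written and simply contains the pipeline rows.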

seasidesparrow commented 1 year ago

Quick favor: would you please update requirements.txt to use ADSGoogleConnector.git@v0.0.3 instead of v0.0.1?

golnazads commented 1 year ago

Done.

On Wed, May 31, 2023 at 2:03 PM Matthew Templeton wrote:

Quick favor: would you please update requirements.txt to use @.*** instead of v0.0.1?

seasidesparrow commented 1 year ago

While testing the oracle update: if there are no matches to add to oracle, the update_db_curated_matches method in oracle_util returns an empty string. Line 156 of run.py checks the value of status to decide whether to archive the spreadsheet. A curated spreadsheet may contain no additional matches; in that case status is an empty string, which evaluates as false, so spreadsheet_util.archive(spreadsheet_filename) does not run, and the spreadsheet stays in the "curated" folder, where the next oracle update will process it again and ignore it.

An alternative would be to decouple archiving from the returned status in run.py, so that the spreadsheet is archived whenever update_db_curated_matches returns without raising an exception:

```python
try:
    filename = spreadsheet_util.download(spreadsheet_filename)
    status = oracle_util.update_db_curated_matches(filename)
except Exception as err:
    # Download or update failed: the sheet is not archived, so it stays in "curated".
    logger.warning("Unable to add curated sheet (%s) to local oracledb: %s", spreadsheet_filename, err)
else:
    # No exception: archive even when status is an empty string.
    logger.info("Processed file `%s`. %s", spreadsheet_filename, status)
    spreadsheet_util.archive(spreadsheet_filename)
```

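A small self-contained sketch of the difference between gating on status and archiving unless an exception occurs. The stand-ins are hypothetical: `update_curated` mimics update_db_curated_matches returning an empty string when nothing new was found (and raising for an unreadable sheet), and the `archived` list records what would be passed to spreadsheet_util.archive. The status-gated function only approximates the check around line 156 of run.py; it is not that code.

```python
import logging

logger = logging.getLogger(__name__)


def update_curated(filename):
    """Hypothetical stand-in for update_db_curated_matches: '' means no new matches."""
    if filename.endswith("bad.csv"):
        raise ValueError("could not parse sheet")
    return ""


def process_gated_on_status(sheet, archived):
    # Approximation of the original logic: archive only when status is truthy.
    status = update_curated(sheet)
    if status:  # "" is falsy, so a sheet with no new matches is never archived
        archived.append(sheet)


def process_unless_error(sheet, archived):
    # The suggested pattern: archive whenever the update does not raise.
    try:
        status = update_curated(sheet)
    except Exception as err:
        logger.warning("Unable to add curated sheet (%s) to local oracledb: %s", sheet, err)
    else:
        logger.info("Processed file `%s`. %s", sheet, status)
        archived.append(sheet)
```

For a sheet with no new matches, the status-gated version leaves it unarchived while the try/except/else version archives it; a sheet whose update raises is left in place in both cases.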
golnazads commented 1 year ago

Yes, that is what I shall do now.

On Thu, Jun 1, 2023 at 11:49 AM Matthew Templeton wrote:

In testing oracle updating, if there are no matches to be added to oracle, the update_db_curated_matches method in oracle_util will return an empty string. Line #156 of run.py checks the value of status to determine whether to archive those spreadsheets. It is possible that a curated spreadsheet may have no additional matches, in which case status will be an empty string, which evaluates as false, so spreadsheet_util.archive(spreadsheet_filename) will not run and the spreadsheet will remain in the "curated" folder, where it will again be processed by oracle update and ignored.

oracle_util.py should return some string (e.g. "no matches found in curated file") in update_db_curated_matches at line 504.
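This suggestion amounts to making the return value truthy whenever the update succeeds, so a check like `if status:` no longer skips archiving. A minimal sketch, assuming the matches are available as a list of pairs; the function name `summarize_curated_update`, the pair layout, and the wording of the non-empty-case message are hypothetical (only "no matches found in curated file" comes from the comment above).

```python
def summarize_curated_update(matches):
    """Return a non-empty status string for a curated-sheet update.

    `matches` is a hypothetical list of (source_bibcode, matched_bibcode)
    pairs that would be added to the oracle database.
    """
    if not matches:
        # Never return "": callers that test `if status:` would skip archiving.
        return "no matches found in curated file"
    return "added %d curated match(es)" % len(matches)
```

Compared with the try/except/else approach, this keeps the caller unchanged but relies on every success path returning a non-empty string.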

seasidesparrow commented 1 year ago

Awesome, looks good!