BTW, one way to avoid re-ingesting records via direct ingest is to have the direct ingest pipeline read this ingest list, which should contain only new records:
/proj/ads_abstracts/sources/ArXiv/log/2021-01-07/new_records.tsv
Rather than the UpdateAgent output:
/proj/ads_abstracts/sources/ArXiv/UpdateAgent/UpdateAgent.out.2021-01-07.gz
Kelly modified the myADS pipeline to use the former, but I don't believe direct ingest (DI) was ever updated to do the same. Making that change should fix the immediate problem. Nonetheless, we still need to get the deletions right sooner or later.
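
For illustration only, here is a rough sketch of how DI could pick up the new-records list for a given day. The helper names (`new_records_path`, `read_new_record_ids`) are hypothetical and not part of any existing pipeline, and the file layout is an assumption: I'm guessing the TSV has one record per line with the identifier in the first column, which should be checked against the actual file before relying on this.

```python
import csv
from pathlib import Path

# Base directory taken from the new_records.tsv path quoted above.
ARXIV_LOG_DIR = Path("/proj/ads_abstracts/sources/ArXiv/log")


def new_records_path(date, base=ARXIV_LOG_DIR):
    """Return the path of the new-records list for a date given as YYYY-MM-DD."""
    return Path(base) / date / "new_records.tsv"


def read_new_record_ids(path):
    """Yield the first column of every non-empty row.

    ASSUMPTION: the file is tab-separated and the first column holds the
    record identifier. The real column layout is not confirmed here.
    """
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if row and row[0].strip():
                yield row[0].strip()
```

DI would then iterate over `read_new_record_ids(new_records_path("2021-01-07"))` instead of parsing the UpdateAgent output, so records that were merely updated would never reach it.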
From an email from @aaccomazzi on 2021-Jan-08: