The code in the crawler at https://github.com/UW-Madison-DSI/ospo-stats/blob/080cf6ca0e00a9bc85182d03e512a973f56283ff/ospo_stats/github/crawl.py#L147-L150 aims to perform an "upsert" on crawled records: insert new rows into the database when they do not already exist, and update the existing rows when they do. However, certain fields, such as `crawled_at`, do not appear to be updated as expected during this upsert.

To address this and ensure all relevant fields are updated, it may be worth switching to SQLAlchemy's insert-on-conflict construct. It gives granular control over how conflicts are handled during the insert, which should make updates to the fields in question more reliable. Further investigation is needed to verify the current behavior and confirm that insert-on-conflict is actually the better solution.
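As a rough sketch of what the insert-on-conflict approach could look like, here is a minimal, self-contained example using SQLAlchemy's SQLite dialect (PostgreSQL has an equivalent `insert(...).on_conflict_do_update(...)`). The `Repo` table, its columns, and `upsert_repo` are illustrative stand-ins, not the actual ospo-stats schema or code:

```python
# Minimal demo of SQLAlchemy insert-on-conflict ("upsert").
# Table/column names are hypothetical, not the ospo-stats schema.
from datetime import datetime

from sqlalchemy import Column, DateTime, String, create_engine, select
from sqlalchemy.dialects.sqlite import insert  # postgresql dialect has the same API
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Repo(Base):
    __tablename__ = "repo"
    name = Column(String, primary_key=True)
    crawled_at = Column(DateTime)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)


def upsert_repo(session: Session, name: str, crawled_at: datetime) -> None:
    stmt = insert(Repo).values(name=name, crawled_at=crawled_at)
    # On primary-key conflict, overwrite crawled_at with the incoming
    # value (stmt.excluded refers to the row that was proposed for insert).
    stmt = stmt.on_conflict_do_update(
        index_elements=[Repo.name],
        set_={"crawled_at": stmt.excluded.crawled_at},
    )
    session.execute(stmt)
    session.commit()


with Session(engine) as session:
    upsert_repo(session, "ospo-stats", datetime(2024, 1, 1))
    # Second upsert of the same key should update crawled_at, not be skipped.
    upsert_repo(session, "ospo-stats", datetime(2024, 6, 1))
    row = session.execute(select(Repo)).scalar_one()
```

The key point is the `set_` mapping: only the columns listed there are refreshed on conflict, so `crawled_at` (and any other field that must reflect the latest crawl) has to be included explicitly.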