UW-Madison-DSI / ospo-stats

Improve push to db workflow #3

Closed JasonLo closed 3 months ago

JasonLo commented 4 months ago

The crawler code at https://github.com/UW-Madison-DSI/ospo-stats/blob/080cf6ca0e00a9bc85182d03e512a973f56283ff/ospo_stats/github/crawl.py#L147-L150 is meant to upsert crawled records: insert a record if it is not in the database yet, otherwise update the existing one. However, some fields, such as `crawled_at`, do not appear to be updated during the upsert.

To make sure every relevant field is updated, it may be worth switching to a more direct approach, such as the insert-on-conflict construct provided by SQLAlchemy. It lets the statement say explicitly which columns to overwrite when an insert conflicts with an existing row, which should make the updates more reliable. Further investigation is needed to confirm the current behavior and to determine whether insert-on-conflict is actually the better solution.
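
For reference, a minimal sketch of the insert-on-conflict idea. To stay dependency-free it uses the standard-library `sqlite3` module and raw SQL rather than SQLAlchemy; the dialect-specific `insert(...).on_conflict_do_update(index_elements=..., set_=...)` construct in SQLAlchemy (available for the PostgreSQL and SQLite dialects) expresses the same kind of statement. The `repo` table, its columns other than `crawled_at`, and the `upsert_repos` helper are hypothetical, since the project's actual schema is not shown in this issue.

```python
import sqlite3

# Hypothetical schema: only crawled_at is named in this issue.
SCHEMA = """
CREATE TABLE repo (
    id INTEGER PRIMARY KEY,
    stars INTEGER NOT NULL,
    crawled_at TEXT NOT NULL
)
"""

# On a primary-key conflict, overwrite every mutable column explicitly,
# including crawled_at, using the values of the row that failed to insert
# (exposed by SQLite as the "excluded" pseudo-table).
UPSERT = """
INSERT INTO repo (id, stars, crawled_at)
VALUES (:id, :stars, :crawled_at)
ON CONFLICT (id) DO UPDATE SET
    stars = excluded.stars,
    crawled_at = excluded.crawled_at
"""


def upsert_repos(conn, records):
    """Insert new records and update existing ones in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, records)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# First crawl: one new record.
upsert_repos(conn, [{"id": 1, "stars": 1, "crawled_at": "2024-01-01T00:00:00"}])

# Second crawl: record 1 already exists, record 2 is new.
upsert_repos(
    conn,
    [
        {"id": 1, "stars": 2, "crawled_at": "2024-02-01T00:00:00"},
        {"id": 2, "stars": 5, "crawled_at": "2024-02-01T00:00:00"},
    ],
)

rows = conn.execute("SELECT id, stars, crawled_at FROM repo ORDER BY id").fetchall()
```

Listing the overwritten columns in the `DO UPDATE SET` clause is the point of the approach: a field such as `crawled_at` is refreshed only if it is named there, so an omission is visible in the statement itself. The `ON CONFLICT ... DO UPDATE` syntax requires SQLite 3.24 or newer.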

JasonLo commented 3 months ago

Fixed in https://github.com/UW-Madison-DSI/ospo-stats/commit/fc122efce13994ff7305b66fccb40f4607442e41