The `page_entities` files should be cached across pipeline runs. Currently they appear to be rebuilt from scratch on every run, even when they are already present in S3 storage. Uploading to storage works as expected; the problem seems to be that we don't correctly check whether a cached version already exists before rebuilding.
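The expected behavior is a check-then-build step: look the object up in storage first, and only build and upload it on a miss. Below is a minimal sketch of that pattern under stated assumptions. `ObjectStore`, `InMemoryStore`, `load_or_build_page_entities` and the key `page_entities/example-input.jsonl` are all hypothetical names, not part of the pipeline's actual code. The in-memory store stands in for S3 so the sketch runs offline. An S3-backed `exists` could, for example, call boto3's `head_object` and treat a `ClientError` with a 404 code as "missing". One point that matters for any implementation: the cache key has to be derived deterministically from the run's inputs. If it embeds something run-specific, such as a timestamp, the existence check will never hit.

```python
from typing import Callable, Dict, List, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal storage interface (an S3 version would wrap boto3)."""

    def exists(self, key: str) -> bool: ...
    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...


class InMemoryStore:
    """Stand-in for S3 so the sketch runs without network access."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self._objects

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data


def load_or_build_page_entities(
    store: ObjectStore, key: str, build: Callable[[], bytes]
) -> bytes:
    """Return the cached object if present; otherwise build, upload, and return it."""
    # The cache check that the issue describes as missing: look before rebuilding.
    if store.exists(key):
        return store.get(key)
    data = build()
    store.put(key, data)
    return data


# Usage: two "pipeline runs" sharing one store should build only once.
build_calls: List[int] = []


def build_entities() -> bytes:
    build_calls.append(1)  # record each (expensive) rebuild
    return b"entities"


store = InMemoryStore()
key = "page_entities/example-input.jsonl"  # hypothetical deterministic key
first_run = load_or_build_page_entities(store, key, build_entities)
second_run = load_or_build_page_entities(store, key, build_entities)
```

With the check in place, the second call reads from storage instead of rebuilding, so `build_entities` runs exactly once.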