CircleCI-Public / python-orb

Common CircleCI tasks for the Python programming language.
https://circleci.com/developer/orbs/orb/circleci/python
MIT License
13 stars 37 forks source link

Potentially undesirable behavior: Lockfile deleted and recreated #111

Closed tboddyspargo closed 16 hours ago

tboddyspargo commented 1 year ago

Describe Request:

Thanks for this awesome, orb, everyone! This really helps avoid re-inventing the wheel within the community!

I just wanted to call out this odd behavior and ask if there was a specific reason for it:

Within the Copy to cache directory step of the install-packages command

              LOCKFILE_PATH="${CACHE_DIR}/lockfile"

              if [ -e "${LOCKFILE_PATH}" ]; then
                  rm -f "${LOCKFILE_PATH}"
              fi

              if [ -e "${LOCK_FILE}" ]; then
                  FULL_LOCK_FILE=$(readlink -f "${LOCK_FILE}")

                  echo "INFO: Copying ${FULL_LOCK_FILE} to ${LOCKFILE_PATH}"
                  cp "${FULL_LOCK_FILE}" "${LOCKFILE_PATH}"
              fi

This code appears to be deleting and then immediately re-creating the temporary lockfile. Why?

FWIW: I think it would be good for this command to clean up after itself (by removing the temporary lockfile copy), but it would need to do so after the save_cache step.

Supporting Documentation Links:

marboledacci commented 6 days ago

This was created a while ago and this code was part of the orb since the beginning. The person who made it is no longer part of CircleCI so I cannot be certain, but for my understanding of the code, it is done this way so the there is no lockfile when the project doesn't have one defined. It could happen that LOCKFILE_PATH exists when LOCK_FILE doesn't, so this prevents to keep that LOCKFILE_PATH there.

About the clean up you mention, I don't understand very well what do you mean. Would you like to delete LOCKFILE_PATH after the save_cache step? In that case, why?

tboddyspargo commented 6 days ago

It's been a while since I thought about this, but I guess I'm just not sure about that scenario.

It seems to me that if LOCKFILE_PATH exists at the beginning of the command, cache-link-lockfile.sh trusts that it is accurate (the assumption being that it's a reserved path and that if it exists, then it was created by us and is up to date?). At the end of cache-link-lockfile.sh, LOCKFILE_PATH should either exist and be trustworthy, or not exist and we warned that there was a problem finding it. At that point, if LOCKFILE_PATH exists, we're treating it as a reliable cache key in subsequent command steps. It just seems unnecessary to me that we might consider LOCKFILE_PATH unreliable as a cache key after cache-save.sh if we didn't before then.

In other words, If LOCKFILE_PATH exists prior to cache-link-lockfile.sh, why would we trust it enough to restore the cache based on it, but not trust it enough to save the cache again at the end? Is there a scenario where LOCK_FILE changed in between that I haven't considered? Or is there simply a reason I'm not thinking of that it is important to trust the file for restoring the cache, but not for saving it at the end?

marboledacci commented 5 days ago

Looking through the history of the file, and the issues and PRs related to it, I found the LOCKFILE_PATH used to be a hard link, so instead of deleting, it was unlinked to be able to link again. This was later changed to be a copy instead of a link, and therefore the unlink became a rm. I would say is not really required now, but for the way the changes were made it was not considered.

tboddyspargo commented 5 days ago

I see, thanks! Might be worth tidying up at some point, but clearly it's not a very impactful detail.

marboledacci commented 16 hours ago

Thanks for the heads up @tboddyspargo, for now I will close this issue. We will take a look at cleaning this up in the next major release.