facebook / metro

🚇 The JavaScript bundler for React Native
https://metrobundler.dev
MIT License
5.21k stars 621 forks source link

concurrent builds fail due to shared use of /tmp/metro-cache/ #331

Open jgreen210 opened 5 years ago

jgreen210 commented 5 years ago

Do you want to request a feature or report a bug?

bug

What is the current behavior?

Concurrently-running react-native builds often fail with e.g.:

ENOENT: no such file or directory, open '/tmp/metro-cache/03/2ed8513838f8bad0238a0a6032e415d1ea766f50f498e7d284ac6a05cb0b4000001234'

Error: ENOENT: no such file or directory, open '/tmp/metro-cache/03/2ed8513838f8bad0238a0a6032e415d1ea766f50f498e7d284ac6a05cb0b400001234'
    at Object.fs.openSync (fs.js:646:18)
    at Object.fs.writeFileSync (fs.js:1291:33)
    at FileStore.set (/Users/jenkins/src/node_modules/metro-cache/src/stores/FileStore.js:43:8)
    at Cache.set (/Users/jenkins/src/node_modules/metro-cache/src/Cache.js:96:31)
    at /Users/jenkins/src/node_modules/react-native/node_modules/metro/src/DeltaBundler/Transformer.js:127:13
    at Generator.next (<anonymous>)
    at step (/Users/jenkins/src/node_modules/react-native/node_modules/metro/src/DeltaBundler/Transformer.js:11:657)
    at /Users/jenkins/src/node_modules/react-native/node_modules/metro/src/DeltaBundler/Transformer.js:11:817
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)

If the current behavior is a bug, please provide the steps to reproduce and a minimal repository on GitHub that we can yarn install and yarn test.

What is the expected behavior?

I'd expect concurrently-running builds not to interfere with each other.

Please provide your exact Metro configuration and mention your Metro, node, yarn/npm version and operating system.

MacOS 10.13.3 node 8.9.4 npm 5.6.0 started seeing this when upgraded react-native from 0.55.4 to 0.57.2 => (from our package-lock.json) metro went from 0.30.2 to 0.47.1

jgreen210 commented 5 years ago

I've not investigated this as deeply as I could yet, since a quick investigation has brought up some questions about the purpose of /tmp/metro-cache/...

The cache is under /tmp, so it's presumably intended to be ephemeral. However, there's some code that's cleaning up entries older than three days. It wasn't immediately clear to me if this code was being used in our case. The presence of this cleanup code suggests /tmp/metro-cache/ is meant to be a permanent cache. So, why's it under /tmp and not under e.g. ~/.metro/? Why aren't reads from the cache updating cache files' mtimes (turning the cache into an LRU cache)?

Why aren't writes to cache files done into same-directory temporary files (with distinct, random, non-colliding names) and renamed into their final location? This operation is atomic on POSIX, so would make collisions benign. It doesn't appear to be the cause of the problems I am seeing - they are too common and seem to relate to missing directories, not colliding writes.

Why is some code calling a recently-added clear() method? This is removing directories, so is perhaps the cause of the problems I am seeing (I've not worked out if this clear() is getting called yet). If the cache should be cleared at startup, it should be owned by the process that starts it, so should get its own new temporary directory. That would solve the collisions, but wouldn't make the cache very efficient. Why was clearing added?

I can investigate further and perhaps contribute some PRs, but would need to understand what this cache is supposed to be doing before I can do that.

I don't see anything likely to fix this in versions of metro after 0.47.1.

rafeca commented 5 years ago

Thanks for the report!!

Regarding your questions about the caching system, we're storing artifacts in /tmp since they are considered ephemeral (as you mentioned), so the presence or absence of a cache artifact should not affect the behaviour of Metro (would only make it slower).

The file-based cache is used to be able to speed up builds between different executions of Metro.

However, there's some code that's cleaning up entries older than three days. It wasn't immediately clear to me if this code was being used in our case. The presence of this cleanup code suggests /tmp/metro-cache/ is meant to be a permanent cache.

Metro should not have any cleanup code for old versions of artifacts (a long time ago there was such logic but it was removed since it's not needed).

Regarding your particular issue, in fact there's a problem if you call clear() while there's another build happening in parallel. This may happen if your build system is passing the --reset-cache param to Metro (which is used to clear all the local cache). The root cause is that the clear() method is not atomic: https://github.com/facebook/metro/blob/master/packages/metro-cache/src/stores/FileStore.js#L65-L66

Clearing the caches should be a rare thing (it's provided to easily remove all the state in case it gets stale or corrupted), but my guess is that the React Native CLI is always clearing the caches on production builds to provide more safety around caching issues. Instead of that, the CLI should configure Metro to not use any cache during production builds. This can be done by passing an empty array to cacheStores in the metro config.

Can you check if this can be fixed in the react native CLI project?

jgreen210 commented 5 years ago

Other tools typically cache these things in e.g. ~/.metro/ not /tmp. That's presumably since /tmp is typically cleaned up on reboot and/or daily:

https://askubuntu.com/questions/20783/how-is-the-tmp-directory-cleaned-up https://superuser.com/questions/187071/in-macos-how-often-is-tmp-deleted

These OS scripts could delete directories while a build is running, which could cause a build failure. I'm seeing failures throughout the day though, so while I think this should probably be fixed, it's not the main problem for me.

Here's the mtime-related cleanup code I found:

https://github.com/facebook/metro/blob/v0.50.0/packages/metro-cache/src/stores/AutoCleanFileStore.js#L82

I've still not looked if it's used by us or anyone, but it's not been removed.

I'll have a look at --reset-cache and an empty cacheStores.

It doesn't make too much sense to allow one process to clear the cache even if another is still using it, but it would be possible to stop it causing failures... Metro could clear the cache by removing the files in the cache directories not the directories themselves. If also used temp file and rename not in-place write, that would make interaction between one process clearing cache and another writing/reading to it benign. Would need a little care to make clear-clear interactions benign too - i.e. not expect file you try to delete to exist when try to delete it (since another process might remove file too after first process lists files).

rafeca commented 5 years ago

Other tools typically cache these things in e.g. ~/.metro/ not /tmp. That's presumably since /tmp is typically cleaned up on reboot and/or daily:

https://askubuntu.com/questions/20783/how-is-the-tmp-directory-cleaned-up https://superuser.com/questions/187071/in-macos-how-often-is-tmp-deleted

It was decided to store the artifacts in the OS default temp folder to delegate the cleaning of the cache to the OS, so Metro don't have to worry about using too much space and the OS can decide to delete things based on available disk space/etc.

Another option would have been to store things in the home folder, this way Metro would have more control over the cleaning process but at the expenses of eating tons of HDD space

These OS scripts could delete directories while a build is running, which could cause a build failure. I'm seeing failures throughout the day though, so while I think this should probably be fixed, it's not the main problem for me.

Metro caching should handle correctly external scripts deleting its temp folder, there may be a bug there that we can easily fix (e.g catching potential errors when writing to the cache to not fail). If you find any specific issue and you want to collaborate, I'll be happy to review any PR on that topic 😃

Here's the mtime-related cleanup code I found:

https://github.com/facebook/metro/blob/v0.50.0/packages/metro-cache/src/stores/AutoCleanFileStore.js#L82

Oh wow! I didn't even know that we had an auto-cleaning cache store 😅 This was created by @CompuIves and it's not being used by the default configuration. If you want you can use it and it will work, and this will allow you to have your cache folder e.g in ~/.metro as opposed to $TMPDIR.

jgreen210 commented 5 years ago

These react-native scripts are using --reset-cache:

https://github.com/facebook/react-native/blob/master/react.gradle https://github.com/facebook/react-native/blob/master/scripts/react-native-xcode.sh

mikeplynch commented 5 years ago

@jgreen210 Did configuring with the empty cacheStores resolve your concurrent builds failing? I seem to be facing the same issue now that I've upgraded my project to the latest react-native

jgreen210 commented 5 years ago

Avoiding intentional concurrency was easier than sorting out a CLA for a contribution here, was easier than looking at turning off caching, is probably a good thing to do in itself and I've been busy with other things, so I'm afraid I don't know.

My team's still seeing some react-native metro cache corruption though, and it could be caused by this issue - e.g. from unexpected concurrency between command-line builds, IDE builds and the packager (which is automatically started by various things if not already bound to its port). The case I was thinking of here (something someone mentioned recently on my team's slack channel) didn't involve /tmp/metro-cache, so must have been something else. I assume this /tmp/metroc-cache issue still exists though, and would affect developers running react native builds - https://github.com/facebook/metro/commits/master/packages/metro-cache/src/stores/FileStore.js hasn't been changed since I filed the issue.

It didn't look like it would be too difficult to implement the things I mentioned above.

hasan-sh commented 5 years ago

Reset the cache react-native start --reset-cache

jgfoster commented 5 years ago

Note that resetting the cache will only work if you have write permission to the newly created files and directories. If you are on a shared machine with your own login, then you can't build and you can't cleanup. Could you make the directory world writable so others can delete things?

jgreen210 commented 5 years ago

Hi @jgfoster,

Making it world writable would introduce a security issue (one that you might not care about, but that someone else would). You'd still have concurrency issues.

jgfoster commented 5 years ago

Hi @jgreen210, How would you suggest the problem be solved? Would it be sufficient to add $USER to the path?

maxammann commented 5 years ago

I'm having the same problem on macos:

EINVAL: invalid argument, unlink '/var/folders/fq/8gxcxtfx4wx69slm23ktc29w0000gn/T/metro-cache/38/9bac2994e779866ecc70a348ea4b9ba509f65490a88db6e1469ba7d8eacf86f11e7578'

Error: EINVAL: invalid argument, unlink '/var/folders/fq/8gxcxtfx4wx69slm23ktc29w0000gn/T/metro-cache/38/9bac2994e779866ecc70a348ea4b9ba509f65490a88db6e1469ba7d8eacf86f11e7578'
    at Object.unlinkSync (fs.js:972:3)
    at rimrafSync (/Users/ammann/jenkins/workspace/tegreat-react-native-app_develop/node_modules/rimraf/rimraf.js:306:17)
    at /Users/ammann/jenkins/workspace/tegreat-react-native-app_develop/node_modules/rimraf/rimraf.js:342:5
    at Array.forEach (<anonymous>)
    at rmkidsSync (/Users/ammann/jenkins/workspace/tegreat-react-native-app_develop/node_modules/rimraf/rimraf.js:341:26)
    at rmdirSync (/Users/ammann/jenkins/workspace/tegreat-react-native-app_develop/node_modules/rimraf/rimraf.js:334:7)
    at Function.rimrafSync [as sync] (/Users/ammann/jenkins/workspace/tegreat-react-native-app_develop/node_modules/rimraf/rimraf.js:304:9)
    at FileStore._removeDirs (/Users/ammann/jenkins/workspace/tegreat-react-native-app_develop/node_modules/metro-cache/src/stores/FileStore.js:61:14)
    at FileStore.clear (/Users/ammann/jenkins/workspace/tegreat-react-native-app_develop/node_modules/metro-cache/src/stores/FileStore.js:41:10)
    at Server._config.cacheStores.forEach.store (/Users/ammann/jenkins/workspace/tegreat-react-native-app_develop/node_modules/metro/src/Server.js:180:55)

Concurrent builds work fine on Ubuntu. Is it possible to change the path of the cache by some environment variable/config?

maxammann commented 5 years ago

I fixed it by using an env variable: https://github.com/Integreat/integreat-react-native-app/pull/43/commits/80b38673defa1af6d0fcd9f88e063a4c648223a4 https://github.com/Integreat/integreat-react-native-app/pull/43/commits/8d3b06ba1d7d21c4a2caa66e0675bb854b922a71

Tobrek commented 5 years ago

That means, if I have two project on Jenkins with two different users, which both use metro, i have to configure the env variable in BOTH projects to prevent permission problems, because the /tmp/metro-cache folder depends to the first user who created it? (Or to find another workaround)

maxammann commented 5 years ago

One would be enough but it is better to disable the cache for both pipelines. Afaik the problem is not a permission problem but that one build resets the cache while another build is using it. In my case only one of the two builds failed.

punksta commented 5 years ago

it might be fixed by providing custom TMPDIR folder for each build.

I am caching metro cache this way in gitlab ci

 script:
   - mkdir -p ${PROJECT_ROOT_DIR}/metro-cache
   - TMPDIR=${PROJECT_ROOT_DIR}/metro-cache react-native bundle --platform android --dev true --entry-file index.js --bundle-output index.android.bundle

  cache:
    key: ${CI_PROJECT_ID}-metro-cache-${CI_COMMIT_REF_NAME}
    paths:
      - ${PROJECT_ROOT_DIR}/metro-cache
hushicai commented 4 years ago

The same issue.