CesiumGS / cesium

An open-source JavaScript library for world-class 3D globes and maps :earth_americas:
https://cesium.com/cesiumjs/
Apache License 2.0
12.94k stars 3.49k forks source link

Erratic test failure in master #7249

Closed mramato closed 2 years ago

mramato commented 6 years ago

The last master build failed with the below error. If I had to guss, this test is using an external server. We should probably bring whatever data its using in-house to make sure CI doesn't randomly fail.

CC @likangning93 @OmarShehata

  Core/GoogleEarthEnterpriseTerrainData
    upsample
      ✗ works for all four children of a simple quad
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5096)
    <Jasmine>
5.002 secs: Core/GoogleEarthEnterpriseTerrainData upsample works for all four children of a simple quad
3.236 secs: Core/GoogleEarthEnterpriseTerrainData createMesh creates specified vertices plus skirt vertices
1.667 secs: Core/GoogleEarthEnterpriseMetadata resolves readyPromise
1.187 secs: Core/VideoSynchronizer Syncs time when not looping
1.151 secs: Core/ApproximateTerrainHeights initializes
1.132 secs: Scene/GroundPrimitive does not pick when allowPicking is false
1.109 secs: Scene/GroundPrimitive larger scene renders batched instances
1.087 secs: Scene/Vector3DTileContent renders polygons
1.069 secs: DataSources/GeometryVisualizer Creates and removes static color open geometry
1.053 secs: Scene/GroundPrimitive picking in 2D
Firefox 63.0.0 (Linux 0.0.0): Executed 10347 of 10355 (1 FAILED) (skipped 8) (2 mins 22.893 secs / 2 mins 13.272 secs)
TOTAL: 1 FAILED, 10346 SUCCESS
likangning93 commented 6 years ago

I think this one might just be slow, it passes in airplane mode on my Macbook Pro but times out with-or-without a network connection on my N2920-based netbook. [EDIT] After a cursory glance at the spec, it also doesn't look like it's requesting anything.

OmarShehata commented 5 years ago

I just had a build fail on Travis I believe for a similar reason as above:

    Scene/PointCloudEyeDomeLighting
    ✗ adds a clear command and a post-processing draw call
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ does not change commands for pick calls
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
  Scene/Tileset3DTileContent
    ✗ resolves readyPromise
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ destroys
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ gets properties
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
  Scene/Batched3DModel3DTileContentClassification
    ✗ renders with batch table
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders with binary batch table
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
  Scene/Batched3DModel3DTileContent
    ✗ recognizes the legacy 20-byte header
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ recognizes the legacy 24-byte header
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ logs deprecation warning for use of BATCHID without prefixed underscore
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ resolves readyPromise
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders with batch table
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders with batch table binary
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders without batch table
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders with all features translucent
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders with a mix of opaque and translucent features
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders with textures
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders with a tile transform and box bounding volume
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders with a tile transform and sphere bounding volume
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders with a tile transform and region bounding volume
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ picks with batch table
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ picks without batch table
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ can get features and properties
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ throws when calling getFeature with invalid index
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ gets memory usage
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ Links model to tileset clipping planes based on bounding volume clipping
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ Links model to tileset clipping planes if tileset clipping planes are reassigned
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ rebuilds Model shaders when clipping planes change
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ transforms model positions by RTC_CENTER property in the features table
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ destroys
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
  Scene/Cesium3DTileBatchTable
    ✗ renders tileset with batch table
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders tileset without batch table
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders when vertex texture fetch is not supported
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders with featuresLength greater than maximumTextureSize
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders with featuresLength of zero
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders translucent style
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders translucent style when vertex texture fetch is not supported
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders tileset with batch table hierarchy extension
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders tileset with batch table hierarchy using binary properties
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders tileset with batch table hierarchy with multiple parent classes
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders tileset with batch table hierarchy with no parents
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ renders tileset with legacy batch table hierarchy (pre-version 1.0)
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ warns about deprecated batch hierarchy (pre-version 1.0)
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>
    ✗ destroys
    Error: Timeout - Async callback was not invoked within 5000ms (set by jasmine.DEFAULT_TIMEOUT_INTERVAL) in node_modules/jasmine-core/lib/jasmine-core/jasmine.js (line 5451)
    <Jasmine>

Restarting the build fixed it.

mramato commented 5 years ago

Master just failed again https://travis-ci.org/AnalyticalGraphicsInc/cesium/builds/474883290?utm_source=github_status&utm_medium=notification We need to nip this in the bud.

@OmarShehata since you've been doing some test clean-up/optimization, do you think you can take a look at this? We worked hard to get CI consistently passing and now all of a sudden it's failing every dozen builds.

OmarShehata commented 5 years ago

@mramato yeah I'll take a look at this today or tomorrow.

OmarShehata commented 5 years ago

I wonder/hope there's some underlying issue here. It seems a lot less table than it was a week ago. This just failed on Travis:

Scene/raises the cameraMoveEvent event when the camera stops moving

But it wasn't a timeout (expected 0 to 1).

mramato commented 5 years ago

@OmarShehata any update here?

OmarShehata commented 5 years ago

Sorry, this got pushed down my to do list with all the recent shuffles. I'm running some Travis builds now to see how consistent the failing tests are. If there are any recent failed builds like that whose logs still exist please post them here.

OmarShehata commented 5 years ago

I'm able to replicate a subset of these failures by running them on a slower machine (and they pass fine when re-running the tests). I don't see an easy way around this other than refactoring each one of those tests to make it faster. Although it's still unclear why:

Cranking the default timeout up to 20 is a band-aid solution that will make these timing spikes not stop the build until we refactor them. I can open a PR here and re-run the build a bunch of times to test that new timeout. Running it locally I haven't seen anything above ~10 seconds.

mramato commented 5 years ago

Well, for VideoSynchronizer, network latency may be at play (those tests should probably use a local, in-repository video instead of an externally hosted one). Please write up an issue.

That doesn't explain the other tests if they are not using external resources.

Do the other tests ever vary wildly when run in isolation or only when all ran together?

OmarShehata commented 5 years ago

I'm running more tests to answer that, but just documenting a peculiar finding: GroundPrimitive/does not render whens how if false, just failed at the first expect:

expect(frameState.afterRender.length).toEqual(1);

The value was 2 instead of 1. This test does not seem to be relying on any external resources, and it was not a timeout, so just increasing the timeout wouldn't fix this intermittent failure. It failed when running all the tests together, so far hasn't failed when running in isolation.

OmarShehata commented 5 years ago

The most common failure seems to be this set of tests https://github.com/AnalyticalGraphicsInc/cesium/issues/7249#issuecomment-450908174, and what's common there is that they all use Cesium3DTilesTester.loadTileset.

My best guess now is that something goes wrong with the scene object they all use, causing the tileset to never load and timeouts to occur. They can all be traced to:

Cesium3DTilesTester.waitForTilesLoaded = function(scene, tileset) {
        return pollToPromise(function() {
            scene.renderForSpecs();
            return tileset.tilesLoaded;
        }).then(function() {
            scene.renderForSpecs();
            return tileset;
        });
    };

If tileset.tilesLoaded is never true, all those failures will occur.

OmarShehata commented 5 years ago

It took 2 re-builds for this to pass on Travis https://github.com/AnalyticalGraphicsInc/cesium/pull/7476

The first time, this failed https://github.com/AnalyticalGraphicsInc/cesium/issues/7249#issuecomment-451287166 and the second time, this did https://github.com/AnalyticalGraphicsInc/cesium/issues/7249#issuecomment-452461921.

Both of those are non-timeout tests. The only thing in common amongst all these failures is they rely on something happening in scene.render(). If this failing for any reason, it will cause those failures as well as the timeouts. I'm wondering if this has to do with our recent switch to FirefoxHeadless.

Each Spec file creates a scene, which creates a new canvas, but it does get cleaned up, and it should be running even on failure.

mramato commented 5 years ago

I'm wondering if this has to do with our recent switch to FirefoxHeadless.

We were already using FirefoxHeadless for built tests before this happened, so I doubt that's it.

mramato commented 5 years ago

@OmarShehata I think you are on to something and it would make sense that the delay is really the test waiting for something that never happens. Any idea why this is happening? CC @lilleyse

OmarShehata commented 5 years ago

I've been seeing this one fail a lot:

SceneSpec/raises the cameraMoveEvent event when the camera stops moving

Which is very interesting because the two camera events tests, right before and right after it, are very stable. @lilleyse suggested refactoring this test (for example, it calls render twice, perhaps because the event doesn't always get triggered with a single render, which seems quite unreliable). That's something I'll try next that'll hopefully get a good chunk of the failures to disappear.

mramato commented 5 years ago

Pretty sure #7482 fixed most of the stability issues except for the cameraMoveEvent that @OmarShehata mentions above. @OmarShehata have you seen crashes since #7482 was merged? (I forced master a few times and didn't). I have seen the cameraMoveEvent one though, so it might good to fix or ignore that one ASAP. (if we ignore it, write up a new clean issue so we don't forget about it)

OmarShehata commented 5 years ago

@mramato I haven't seen any other test failures yet since. I'll refactor the camera test this week and that'll hopefully curb these failures. The only thing that will remain is tests using external resources (described here https://github.com/AnalyticalGraphicsInc/cesium/issues/7244) but those don't seem to be a critical issue on Travis currently.

OmarShehata commented 5 years ago

Just saw a failure today with this same sequence: https://github.com/AnalyticalGraphicsInc/cesium/issues/7249#issuecomment-450908174

hpinkos commented 5 years ago

This seems to be getting worse. I'm seeing travis fail with random tests more often than I see it pass.

mramato commented 5 years ago

I think we need to experiment with bumping the test timeout to a large value (like 30 seconds) when running on travis and see if that helps improve things. travis' infrastructure seems to have gotten work and their servers are getting slower and slower.

OmarShehata commented 5 years ago

There's a new erratic failure discovered in https://github.com/AnalyticalGraphicsInc/cesium/pull/7713

Core/Resource fetchImage without ImageBitmap can load an image fails erratically when the loaded image has a width/height of 16 instead of 1. The image request isn't failing, it's almost as if a request from the other tests is interfering with it? (Many of which are 16x16). I haven't been able to recreate it locally or in after various Travis rebuilds.

The exact failures are:

Core/Resource
    ✗ can load an image preferring blob
    Expected 16 to equal 1.
    ✗ can load an image from a data URI
    Expected 16 to equal 1.
    fetchImage without ImageBitmap
      ✗ can load an image
    Expected 16 to equal 1.
OmarShehata commented 5 years ago

I'm starting to get the failures above locally (but only when running all the tests) which is good news, I might be able to look into and get a fix for this now.

lilleyse commented 5 years ago

I've been seeing a new error lately: changing the camera frustum does not cause continuous rendering in requestRenderMode

Once in https://github.com/AnalyticalGraphicsInc/cesium/pull/8242 and again in https://github.com/AnalyticalGraphicsInc/cesium/pull/8205. Restarting the build usually makes CI pass again.

Build details: https://travis-ci.org/AnalyticalGraphicsInc/cesium/builds/592619357?utm_source=github_status&utm_medium=notification

mramato commented 5 years ago

After some CI fixes and the ES6 migration, the only consistent CI failure I'm seeing is with Vector3DTileContent These tests look overly complicated and have a lot of setup, so I'm leaning towards the specs being the problem: https://travis-ci.org/AnalyticalGraphicsInc/cesium/builds/602749111

mramato commented 5 years ago

Ignored Vector3DTileContent specs for now in #8317 until we have time to look at them.

sanjeetsuhag commented 2 years ago

As of https://github.com/CesiumGS/cesium/commit/fd66f69dc1bccc808385874590cb7b483a11bd4e, there's a bunch of test failures in the Vector3DTileContent suite that I can reproduce locally:

image

CI seems to have passed though.

Moreover, it seems to me that each spec is trying to do too much. We're testing for render as well as pick everytime, and there's a bunch of nested verify style functions that are invoked in each spec. I'm not opposed to the robustness of this is striving for, but I'm leaning towards re-writing the entire suite with smaller, more focused tests.

ggetz commented 2 years ago

CI seems to have passed though.

Do you see the same when running with --webgl-stub locally?

I'm not opposed to the robustness of this is striving for, but I'm leaning towards re-writing the entire suite with smaller, more focused tests.

This sounds like the best approach to me. Specs should only test one thing, and re-writing the specs seems to be the consensus in thread anyway.