facebookresearch / fairscale

PyTorch extensions for high performance and large scale training.

Flaky Tests in CircleCI #908

Open anupambhatnagar opened 2 years ago

anupambhatnagar commented 2 years ago

Here is a list of flaky tests that we should fix in our next fix-a-thon.

min-xu-ai commented 2 years ago

The layer memory tracker test also seems to be flaky:

tests.experimental.tooling.test_layer_memory_tracker

>       assert summary.total_forward_allocations >= summary.total_activation_allocations
E       assert 77056000 >= 77070864
E        +  where 77056000 = LayerwiseMemoryTrackerSummary(max_memory_allocated=104022528, max_memory_cached=134217728, total_activation_allocation... is_forward=True, all_gathered=0, cumul_all_gathered=0, event=TraceForwardEvent(memory_diff=0, memory_activations=0))]).total_forward_allocations
E        +  and   77070864 = LayerwiseMemoryTrackerSummary(max_memory_allocated=104022528, max_memory_cached=134217728, total_activation_allocation... is_forward=True, all_gathered=0, cumul_all_gathered=0, event=TraceForwardEvent(memory_diff=0, memory_activations=0))]).total_activation_allocations

tests/experimental/tooling/test_layer_memory_tracker.py:65: AssertionError
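
In the run above the two totals differ by 14,864 bytes, roughly 0.02% of either value, which looks more like allocator noise than a real regression. A minimal sketch of a tolerance-based comparison follows. It assumes a small relative slack is acceptable for this check; the helper name `forward_covers_activations` and the 0.1% default are hypothetical and not part of fairscale.

```python
def forward_covers_activations(
    total_forward: int,
    total_activation: int,
    rel_slack: float = 1e-3,  # hypothetical tolerance, not taken from fairscale
) -> bool:
    """Return True if forward allocations are at least the activation
    allocations, allowing the forward total to fall short by rel_slack."""
    return total_forward >= total_activation * (1.0 - rel_slack)


# Values copied from the failing CI run above.
strict = forward_covers_activations(77056000, 77070864, rel_slack=0.0)
relaxed = forward_covers_activations(77056000, 77070864)
```

With `rel_slack=0.0` the helper reproduces the original strict comparison, so the tolerance is an explicit, reviewable choice rather than a silent change to the check.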

anj-s commented 2 years ago

Another flaky test for the list: tests/nn/data_parallel/test_fsdp.py: TestSerialization

min-xu-ai commented 2 years ago

Remember to address the TODO in https://github.com/facebookresearch/fairscale/pull/933/files/86d45de0e14c8273916c3c68db8c297bc3bb59a8#diff-bc534f971a86e11cc30be248c89989c8024c941a855d4716025da461fcd29047

Thanks to Anjali for suggesting that we record this.

min-xu-ai commented 2 years ago

Another one: https://app.circleci.com/pipelines/github/facebookresearch/fairscale/4277/workflows/73e769c1-510d-4c83-bb25-0eae414f7897/jobs/47992

I will disable it.
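
For reference, one common way to disable a flaky test while keeping it visible in reports is a skip decorator with a reason that points back to the tracking issue. The sketch below uses the standard library's `unittest.skip`, which pytest also honors. The class and test names are hypothetical, since the thread does not say which test failed in the linked job or how it was disabled.

```python
import unittest


class TestExample(unittest.TestCase):
    # Hypothetical test names; the skip reason records where the flakiness is tracked.
    @unittest.skip("Flaky in CircleCI, see issue #908")
    def test_flaky(self):
        self.fail("would fail intermittently")

    def test_stable(self):
        self.assertEqual(1 + 1, 2)


def run_suite() -> unittest.TestResult:
    """Run the example tests and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestExample)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Skipped tests are reported separately from passes and failures, so the disabled test stays on the list for the next fix-a-thon instead of disappearing.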