ROCm / tensorflow-upstream

TensorFlow ROCm port
https://tensorflow.org
Apache License 2.0

2023-10-30 23:26:05.514405: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1284] failed to query device memory info: HIP_ERROR_InvalidValue #2289

Open paolodalberto opened 9 months ago

paolodalberto commented 9 months ago

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

v2.13.0-4108-g619eb25934e 2.13.0

Custom code

No

OS platform and distribution

Linux xsjfislx32 5.15.0-83-generic #92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Mobile device

No response

Python version

Python 3.9.18

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

This is the smallest piece of code from a tutorial that reproduces my problem.

root@xsjfislx32:/dockerx# python
Python 3.9.18 (main, Aug 25 2023, 13:20:04) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-10-30 23:25:50.998575: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> gpus = tf.config.list_physical_devices('GPU')
>>> if gpus:
...     print(len(gpus), "Physical GPUs")
...     try:
...         # Currently, memory growth needs to be the same across GPUs
...         for gpu in gpus:
...             print(gpu)
...             tf.config.experimental.set_memory_growth(gpu, True)
...         logical_gpus = tf.config.list_logical_devices('GPU')
...         print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
...     except RuntimeError as e:
...         # Memory growth must be set before GPUs have been initialized
...         print(e)
... 
3 Physical GPUs
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')
PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')
2023-10-30 23:26:05.514405: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1284] failed to query device memory info: HIP_ERROR_InvalidValue
Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/config.py", line 480, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 1666, in list_logical_devices
    self.ensure_initialized()
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 596, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0
>>> type(gpus[0])
<class 'tensorflow.python.eager.context.PhysicalDevice'>
>>>  logical_gpus = tf.config.list_logical_devices('GPU')
  File "<stdin>", line 1
    logical_gpus = tf.config.list_logical_devices('GPU')
IndentationError: unexpected indent
>>> logical_gpus = tf.config.list_logical_devices('GPU')
2023-10-30 23:30:11.398855: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1284] failed to query device memory info: HIP_ERROR_InvalidValue
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/config.py", line 480, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 1666, in list_logical_devices
    self.ensure_initialized()
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 596, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0

Standalone code to reproduce the issue

The main problem appears when reading the training data (using multiple GPUs). At first I thought the cause was the batch size:

import tensorflow as tf

# data_dir, x and y are defined elsewhere in the original script (not shown here).
print("reading training set", data_dir + "/train/")
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir + "/train/",
    # subset="training",
    seed=123,
    label_mode='int',
    image_size=(x, y),
    # batch_size=128
    batch_size=16,
)

Relevant log output

hipconfig
HIP version  : 5.6.31061-8c743ae5d

== hipconfig
HIP_PATH     : /scratch/rocm-5.6.0
ROCM_PATH    : /scratch/rocm-5.6.0
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/scratch/rocm-5.6.0/include -I/scratch/rocm-5.6.0/llvm/bin/../lib/clang/16.0.0 

== hip-clang
HIP_CLANG_PATH   : /scratch/rocm-5.6.0/llvm/bin
AMD clang version 16.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.6.0 23243 be997b2f3651a41597d7a41441fff8ade4ac59ac)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /scratch/rocm-5.6.0/llvm/bin
AMD LLVM version 16.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver2

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :  -isystem "/scratch/rocm-5.6.0/include" -O3
hip-clang-ldflags  :  -O3 --hip-link --rtlib=compiler-rt -unwindlib=libgcc

=== Environment Variables
PATH=/wrk/hdstaff/paolod/perforce/RDI_paolod_Dev_work/temp/anaconda2/condabin:/wrk/hdstaff/paolod/perforce/RDI_paolod_Dev_work/temp/anaconda2/bin:/home/paolod/bin:/usr/local/bin:/mis/TREE/bin:/usr/bin:/bin:/usr/ucb
LD_LIBRARY_PATH=/usr/local/lib:/usr/lib

== Linux Kernel
Hostname     : xsjfislx31
Linux xsjfislx31 5.15.0-83-generic #92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
LSB Version:    core-11.1.0ubuntu4-noarch:printing-11.1.0ubuntu4-noarch:security-11.1.0ubuntu4-noarch
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:    22.04
Codename:   jammy

root@xsjfislx31:/root# rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          NO

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD EPYC 7F52 16-Core Processor    
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD EPYC 7F52 16-Core Processor    
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   0                                  
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    263707140(0xfb7da04) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    263707140(0xfb7da04) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    263707140(0xfb7da04) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    AMD EPYC 7F52 16-Core Processor    
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD EPYC 7F52 16-Core Processor    
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   0                                  
  BDFID:                   0                                  
  Internal Node ID:        1                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    264225344(0xfbfc240) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    264225344(0xfbfc240) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    264225344(0xfbfc240) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 3                  
*******                  
  Name:                    gfx908                             
  Uuid:                    GPU-20b160b85ec60c80               
  Marketing Name:                                             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      8192(0x2000) KB                    
  Chip ID:                 29580(0x738c)                      
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1502                               
  BDFID:                   9984                               
  Internal Node ID:        2                                  
  Compute Unit:            120                                
  SIMDs per CU:            4                                  
  Shader Engines:          8                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 60                                 
  SDMA engine uCode::      18                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS:                     
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 4                  
*******                  
  Name:                    gfx908                             
  Uuid:                    GPU-973b4b0056b6285e               
  Marketing Name:                                             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    3                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      8192(0x2000) KB                    
  Chip ID:                 29580(0x738c)                      
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1502                               
  BDFID:                   33536                              
  Internal Node ID:        3                                  
  Compute Unit:            120                                
  SIMDs per CU:            4                                  
  Shader Engines:          8                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 60                                 
  SDMA engine uCode::      18                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS:                     
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 5                  
*******                  
  Name:                    gfx908                             
  Uuid:                    GPU-16bf154e2fe0adac               
  Marketing Name:                                             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    4                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      8192(0x2000) KB                    
  Chip ID:                 29580(0x738c)                      
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1502                               
  BDFID:                   58368                              
  Internal Node ID:        4                                  
  Compute Unit:            120                                
  SIMDs per CU:            4                                  
  Shader Engines:          8                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 60                                 
  SDMA engine uCode::      18                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS:                     
      Size:                    33538048(0x1ffc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***
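
The rocminfo dump above reports memory pools for each gfx908 agent, while TensorFlow fails to query memory for GPU 0. As a rough aid for comparing the two, the following is a minimal sketch that pulls the agent name, device type and pool sizes out of rocminfo text. The function name `parse_rocminfo_agents` and the `SAMPLE` excerpt are illustrative assumptions, not part of any ROCm tool, and the parser only handles the field layout shown in this issue.

```python
import re


def parse_rocminfo_agents(text):
    """Return one dict per HSA agent: agent number, name, device type, pool sizes in KB."""
    agents = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        header = re.match(r"Agent (\d+)$", stripped)
        if header:
            current = {
                "agent": int(header.group(1)),
                "name": None,
                "device_type": None,
                "pool_sizes_kb": [],
            }
            agents.append(current)
            continue
        if current is None or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Name" and current["name"] is None:
            # Only the first "Name:" belongs to the agent; later ones are ISA names.
            current["name"] = value
        elif key == "Device Type":
            current["device_type"] = value
        elif key == "Size":
            size = re.match(r"(\d+)\(0x[0-9a-fA-F]+\) KB", value)
            if size:
                current["pool_sizes_kb"].append(int(size.group(1)))
    return agents


# Shortened excerpt of the Agent 3 entry from the dump above (hypothetical test input).
SAMPLE = """\
*******
Agent 3
*******
  Name:                    gfx908
  Device Type:             GPU
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    33538048(0x1ffc000) KB
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
"""

for agent in parse_rocminfo_agents(SAMPLE):
    if agent["device_type"] == "GPU":
        largest_gib = max(agent["pool_sizes_kb"]) / (1024 * 1024)
        print(f"Agent {agent['agent']} ({agent['name']}): {largest_gib:.2f} GiB")
```

On the full dump the same function would also collect the 64 KB GROUP pool for each GPU, so the largest pool is the one to compare against what TensorFlow is expected to see.
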

paolodalberto commented 9 months ago

feel free to reach me directly/internally ... thank you Paolo

dipietrantonio commented 9 months ago

I observed the same behaviour and thought of an incompatibility between ROCm 5.6 and TF 2.13. But that was just a wild guess.

paolodalberto commented 8 months ago

My home setup with the new tensorflow:latest docker image does the same (different GPUs, Radeon VII). This is a show stopper ... any attention will be appreciated!

paolodalberto commented 8 months ago

ls /etc/alternatives/roc -lrt 
roc-obj                roc-obj-ls             rocm/                  rocm_agent_enumerator  rocprof
roc-obj-extract        rocgdb                 rocm-smi               rocminfo               rocprofv2
:/root# ls /etc/alternatives/rocm -lrt 
lrwxrwxrwx 1 root root 15 Sep 16 23:54 /etc/alternatives/rocm -> /opt/rocm-5.7.0

paolodalberto commented 8 months ago

drwxr-xr-x 1 root root 4096 Sep 16 23:54 rocm-5.7.0
lrwxrwxrwx 1 root root   22 Sep 16 23:54 rocm -> /etc/alternatives/rocm

paolodalberto commented 8 months ago

The default rocm version seems to be 5.7, but hip is 5.6?
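
A minimal sketch of that comparison, using only the two strings quoted in this thread: the `HIP version` line from hipconfig and the target of the `/etc/alternatives/rocm` symlink. The helper name `major_minor` is an assumption made for illustration. Note that the two values may have been captured in different environments (host versus container), so a mismatch here is a hint to check, not proof of a broken install.

```python
import re


def major_minor(version_text):
    """Extract the first MAJOR.MINOR pair found in a version string or path."""
    match = re.search(r"(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no version found in {version_text!r}")
    return int(match.group(1)), int(match.group(2))


hip_version = "5.6.31061-8c743ae5d"  # "HIP version" line from the hipconfig output
rocm_target = "/opt/rocm-5.7.0"      # target of the /etc/alternatives/rocm symlink

hip_mm = major_minor(hip_version)
rocm_mm = major_minor(rocm_target)

print("HIP :", hip_mm)
print("ROCm:", rocm_mm)
if hip_mm != rocm_mm:
    print("HIP and ROCm report different major.minor versions")
```
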

paolodalberto commented 8 months ago

Any takers for this Issue ?

paolodalberto commented 8 months ago

Is there anyone?

paolodalberto commented 8 months ago

echo ... echo ... echo

paolodalberto commented 8 months ago

shoot an email paolod AT amd.com

gzitzlsb-it4i commented 8 months ago

Same here. Is there any update?

paolodalberto commented 8 months ago

@gzitzlsb-it4i no updates on my side

paolodalberto commented 8 months ago

keeping the comments alive ...

gzitzlsb-it4i commented 8 months ago

I see this issue with both rocm5.7-tf2.12-dev and rocm5.7-tf2.13-dev. Reverted now to rocm5.6-tf2.12-dev, which works well.

Maybe this is related to the change from rocm 5.6 to 5.7?

paolodalberto commented 8 months ago

Thanksgiving ... take your time. @gzitzlsb-it4i , I tested it from a docker image ... tensorflow:latest should it be addressed there ? Who knows ... one day

jpata commented 8 months ago

I'm observing the same problem with rocm 5.7 and both tf 2.12 and tf 2.13. It does not appear with rocm 5.6 and tf 2.12.
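
To make the pattern in the reports so far easier to see, here is a small sketch that tabulates only the explicit ROCm/TensorFlow combinations stated in the comments by gzitzlsb-it4i and jpata. The `Report` tuple and `failures_by` helper are hypothetical names introduced for this illustration; the original report and the earlier guess about ROCm 5.6 with TF 2.13 are left out because the container's ROCm version is not stated precisely there.

```python
from collections import namedtuple

Report = namedtuple("Report", ["reporter", "rocm", "tf", "fails"])

# Combinations exactly as stated in the comments above.
REPORTS = [
    Report("gzitzlsb-it4i", "5.7", "2.12", True),
    Report("gzitzlsb-it4i", "5.7", "2.13", True),
    Report("gzitzlsb-it4i", "5.6", "2.12", False),
    Report("jpata", "5.7", "2.12", True),
    Report("jpata", "5.7", "2.13", True),
    Report("jpata", "5.6", "2.12", False),
]


def failures_by(reports, field):
    """Map each value of `field` to (failing reports, total reports)."""
    counts = {}
    for report in reports:
        key = getattr(report, field)
        failed, total = counts.get(key, (0, 0))
        counts[key] = (failed + int(report.fails), total + 1)
    return counts


for field in ("rocm", "tf"):
    for value, (failed, total) in sorted(failures_by(REPORTS, field).items()):
        print(f"{field} {value}: {failed}/{total} reports fail")
```

In these six data points every ROCm 5.7 combination fails and every ROCm 5.6 combination works, while TF 2.12 appears on both sides. There is no ROCm 5.6 + TF 2.13 entry in this table, so the data cannot rule out a TF 2.13 contribution.
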

paolodalberto commented 8 months ago

Can anyone redirect me to a person I can talk to?

paolodalberto commented 7 months ago

I guess we will wait for rocm 6

paolodalberto commented 7 months ago

I tried to pull again, but there is no new version. Is there anything I can do?

paolodalberto commented 7 months ago

Is there a docker tensorflow image for rocm 6? I removed the image and pulled it again, and it is still 5.7.

paolodalberto commented 7 months ago

Keeping this alive because the last pull did not fix this. Thank you and Happy Holidays!

paolodalberto commented 6 months ago

any update ?

paolodalberto commented 6 months ago

REPOSITORY                         TAG       IMAGE ID       CREATED        SIZE
rocm/tensorflow                    latest    a169c415feb2   2 weeks ago    37.2GB
<none>                             <none>    36781c65cb73   2 months ago   45.5GB
containers.xilinx.com/acdc/build   2.0       b66986b55092   2 months ago   6.71GB
rocm/tensorflow                    <none>    0db6c42705bf   3 months ago   31.9GB
rocm/pytorch                       latest    1cd3cad3f90f   3 months ago   52.1GB
paolodalberto commented 6 months ago
PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')
2024-01-09 00:06:37.372844: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1294] failed to query device memory info: HIP_ERROR_InvalidValue
Traceback (most recent call last):
  File "/dockerx/test_user.py", line 212, in <module>
    gpus = tf.config.list_physical_devices('GPU')
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/config.py", line 491, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 1688, in list_logical_devices
    self.ensure_initialized()
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 598, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0
paolodalberto commented 6 months ago
(Pdb) l
215         try:
216             # Currently, memory growth needs to be the same across GPUs
217             for gpu in gpus:
218                 print(gpu)
219                 tf.config.experimental.set_memory_growth(gpu, True)
220  ->         logical_gpus = tf.config.list_logical_devices('GPU')
221             print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
222         except RuntimeError as e:
223             # Memory growth must be set before GPUs have been initialized
224             print(e)
225     
(Pdb) n
2024-01-09 00:09:06.679489: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1294] failed to query device memory info: HIP_ERROR_InvalidValue
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0
> /dockerx/test_user.py(220)<module>()
paolodalberto commented 6 months ago

I thought the latest drop would address this .... but how can you address it if you do not acknowledge ... the suspense.

jpata commented 6 months ago

I also confirm that ROCM 6.0 and tensorflow 2.14 still do not work on MI250X, the same error pops up:

2024-01-10 20:25:42.726550: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1294] failed to query device memory info: HIP_ERROR_InvalidValue

ROCM+tensorflow is becoming badly out of date and unusable on large HPC systems that made the mistake of buying AMD MI250X.

paolodalberto commented 6 months ago

Someday I'll wish upon a star Wake up where the clouds are far behind me Where trouble melts like lemon drops

dipietrantonio commented 6 months ago

@paolodalberto @jpata what AMDGPU driver version are you trying to run the container on? On our HPC system we have a rather old one, Driver version: 5.16.9.22.20 due to an outdated ROCm 5.2.3 version present in the Cray environment. @jpata I assume you use LUMI, which should have a similar issue.

I believe no matter the container version you use, the issue is the driver on the host system.

jpata commented 6 months ago

@dipietrantonio excellent point, thanks a lot! I confirm that LUMI HPC where I'm experiencing this issue uses 5.16.9.22.20.

paolodalberto commented 5 months ago

I used my home system (VEGA VII with upgraded Ubuntu) and two more advanced ones with MI100, all upgraded recently. PyTorch works.

paolodalberto commented 5 months ago
tf-docker / > bash /dockerx/test.sh 
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
> /dockerx/test_user.py(212)<module>()
-> gpus = tf.config.list_physical_devices('GPU')
(Pdb) c
3 Physical GPUs
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')
PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')
2024-02-06 22:30:38.546437: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:1294] failed to query device memory info: HIP_ERROR_InvalidValue
Traceback (most recent call last):
  File "/dockerx/test_user.py", line 212, in <module>
    gpus = tf.config.list_physical_devices('GPU')
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/config.py", line 491, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 1688, in list_logical_devices
    self.ensure_initialized()
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 598, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0
dipietrantonio commented 5 months ago

Dear @paolodalberto @jpata ,

We have installed a newer version of the ROCm driver (6.0.5) on a bunch of nodes for testing and now my container with ROCm 5.7 and TF 2.13 works on the code posted in the description of this issue. The error is gone :) So it is a driver issue as I expected.

$ export CIMAGE=$MYSOFTWARE/tensorflow-2.23-rocm5.7.sif
$ singularity exec $CIMAGE python3 tf_test.py 
2024-02-07 14:35:41.241861: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
1 Physical GPUs
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
2024-02-07 14:35:46.934224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 63938 MB memory:  -> device: 0, name: AMD Instinct MI250X, pci bus id: 0000:d1:00.0
1 Physical GPUs, 1 Logical GPUs
$ cat tf_test.py 
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print(len(gpus), "Physical GPUs")
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            print(gpu)
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
paolodalberto commented 5 months ago

@dipietrantonio excuse me for my thickness. The driver does not come with the docker? You are saying that rocm driver 5.7 is the problem ...

dipietrantonio commented 5 months ago

When you run a container you rely on the host kernel, not the one installed in your container. The driver is a kernel module. You need to update the driver on the system you are running the container on (at least when you use the Singularity container engine, but I think it is the same for Docker).

In my case, the driver shipped with ROCm 5.2 was the culprit. I was not expecting that even the ROCm 5.7 driver could have this issue. But as I said, driver version 6.0.5 solved my issue.
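
Since the driver is a host kernel module, it can help to check which version the host exposes from inside the container before swapping images. Below is a minimal sketch, assuming the DKMS amdgpu driver publishes its version at `/sys/module/amdgpu/version` and that `/sys` is visible inside the container. The function name `host_amdgpu_driver_version` is made up for illustration, and an in-tree kernel driver may not expose this file at all.

```python
from pathlib import Path
from typing import Optional


def host_amdgpu_driver_version(module_dir: str = "/sys/module/amdgpu") -> Optional[str]:
    """Return the amdgpu kernel module version string, or None if it is not exposed.

    Assumption: the DKMS driver writes its version to <module_dir>/version.
    The path is read from the host kernel, so the result is the same
    regardless of which ROCm release is installed in the container.
    """
    version_file = Path(module_dir) / "version"
    try:
        # An empty file is treated the same as a missing one.
        return version_file.read_text().strip() or None
    except OSError:
        # Module not loaded, file absent, or /sys not mounted in the container.
        return None


if __name__ == "__main__":
    version = host_amdgpu_driver_version()
    print("host amdgpu driver version:", version or "not exposed")
```

If this prints an old version such as the 5.16.9.22.20 mentioned above, pulling a newer container image alone would not be expected to change the result.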

paolodalberto commented 5 months ago

hmm ...

paolodalberto commented 4 months ago

Still no new docker with rocm 6

paolodalberto commented 2 months ago

new docker arrived

/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.9/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/util/structure.py", line 105, in normalize_element
    spec = type_spec_from_value(t, use_fallback=False)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/util/structure.py", line 514, in type_spec_from_value
    raise TypeError("Could not build a `TypeSpec` for {} with type {}".format(
TypeError: Could not build a `TypeSpec` for ['/imagenet/train/n02102177/n02102177_9088.JPEG', '/imagenet/train/n01796340/n01796340_3887.JPEG', '/imagenet/train/n02363005/n02363005_6465.JPEG', '/imagenet/train/n02965783/n02965783_1876.JPEG', '/imagenet/train/n01734418/n01734\
418_12680.JPEG', '/imagenet/train/n02422699/n02422699_28690.JPEG',
paolodalberto commented 2 months ago
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/dockerx/test_user.py", line 268, in <module>
    train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  File "/usr/local/lib/python3.9/dist-packages/keras/src/utils/image_dataset.py", line 308, in image_dataset_from_directory
    dataset = paths_and_labels_to_dataset(
  File "/usr/local/lib/python3.9/dist-packages/keras/src/utils/image_dataset.py", line 350, in paths_and_labels_to_dataset
    path_ds = tf.data.Dataset.from_tensor_slices(image_paths)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 825, in from_tensor_slices
    return from_tensor_slices_op._from_tensor_slices(tensors, name)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/ops/from_tensor_slices_op.py", line 25, in _from_tensor_slices
    return _TensorSliceDataset(tensors, name=name)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/ops/from_tensor_slices_op.py", line 33, in __init__
    element = structure.normalize_element(element)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/data/util/structure.py", line 110, in normalize_element
    ops.convert_to_tensor(t, name="component_%d" % i))
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/profiler/trace.py", line 183, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/ops.py", line 696, in convert_to_tensor
    return tensor_conversion_registry.convert(
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 234, in convert
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/constant_op.py", line 335, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/ops/weak_tensor_ops.py", line 142, in wrapper
    return op(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/constant_op.py", line 271, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/constant_op.py", line 284, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/constant_op.py", line 296, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/constant_op.py", line 102, in convert_to_eager_tensor
    ctx.ensure_initialized()
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/context.py", line 603, in ensure_initialized
    context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.UnknownError: Failed to query available memory for GPU 0
paolodalberto commented 2 months ago

This is with my system at home; I will check on Monday on the real machine.

paolodalberto commented 2 months ago

https://github.com/ROCm/tensorflow-upstream/issues/2289#issuecomment-1931424826 how do you upgrade the driver ?

paolodalberto commented 2 months ago

I could:

sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.1/ubuntu/jammy/amdgpu-install_6.1.60100-1_all.deb
sudo apt install ./amdgpu-install_6.1.60100-1_all.deb
sudo amdgpu-install --list-usecase
sudo amdgpu-install --usecase=dkms,rocm,graphics,hiplibsdk,workstation,asan
sudo amdgpu-install --usecase=dkms,rocm,graphics,hiplibsdk,hip
sudo amdgpu-install --usecase=dkms,rocm,rocmdev,opencl,graphics,hiplibsdk,hip
sudo amdgpu-install --usecase=dkms
sudo amdgpu-install --usecase=dkms,rocm,rocmdev,rocmdevtools
sudo amdgpu-install --usecase=dkms,rocm
sudo amdgpu-install --usecase=dkms,rocmdev, rocm
sudo amdgpu-install --usecase=dkms
sudo reboot

At least it works for one GPU

paolodalberto commented 2 months ago

Let me check what I can do on my large machine ...

paolodalberto commented 2 months ago

The large machine now kicks me out during evaluation, but I can briefly see the GPUs.

paolodalberto commented 2 months ago

yep multiple GPUs do not work (single GPU works)
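
To keep using the machine while multi-GPU is broken, one option is to expose only a single GPU to the process. This is a rough sketch, assuming the HIP runtime honours the `HIP_VISIBLE_DEVICES` environment variable when it is set before TensorFlow is imported. The helper name `single_gpu_env` is hypothetical, and this only isolates the problem; it is not a fix for the multi-GPU failure.

```python
import os
from typing import Dict, Optional


def single_gpu_env(index: int = 0, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment that exposes only one GPU to HIP.

    Assumption: HIP_VISIBLE_DEVICES takes a device index and must be set
    before the process initializes the GPU runtime.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = str(index)
    return env
```

A possible use is to launch the test script in a child process, for example `subprocess.run([sys.executable, "/dockerx/test_user.py"], env=single_gpu_env(0))`, so the restriction is in place before TensorFlow initializes the devices.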

paolodalberto commented 2 months ago

In practice, multi-GPU runs fail so badly that the Docker application stalls the machine and breaks the Docker daemon, which I then have to restart manually. This is on the system above ... The funny part is that this was working on 5.7, six months ago ... for both TensorFlow and PyTorch ...
Let me know if you would like to connect ...

paolodalberto commented 2 months ago

[image] Good times