PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Win10: C++-packaged dynamic library hangs at startup #69387

Closed: wuliujin closed this issue 1 week ago

wuliujin commented 1 week ago

Please ask your question

I followed the official documentation to package the PaddleOCR dynamic library. Running it as an exe with GPU acceleration enabled works fine, but once it is packaged as a dynamic library it hangs and stops executing as soon as GPU acceleration is enabled. Below is my log; could you help me figure out what is going wrong? Thanks.

```
unt,auto_growth_chunk_size_in_mb,benchmark,benchmark_nccl,cache_inference_while_scope,call_stack_level,check_kernel_launch,check_nan_inf,check_nan_inf_level,conv2d_disable_cudnn,conv_workspace_size_limit,convert_all_blocks,cpu_deterministic,cublaslt_exhaustive_search_times,cudnn_batchnorm_spatial_persistent,cudnn_deterministic,cudnn_exhaustive_search,cudnn_exhaustive_search_times,dist_threadpool_size,dygraph_debug,eager_delete_scope,eager_delete_tensor_gb,einsum_opt,embedding_deterministic,enable_adjust_op_order,enable_all2all_use_fp16,enable_api_kernel_fallback,enable_async_trace,enable_auto_detect_gpu_topo,enable_auto_rdma_trans,enable_cublas_tensor_op_math,enable_dependency_builder_debug_info,enable_dump_main_program,enable_exit_when_partial_worker,enable_gpu_memory_usage_log,enable_gpu_memory_usage_log_mb,enable_graph_multi_node_sampling,enable_neighbor_list_use_uva,enable_opt_get_features,enable_pir_api,enable_pir_in_executor,enable_pir_in_executor_trace_run,enable_pir_with_pt_in_dy2st,enable_record_memory,enable_sparse_inner_gather,enable_tracker_all2all,enable_unused_var_check,executor_log_deps_every_microseconds,fast_eager_deletion_mode,fraction_of_cpu_memory_to_use,fraction_of_cuda_pinned_memory_to_use,fraction_of_gpu_memory_to_use,free_idle_chunk,free_when_no_cache_hit,fuse_parameter_groups_size,fuse_parameter_memory_size,gemm_use_half_precision_compute_type,get_host_by_name_time,gpu_allocator_retry_time,gpu_memory_limit_mb,gpugraph_debug_gpu_memory,gpugraph_dedup_pull_push_mode,gpugraph_enable_gpu_direct_access,gpugraph_enable_hbm_table_collision_stat,gpugraph_enable_print_op_debug,gpugraph_enable_segment_merge_grads,gpugraph_force_device_batch_num_equal,gpugraph_hbm_table_load_factor,gpugraph_load_node_list_into_hbm,gpugraph_merge_grads_segment_size,gpugraph_offload_gather_copy_maxsize,gpugraph_offload_param_extends,gpugraph_offload_param_stat,gpugraph_parallel_copyer_split_maxsize,gpugraph_parallel_stream_num,gpugraph_slot_feasign_max_num,gpugraph_sparse_table_storage_mode,gpugraph_storage_mode,graph_embedding_split_infer_mode,graph_get_neighbor_id,graph_load_in_parallel,graph_metapath_split_opt,graph_neighbor_size_percent,host_trace_level,init_allocated_mem,initial_cpu_memory_in_mb,initial_gpu_memory_in_mb,inner_op_parallelism,ir_inplace_kernel_blacklist,jit_engine_type,local_exe_sub_scope_limit,log_memory_stats,low_precision_op_list,max_inplace_grad_add,memory_fraction_of_eager_deletion,multi_node_sample_use_gpu_table,multiple_of_cupti_buffer_size,nccl_blocking_wait,new_executor_sequential_run,new_executor_serial_run,new_executor_static_build,new_executor_use_cuda_graph,new_executor_use_inplace,new_executor_use_local_scope,npu_storage_format,paddle_num_threads,pe_profile_fname,pir_apply_inplace_pass,pir_subgraph_saving_dir,print_allocator_trace_info,print_ir,print_sub_graph_dir,query_dest_rank_by_multi_node,reader_queue_speed_test_mode,reallocate_gpu_memory_in_mb,run_kp_kernel,search_cache_max_number,selected_gpus,set_to_1d,sort_sum_gradient,sync_after_alloc,sync_nccl_allreduce,tensor_operants_mode,tracer_mkldnn_ops_off,tracer_mkldnn_ops_on,trt_ibuilder_cache,use_auto_growth_pinned_allocator,use_autotune,use_cuda_managed_memory,use_fast_math,use_mkldnn,use_pinned_memory,use_shm_cache,use_stream_safe_cuda_allocator,use_stride_kernel,use_system_allocator,use_virtual_memory_auto_growth
I1114 10:53:17.097263 57144 init.cc:105] After Parse: argc is 2
3Using GPU: 4Using GPU: 33Using GPU:
I1114 10:53:17.098263 57144 analysis_config.cc:1544] In CollectShapeInfo mode, we will disable optimizations and collect the shape information of all intermediate tensors in the compute graph and calculate the min_shape, max_shape and opt_shape.
44Using GPU: 55Using GPU: 666Using GPU: 776Using GPU: 88Using GPU: 99Using GPU: 1000Using GPU:
I1114 10:53:17.124262 57144 cuda_info.cc:257] SetDeviceId 0 2
```
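
For reference, the analysis_config.cc line in the log is emitted when shape-range collection is enabled on the inference config. A minimal sketch of a configuration that produces this state; the model file names and the 500 MB pool size are placeholder assumptions, not values from this thread:

```cpp
#include <memory>
#include "paddle_inference_api.h"

// Build a GPU predictor with shape-range collection enabled, which matches the
// "In CollectShapeInfo mode, we will disable optimizations..." line in the log.
std::shared_ptr<paddle_infer::Predictor> MakePredictor() {
  paddle_infer::Config config;
  // Placeholder model files; substitute the actual PaddleOCR model paths.
  config.SetModel("inference.pdmodel", "inference.pdiparams");
  // Matches "SetDeviceId 0" in the log: 500 MB initial memory pool on GPU 0.
  config.EnableUseGpu(500, 0);
  // Records min/max/opt shapes of intermediate tensors to a file; Paddle
  // disables graph optimizations while this mode is active.
  config.CollectShapeRangeInfo("shape_range_info.pbtxt");
  return paddle_infer::CreatePredictor(config);
}
```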

wuliujin commented 1 week ago

Here is my version information:

```
Paddle version: 2.6.2
GIT COMMIT ID: 8ce0de584c570589117e403322f3d1a0de6554e5
WITH_MKL: ON
WITH_MKLDNN: ON
WITH_GPU: ON
WITH_ROCM: OFF
WITH_IPU: OFF
CUDA version: 11.8
CUDNN version: v8.6
CXX compiler version: 19.29.30145.0
WITH_TENSORRT: ON
TensorRT version: v8.5.1.7
VS: VS2022
```

wuliujin commented 1 week ago

This happens as soon as the PPOCR variable is defined at global scope.
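
A global C++ object in a DLL is constructed while Windows holds the loader lock (effectively inside DllMain during DLL_PROCESS_ATTACH), and initializing CUDA under the loader lock is a known way to deadlock, which would match the hang described above. A minimal sketch of the usual workaround, deferring construction to the first exported call; the PPOCR class, its Run() method, and the header name are assumptions based on this thread, not the user's actual code:

```cpp
#include "ppocr.h"  // hypothetical header declaring the PPOCR wrapper class

// BAD: a global in a DLL is constructed under the Windows loader lock
// (during DLL_PROCESS_ATTACH); touching the CUDA driver there can hang.
// PPOCR g_ocr;

// Workaround sketch: defer construction to the first exported call, which
// runs after the DLL has finished loading and the loader lock is released.
static PPOCR& GetOcr() {
  static PPOCR ocr;  // C++11 function-local static: lazy and thread-safe
  return ocr;
}

extern "C" __declspec(dllexport) int RunOcr(const char* image_path) {
  return GetOcr().Run(image_path);  // Run() assumed from the user's wrapper
}
```

An explicit exported Init() that the host calls after LoadLibrary would work equally well; the point is only that GPU initialization must not run during the DLL's static initialization.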

zhwesky2010 commented 1 week ago

@wuliujin Could this be something like insufficient memory on the machine? Are you still running into the problem?