AcademySoftwareFoundation / OpenShadingLanguage

Advanced shading language for production GI renderers
BSD 3-Clause "New" or "Revised" License

Untracked memory in OSL #658

Open sfriedmapixar opened 8 years ago

sfriedmapixar commented 8 years ago

Using valgrind's massif profiling tool, I've found several large blocks of memory that OSL's statistics don't account for, and the unaccounted memory dwarfs the accounted memory. The biggest gap is the ShadingContext memory, including all the constant pools. The vectors that point to the values for parameters and such are counted in categories like paramvals, but the actual values in the pools that those vectors point to aren't. The shader objects and their execution space also aren't counted. The final big source of missing memory is whatever LLVM is using, but I can understand not counting that, as it's much harder.
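To illustrate the kind of gap described above, here is a minimal sketch of counting the heap storage behind a set of value pools. The names `vector_heap_bytes`, `ConstantPools`, and `reserved_float_pool_bytes` are hypothetical and are not OSL's actual accounting code. The sketch assumes the pools are plain `std::vector`s and counts `capacity()` rather than `size()`, because a `reserve()` call allocates memory that `size()` does not reflect.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OSL): bytes of heap storage owned by a
// vector's buffer. Uses capacity(), since reserved-but-unused slots still
// occupy memory that a profiler such as massif will report.
template <typename T>
std::size_t vector_heap_bytes(const std::vector<T>& v)
{
    return v.capacity() * sizeof(T);
}

// Hypothetical stand-in for a group of value pools.
struct ConstantPools {
    std::vector<int> ints;
    std::vector<float> floats;

    // Total heap bytes held by all pools in this group.
    std::size_t heap_bytes() const
    {
        return vector_heap_bytes(ints) + vector_heap_bytes(floats);
    }
};

// Reserve room for n floats in an otherwise empty pool group and return the
// bytes that should be reported for it.
inline std::size_t reserved_float_pool_bytes(std::size_t n)
{
    ConstantPools pools;
    pools.floats.reserve(n);
    assert(pools.floats.empty());  // size() is still zero after reserve()
    return pools.heap_bytes();
}
```

Counting by `size()` instead would report zero bytes for a freshly reserved pool, which is one way a large allocation can go missing from the statistics.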

Here's what OSL reports.


OSL Shading Current Memory       86,549,052
OSL Shading Peak Memory          98,972,980

osl_mem_master_current            6,469,424       
osl_mem_master_peak               6,469,424       
osl_mem_master_ops_current        3,348,864       
osl_mem_master_ops_peak           3,348,864       
osl_mem_master_args_current         431,864       
osl_mem_master_args_peak            431,864       
osl_mem_master_syms_current       2,660,352       
osl_mem_master_syms_peak          2,660,352       
osl_mem_master_defaults_current      12,372       
osl_mem_master_defaults_peak         12,372       
osl_mem_master_consts_current         6,996       
osl_mem_master_consts_peak            6,996       
osl_mem_inst_current             80,079,628       
osl_mem_inst_peak                92,503,556       
osl_mem_inst_syms_current        57,871,904       
osl_mem_inst_syms_peak           70,255,264       
osl_mem_inst_paramvals_current    6,811,456       
osl_mem_inst_paramvals_peak       6,811,456       
osl_mem_inst_connections_current  2,783,484       
osl_mem_inst_connections_peak     3,484,800
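
As a sanity check on the report above, the `master` and `inst` totals add up exactly to the two headline figures, while the named sub-categories fall slightly short of their parent totals. The sketch below only restates that arithmetic at compile time. The constant names are illustrative, and reading the remainder as "counted but uncategorized" bytes is an assumption rather than something the report states.

```cpp
#include <cstdint>

// Byte counts copied from the OSL report above.
constexpr std::int64_t kTotalCurrent  = 86549052;
constexpr std::int64_t kTotalPeak     = 98972980;
constexpr std::int64_t kMasterCurrent = 6469424;
constexpr std::int64_t kMasterPeak    = 6469424;
constexpr std::int64_t kInstCurrent   = 80079628;
constexpr std::int64_t kInstPeak      = 92503556;

// The headline totals are exactly master + inst.
static_assert(kMasterCurrent + kInstCurrent == kTotalCurrent, "current total");
static_assert(kMasterPeak + kInstPeak == kTotalPeak, "peak total");

// Named sub-categories (current values only).
constexpr std::int64_t kMasterSubCurrent =
    3348864 /*ops*/ + 431864 /*args*/ + 2660352 /*syms*/
    + 12372 /*defaults*/ + 6996 /*consts*/;
constexpr std::int64_t kInstSubCurrent =
    57871904 /*syms*/ + 6811456 /*paramvals*/ + 2783484 /*connections*/;

// Bytes included in each parent total but in none of its listed children.
constexpr std::int64_t kMasterUncategorizedCurrent =
    kMasterCurrent - kMasterSubCurrent;  // 8,976
constexpr std::int64_t kInstUncategorizedCurrent =
    kInstCurrent - kInstSubCurrent;      // 12,612,784
```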

And here is a cleaned up version of the valgrind info for things I'm pretty sure aren't counted anywhere.
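
Each entry in the trace has the form `->PP.PP% (N,NNN,NNNB) 0xADDR: function`, with nested callers prefixed by `|`. For anyone who wants to total the numbers, here is a rough sketch of pulling the byte counts out of such lines. The function names are hypothetical, and the sketch assumes the first parenthesized group on a line is the byte count and that top-level entries start with `->` in the first column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: extract N from the first "(N,NNN,NNNB)" group on a
// massif tree line. Returns -1 if the line has no such group.
inline std::int64_t massif_line_bytes(const std::string& line)
{
    const std::size_t open = line.find('(');
    if (open == std::string::npos)
        return -1;
    std::int64_t bytes = 0;
    bool saw_digit = false;
    for (std::size_t i = open + 1; i < line.size(); ++i) {
        const char c = line[i];
        if (c >= '0' && c <= '9') {
            bytes = bytes * 10 + (c - '0');
            saw_digit = true;
        } else if (c == ',') {
            continue;  // thousands separator
        } else if (c == 'B' && saw_digit && i + 1 < line.size()
                   && line[i + 1] == ')') {
            return bytes;
        } else {
            return -1;  // something other than a byte count
        }
    }
    return -1;
}

// Sum only top-level entries (lines beginning with "->"). Nested "| ->"
// lines break down their parent's bytes, so adding them would double count.
inline std::int64_t sum_top_level_bytes(const std::vector<std::string>& lines)
{
    std::int64_t total = 0;
    for (const std::string& line : lines) {
        if (line.rfind("->", 0) != 0)
            continue;  // nested or non-entry line
        const std::int64_t bytes = massif_line_bytes(line);
        assert(bytes >= 0 && "top-level entry without a byte count");
        total += bytes;
    }
    return total;
}
```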

---- Shading context memory ----
->24.63% (301,596,672B) 0x871AC55: OSL::ShadingContext::ShadingContext(OSL::pvt::ShadingSystemImpl&, OSL::PerThreadInfo*)
->07.70% (94,248,960B) 0x871ABD8: OSL::ShadingContext::ShadingContext(OSL::pvt::ShadingSystemImpl&, OSL::PerThreadInfo*)
| ->07.70% (94,248,960B) 0x86C9F7C: OSL::pvt::ShadingSystemImpl::get_context(OSL::PerThreadInfo*, OpenImageIO::v1_5::TextureSystem::Perthread*)
->07.70% (94,248,960B) 0x871ABD8: OSL::ShadingContext::ShadingContext(OSL::pvt::ShadingSystemImpl&, OSL::PerThreadInfo*)
| ->07.70% (94,248,960B) 0x86C9F7C: OSL::pvt::ShadingSystemImpl::get_context(OSL::PerThreadInfo*, OpenImageIO::v1_5::TextureSystem::Perthread*)
->00.41% (5,043,792B) 0x86C9F68: OSL::pvt::ShadingSystemImpl::get_context(OSL::PerThreadInfo*, OpenImageIO::v1_5::TextureSystem::Perthread*)
->00.38% (4,712,448B) 0x871AA1C: OSL::ShadingContext::ShadingContext(OSL::pvt::ShadingSystemImpl&, OSL::PerThreadInfo*)
| ->00.38% (4,712,448B) 0x86C9F7C: OSL::pvt::ShadingSystemImpl::get_context(OSL::PerThreadInfo*, OpenImageIO::v1_5::TextureSystem::Perthread*)

---- Actual value pools for strings, floats, ints, etc. ----
->01.10% (13,486,024B) 0x7B132E8: __gnu_cxx::new_allocator<OpenImageIO::v1_5::ustring>::allocate(unsigned long, void const*) (new_allocator.h:89)
| ->01.10% (13,486,024B) 0x7B14468: std::_Vector_base<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::_M_allocate(unsigned long) (stl_vector.h:140)
|   ->00.65% (8,000,000B) 0x87B442A: OSL::pvt::RuntimeOptimizer::add_constant(OSL::pvt::TypeSpec const&, void const*, OpenImageIO::v1_5::TypeDesc)
|   ->00.41% (5,035,640B) 0x7B14188: std::_Vector_base<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::_Vector_base(unsigned long, std::allocator<OpenImageIO::v1_5::ustring> const&) (stl_vector.h:113)
|   | ->00.41% (5,035,640B) 0x7B24094: std::_Vector_base<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::_Vector_base(unsigned long, std::allocator<OpenImageIO::v1_5::ustring> const&) (stl_vector.h:110)
|   |   ->00.41% (5,035,640B) 0x7B147A8: std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::vector(std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> > const&) (stl_vector.h:242)
|   |     ->00.41% (4,985,984B) 0x7B23608: OSL::OSLQuery::Parameter::Parameter(OSL::OSLQuery::Parameter const&) (oslquery.h:61)
|   ->00.04% (450,384B) 0x7B152BD: OpenImageIO::v1_5::ustring* std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::_M_allocate_and_copy<__gnu_cxx::__normal_iterator<OpenImageIO::v1_5::ustring const*, std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> > > >(unsigned long, __gnu_cxx::__normal_iterator<OpenImageIO::v1_5::ustring const*, std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> > >, __gnu_cxx::__normal_iterator<OpenImageIO::v1_5::ustring const*, std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> > >) (stl_vector.h:963)
|     ->00.04% (450,384B) 0x7B15D28: std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> >::operator=(std::vector<OpenImageIO::v1_5::ustring, std::allocator<OpenImageIO::v1_5::ustring> > const&) (vector.tcc:164)
|       ->00.04% (450,384B) 0x8737FD9: OSL::pvt::ShaderInstance::parameters(OpenImageIO::v1_5::ParamValueList const&)

->00.96% (11,764,220B) 0x65FD9EC: __gnu_cxx::new_allocator<float>::allocate(unsigned long, void const*) (new_allocator.h:89)
| ->00.96% (11,764,220B) 0x65FE928: std::_Vector_base<float, std::allocator<float> >::_M_allocate(unsigned long) (stl_vector.h:140)
|   ->00.49% (6,023,576B) 0x68B6319: float* std::vector<float, std::allocator<float> >::_M_allocate_and_copy<__gnu_cxx::__normal_iterator<float const*, std::vector<float, std::allocator<float> > > >(unsigned long, __gnu_cxx::__normal_iterator<float const*, std::vector<float, std::allocator<float> > >, __gnu_cxx::__normal_iterator<float const*, std::vector<float, std::allocator<float> > >) (stl_vector.h:963)
|   | ->00.49% (6,023,576B) 0x68B6BD4: std::vector<float, std::allocator<float> >::operator=(std::vector<float, std::allocator<float> > const&) (vector.tcc:164)
|   |   ->00.49% (6,023,576B) 0x8737FBB: OSL::pvt::ShaderInstance::parameters(OpenImageIO::v1_5::ParamValueList const&)
|   ->00.33% (4,000,000B) 0x68B62A1: float* std::vector<float, std::allocator<float> >::_M_allocate_and_copy<std::move_iterator<float*> >(unsigned long, std::move_iterator<float*>, std::move_iterator<float*>) (stl_vector.h:963)
|   | ->00.33% (4,000,000B) 0x68B658A: std::vector<float, std::allocator<float> >::reserve(unsigned long) (vector.tcc:72)
|   |   ->00.33% (4,000,000B) 0x87C0D25: OSL::pvt::ShadingSystemImpl::alloc_float_constants(unsigned long)
|   ->00.14% (1,705,484B) 0x68B5EE4: std::_Vector_base<float, std::allocator<float> >::_Vector_base(unsigned long, std::allocator<float> const&) (stl_vector.h:113)
|   | ->00.14% (1,705,484B) 0x68B9788: std::_Vector_base<float, std::allocator<float> >::_Vector_base(unsigned long, std::allocator<float> const&) (stl_vector.h:110)
|   |   ->00.14% (1,705,484B) 0x6CA5504: std::vector<float, std::allocator<float> >::vector(std::vector<float, std::allocator<float> > const&) (stl_vector.h:242)
|   |     ->00.14% (1,705,484B) 0x7B235DF: OSL::OSLQuery::Parameter::Parameter(OSL::OSLQuery::Parameter const&) (oslquery.h:61)

| ->00.44% (5,404,828B) 0x65597EC: std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) (stl_vector.h:140)
| | ->00.33% (4,000,000B) 0x66A0411: int* std::vector<int, std::allocator<int> >::_M_allocate_and_copy<int*>(unsigned long, int*, int*) (stl_vector.h:963)
| | | ->00.33% (4,000,000B) 0x66A19D3: std::vector<int, std::allocator<int> >::reserve(unsigned long) (vector.tcc:72)
| | |   ->00.33% (4,000,000B) 0x87B3D7F: OSL::pvt::RuntimeOptimizer::add_constant(OSL::pvt::TypeSpec const&, void const*, OpenImageIO::v1_5::TypeDesc)
| | ->00.07% (874,648B) 0x665847C: std::vector<int, std::allocator<int> >::_M_insert_aux(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int const&) (vector.tcc:322)
| | | ->00.07% (874,608B) 0x877B2C7: OSL::pvt::OSOReaderToMaster::instruction_arg(char const*)
| | ->00.04% (455,452B) 0x6657455: int* std::vector<int, std::allocator<int> >::_M_allocate_and_copy<__gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > > >(unsigned long, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >) (stl_vector.h:963)
| | | ->00.04% (455,452B) 0x6657B50: std::vector<int, std::allocator<int> >::operator=(std::vector<int, std::allocator<int> > const&) (vector.tcc:164)
| | |   ->00.04% (455,452B) 0x8737F9D: OSL::pvt::ShaderInstance::parameters(OpenImageIO::v1_5::ParamValueList const&)

---- The actual shader objects ----
->01.05% (12,800,024B) 0x86D335C: OSL::pvt::ShadingSystemImpl::Shader(OpenImageIO::v1_5::string_view, OpenImageIO::v1_5::string_view, OpenImageIO::v1_5::string_view)
| ->01.05% (12,800,024B) 0x86D38E9: OSL::ShadingSystem::Shader(OpenImageIO::v1_5::string_view, OpenImageIO::v1_5::string_view, OpenImageIO::v1_5::string_view)

---- Execution scratch? ----
->00.43% (5,219,886B) 0x6CDA194: __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) (new_allocator.h:89)
| ->00.43% (5,219,886B) 0x7870E74: std::_Vector_base<char, std::allocator<char> >::_M_allocate(unsigned long) (stl_vector.h:140)
|   ->00.43% (5,219,886B) 0x78735FB: std::vector<char, std::allocator<char> >::_M_fill_insert(__gnu_cxx::__normal_iterator<char*, std::vector<char, std::allocator<char> > >, unsigned long, char const&) (vector.tcc:414)
|     ->00.43% (5,219,044B) 0x871BC68: OSL::ShadingContext::execute(OSL::ShaderGroup&, OSL::ShaderGlobals&, bool)

---- LLVM ----
->01.71% (20,971,520B) 0x912397B: llvm::ValueHandleBase::AddToUseList()
| ->01.03% (12,582,912B) 0x8DAF743: llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*> >::operator[](llvm::Value const* const&)
| | ->00.69% (8,388,608B) 0x8E8E488: (anonymous namespace)::PruningFunctionCloner::CloneBlock(llvm::BasicBlock const*, std::vector<llvm::BasicBlock const*, std::allocator<llvm::BasicBlock const*> >&)
| | ->00.34% (4,194,304B) 0x8EE83C1: llvm::MapValue(llvm::Value const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*> >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*)
| | | ->00.34% (4,194,304B) 0x8EE81C6: llvm::MapValue(llvm::Value const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*> >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*)
| | |   ->00.34% (4,194,304B) 0x8EE8517: llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*> >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*)
| ->00.34% (4,194,304B) 0x8E8E4DC: (anonymous namespace)::PruningFunctionCloner::CloneBlock(llvm::BasicBlock const*, std::vector<llvm::BasicBlock const*, std::allocator<llvm::BasicBlock const*> >&)
| | ->00.34% (4,194,304B) 0x8E8EF79: llvm::CloneAndPruneFunctionInto(llvm::Function*, llvm::Function const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*> >&, bool, llvm::SmallVectorImpl<llvm::ReturnInst*>&, char const*, llvm::ClonedCodeInfo*, llvm::DataLayout const*, llvm::Instruction*)
| ->00.34% (4,194,304B) 0x8B42E96: llvm::BitcodeReaderValueList::push_back(llvm::Value*)
| | ->00.34% (4,194,304B) 0x8B3F5F3: llvm::BitcodeReader::ParseModule(bool)

->00.61% (7,425,672B) 0x9121D11: llvm::User::operator new(unsigned long, unsigned int)
| ->00.35% (4,253,040B) 0x9090BB6: llvm::ConstantInt::get(llvm::LLVMContext&, llvm::APInt const&)
| | ->00.19% (2,339,352B) 0x89B7C6C: llvm::SelectionDAG::getConstant(llvm::APInt const&, llvm::EVT, bool)
| | | ->00.16% (1,913,688B) 0x89B7D7E: llvm::SelectionDAG::getConstant(unsigned long, llvm::EVT, bool)
| | | ->00.03% (421,632B) 0x89B8D87: llvm::SelectionDAG::FoldConstantArithmetic(unsigned int, llvm::EVT, llvm::SDNode*, llvm::SDNode*)
| | ->00.09% (1,161,144B) 0x8770E51: OSL::pvt::LLVM_Util::constant(unsigned long)
| | | ->00.09% (1,161,144B) 0x87730AE: OSL::pvt::LLVM_Util::constant_ptr(void*, llvm::PointerType*)
| | |   ->00.07% (893,304B) 0x8759E5E: OSL::pvt::BackendLLVM::llvm_assign_initial_value(OSL::pvt::Symbol const&)
| | ->00.06% (679,392B) 0x9090D66: llvm::ConstantInt::get(llvm::IntegerType*, unsigned long, bool)
| ->00.25% (3,018,640B) 0x9093F54: llvm::ConstantCreator<llvm::ConstantExpr, llvm::Type, llvm::ExprMapKeyType>::create(llvm::Type*, llvm::ExprMapKeyType const&, unsigned short)

->00.51% (6,249,363B) 0x91475DA: llvm::MallocSlabAllocator::Allocate(unsigned long)
| ->00.42% (5,193,728B) 0x9147382: llvm::BumpPtrAllocator::StartNewSlab()
| | ->00.42% (5,173,248B) 0x9147492: llvm::BumpPtrAllocator::Allocate(unsigned long, unsigned long)
| | | ->00.16% (2,015,232B) 0x911AF7F: llvm::StructType::setBody(llvm::ArrayRef<llvm::Type*>, bool)
| | | ->00.11% (1,310,720B) 0x911BCF0: llvm::PointerType::get(llvm::Type*, unsigned int)
| | | ->00.07% (892,928B) 0x911B64D: llvm::StructType::create(llvm::LLVMContext&, llvm::StringRef)
| | | ->00.07% (802,816B) 0x911DAC0: llvm::FunctionType::get(llvm::Type*, llvm::ArrayRef<llvm::Type*>, bool)
| | | ->00.01% (135,168B) 0x911C772: llvm::ArrayType::get(llvm::Type*, unsigned long)
| ->00.09% (1,055,635B) 0x91474C8: llvm::BumpPtrAllocator::Allocate(unsigned long, unsigned long)
|   ->00.09% (1,055,635B) 0x911AF7F: llvm::StructType::setBody(llvm::ArrayRef<llvm::Type*>, bool)
|     ->00.09% (1,055,635B) 0x911B6FE: llvm::StructType::create(llvm::LLVMContext&, llvm::ArrayRef<llvm::Type*>, llvm::StringRef, bool)
|       ->00.09% (1,055,635B) 0x875AFC7: OSL::pvt::BackendLLVM::llvm_type_groupdata()

->00.37% (4,527,960B) 0x9094622: std::_Rb_tree<std::pair<llvm::Type*, llvm::ExprMapKeyType>, std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*>, std::_Select1st<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> >, std::less<std::pair<llvm::Type*, llvm::ExprMapKeyType> >, std::allocator<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> > >::_M_insert_(std::_Rb_tree_node_base const*, std::_Rb_tree_node_base const*, std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> const&)
| ->00.37% (4,491,720B) 0x909482E: std::_Rb_tree<std::pair<llvm::Type*, llvm::ExprMapKeyType>, std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*>, std::_Select1st<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> >, std::less<std::pair<llvm::Type*, llvm::ExprMapKeyType> >, std::allocator<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> > >::_M_insert_unique(std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> const&)
| | ->00.37% (4,491,720B) 0x9094968: std::_Rb_tree<std::pair<llvm::Type*, llvm::ExprMapKeyType>, std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*>, std::_Select1st<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> >, std::less<std::pair<llvm::Type*, llvm::ExprMapKeyType> >, std::allocator<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> > >::_M_insert_unique_(std::_Rb_tree_const_iterator<std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> >, std::pair<std::pair<llvm::Type*, llvm::ExprMapKeyType> const, llvm::ConstantExpr*> const&)
| |   ->00.37% (4,491,720B) 0x9094C3A: llvm::ConstantUniqueMap<llvm::ExprMapKeyType, llvm::ExprMapKeyType const&, llvm::Type, llvm::ConstantExpr, false>::getOrCreate(llvm::Type*, llvm::ExprMapKeyType const&)
| |     ->00.37% (4,491,720B) 0x9091CC7: llvm::ConstantExpr::getIntToPtr(llvm::Constant*, llvm::Type*)

->00.30% (3,670,016B) 0x9096FAE: llvm::DenseMapBase<llvm::DenseMap<llvm::DenseMapAPIntKeyInfo::KeyTy, llvm::ConstantInt*, llvm::DenseMapAPIntKeyInfo>, llvm::DenseMapAPIntKeyInfo::KeyTy, llvm::ConstantInt*, llvm::DenseMapAPIntKeyInfo>::grow(unsigned int)
| ->00.30% (3,670,016B) 0x9090C16: llvm::ConstantInt::get(llvm::LLVMContext&, llvm::APInt const&)
|   ->00.30% (3,670,016B) 0x8770E51: OSL::pvt::LLVM_Util::constant(unsigned long)

sfriedmapixar commented 8 years ago

I should also mention that this trace was from the 1.6.8 cut.

lgritz commented 8 years ago

Ooh, I can believe that we forgot to count a couple of things. I'll get on that; it should be easy.

LLVM's internal data structures are going to be very hard to account for, but we should be able to get all the OSL-side stuff.
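
For reference, the statistics reported earlier in this thread come in current/peak pairs. A counter of that shape could look roughly like the sketch below. `MemCounter` is a hypothetical name and this is not OSL's actual statistics code. It assumes allocation sites call `add` and deallocation sites call `sub` with matching byte counts.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical thread-safe counter that tracks current and peak bytes.
class MemCounter {
public:
    // Record an allocation of `bytes` and raise the peak if needed.
    void add(std::int64_t bytes)
    {
        const std::int64_t now =
            current_.fetch_add(bytes, std::memory_order_relaxed) + bytes;
        std::int64_t peak = peak_.load(std::memory_order_relaxed);
        // On failure compare_exchange_weak reloads `peak`, so the loop ends
        // once another thread has already stored a value >= now.
        while (now > peak
               && !peak_.compare_exchange_weak(peak, now,
                                               std::memory_order_relaxed)) {
        }
    }

    // Record a deallocation of `bytes`; the peak is left untouched.
    void sub(std::int64_t bytes)
    {
        const std::int64_t before =
            current_.fetch_sub(bytes, std::memory_order_relaxed);
        assert(before >= bytes && "freed more than was recorded");
        (void)before;  // silence unused warning when NDEBUG is defined
    }

    std::int64_t current() const
    {
        return current_.load(std::memory_order_relaxed);
    }
    std::int64_t peak() const
    {
        return peak_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::int64_t> current_{0};
    std::atomic<std::int64_t> peak_{0};
};
```

With a counter like this, any allocation that bypasses `add` simply never shows up in either figure.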

Are you only concerned that we aren't counting properly, or do you also suspect that we may be leaking or allocating more than we need?

sfriedmapixar commented 8 years ago

So far it just seems to be a counting problem. It doesn't look like anything is leaking, but I haven't dug in deep enough to see if there are any good places to trim fat.