godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
86.37k stars 19.24k forks source link

Crash when importing BPTC texture in optimized MSVC Windows build #92973

Open mihe opened 3 weeks ago

mihe commented 3 weeks ago

Tested versions

Reproducible in: 4.3.beta [32eba1ee1]

System information

Windows 11 (10.0.22631), AMD Ryzen 9 7940HS

Issue description

When importing pretty much any texture to the BPTC format using a build of Godot compiled with MSVC (toolchain v14.40-17.10) on Windows you end up crashing in one of the CVTT worker tasks that's spawned by image_compress_cvtt found here, with the following call stack:

cvtt::ParallelMath::RoundAndConvertToU15(const cvtt::ParallelMath::Float &) (thirdparty/cvtt/ConvectionKernels_ParallelMath.h:943)
cvtt::Internal::IndexSelector<1>::SelectIndexLDR(const cvtt::ParallelMath::Float *) (thirdparty/cvtt/ConvectionKernels_IndexSelector.h:130)
cvtt::Internal::BC7Computer::TryDualPlane(unsigned int flags, const cvtt::ParallelMath::VInt16<2>[4] * pixels, const cvtt::ParallelMath::Float[4] * floatPixels, const float * channelWeights, const cvtt::BC7EncodingPlan & encodingPlan, int numRefineRounds, cvtt::Internal::BC67::WorkInfo & work, const cvtt::ParallelMath::RoundTowardNearestForScope * rtn) (thirdparty/cvtt/ConvectionKernels_BC67.cpp:2025)
cvtt::Internal::BC7Computer::Pack(unsigned int flags, const cvtt::PixelBlockU8 * inputs, unsigned char * packedBlocks, const float * channelWeights, const cvtt::BC7EncodingPlan & encodingPlan, int numRefineRounds) (thirdparty/cvtt/ConvectionKernels_BC67.cpp:2199)
cvtt::Kernels::EncodeBC7(unsigned char * pBC, const cvtt::PixelBlockU8 * pBlocks, const cvtt::Options & options, const cvtt::BC7EncodingPlan & encodingPlan) (thirdparty/cvtt/ConvectionKernels_API.cpp:54)
_digest_row_task(const CVTTCompressionJobParams &) (modules/cvtt/image_compress_cvtt.cpp:119)
_digest_job_queue(void * p_job_queue, unsigned int p_index) (modules/cvtt/image_compress_cvtt.cpp:140)
WorkerThreadPool::_process_task(WorkerThreadPool::Task * p_task) (core/object/worker_thread_pool.cpp:93)
WorkerThreadPool::_thread_function(void * p_user) (core/object/worker_thread_pool.cpp:196)
Thread::callback(unsigned __int64 p_caller_id, const Thread::Settings & p_settings, void(*)(void *) p_callback, void * p_userdata) (core/os/thread.cpp:66)

With the main thread being here:

Concurrency::details::stl_condition_variable_win7::wait_for(_Stl_critical_section *)
Concurrency::details::stl_condition_variable_win7::wait(_Stl_critical_section *)
_Cnd_wait(_Cnd_internal_imp_t * cond, _Mtx_internal_imp_t * mtx)
std::condition_variable::wait(std::unique_lock<std::mutex> &)
Semaphore::wait() (core\os\semaphore.h:79)
WorkerThreadPool::wait_for_group_task_completion(__int64 p_group) (core\object\worker_thread_pool.cpp:606)
image_compress_cvtt(Image * p_image, Image::UsedChannels p_channels) (modules\cvtt\image_compress_cvtt.cpp:252)
Image::compress_from_channels(Image::CompressMode p_mode, Image::UsedChannels p_channels, Image::ASTCFormat p_astc_format) (core\io\image.cpp:2678)
ResourceImporterTexture::save_to_ctex_format(Ref<FileAccess> f, const Ref<Image> & p_image, ResourceImporterTexture::CompressMode p_compress_mode, Image::UsedChannels p_channels, Image::CompressMode p_compress_format, float p_lossy_quality) (editor\import\resource_importer_texture.cpp:305)
ResourceImporterTexture::_save_ctex(const Ref<Image> & p_image, const String & p_to_path, ResourceImporterTexture::CompressMode p_compress_mode, float p_lossy_quality, Image::CompressMode p_vram_compression, bool p_mipmaps, bool p_streamable, bool p_detect_3d, bool p_detect_roughness, bool p_detect_normal, bool p_force_normal, bool p_srgb_friendly, bool p_force_po2_for_compressed, unsigned int p_limit_mipmap, const Ref<Image> & p_normal, Image::RoughnessChannel p_roughness_channel) (editor\import\resource_importer_texture.cpp:415)
ResourceImporterTexture::import(const String & p_source_file, const String & p_save_path, const HashMap<StringName,Variant,HashMapHasherDefault,HashMapComparatorDefault<StringName>,DefaultTypedAllocator<HashMapElement<StringName,Variant>>> & p_options, List<String,DefaultAllocator> * r_platform_variants, List<String,DefaultAllocator> * r_gen_files, Variant * r_metadata) (editor\import\resource_importer_texture.cpp:666)
EditorFileSystem::_reimport_file(const String & p_file, const HashMap<StringName,Variant,HashMapHasherDefault,HashMapComparatorDefault<StringName>,DefaultTypedAllocator<HashMapElement<StringName,Variant>>> & p_custom_options, const String & p_custom_importer, Variant * p_generator_parameters, bool p_update_file_system) (editor\editor_file_system.cpp:2263)
EditorFileSystem::reimport_files(const Vector<String> & p_files) (editor\editor_file_system.cpp:2508)
EditorFileSystem::_update_scan_actions() (editor\editor_file_system.cpp:703)
EditorFileSystem::_notification(int p_what) (editor\editor_file_system.cpp:1363)
EditorFileSystem::_notificationv(int p_notification, bool p_reversed) (editor\editor_file_system.h:138)
Object::notification(int p_notification, bool p_reversed) (core\object\object.cpp:906)
SceneTree::_process_group(SceneTree::ProcessGroup * p_group, bool p_physics) (scene\main\scene_tree.cpp:938)
SceneTree::_process(bool p_physics) (scene\main\scene_tree.cpp:1034)
SceneTree::process(double p_time) (scene\main\scene_tree.cpp:528)
Main::iteration() (main\main.cpp:4099)
OS_Windows::run() (platform\windows\os_windows.cpp:1686)
widechar_main(int argc, wchar_t * * argv) (platform\windows\godot_windows.cpp:181)
_main() (platform\windows\godot_windows.cpp:208)
main(int argc, char * * argv) (platform\windows\godot_windows.cpp:220)

Note that this does not seem to happen when using the prebuilt release of Godot 4.3-beta1, and I've had this crash happen before the commit associated with that release, so I fear that this is likely an MSVC-specific issue, perhaps even a compiler bug.

Also note that this only seems to happen in optimized builds of Godot. Non-optimized builds (dev_build=yes) seems to import the texture just fine.

The important details in terms of the MRP seems to be as follows:

  1. textures/vram_compression/import_s3tc_bptc must be true.
  2. The compression mode of the texture must be "VRAM Compressed".
  3. The "High Quality" compression option must be enabled.

Steps to reproduce

  1. Build Godot (32eba1ee1) using MSVC (v14.40-17.10) and flags target=editor dev_build=no debug_symbols=yes.
  2. Make sure there's no .godot folder present in the MRP project.
  3. Open the MRP project using the above mentioned build.
  4. Observe the crash.

Minimal reproduction project (MRP)

bptc-crash.zip

mihe commented 3 weeks ago

optimize=size (aka /O1) does not reproduce the issue, so it seems limited to optimize=speed (aka /O2)

I've also tried running the worker tasks serially on the main thread instead, to rule out threading issues, and it still reproduces, so it's not likely to be a threading issue.

I also tried running with Application Verifier (using the Basics/Exceptions and Basics/Heaps tests) and that only gave me a more specific call stack (i.e. the inlined portion) so that likely rules out this being a memory corruption caused elsewhere.

akien-mga commented 3 weeks ago

CC @BlueCube3310 @RandomShaper @bruvzg

mihe commented 3 weeks ago

Going as far back as 0f0106c101edd63351a8e3f8be4b1f87e4501d78 (in hopes of bisecting it to a Godot commit) did not change anything. It still results in a crash.

What did resolve the issue however was downgrading the MSVC toolchain from v14.40-17.10 (released May 21st 2024) to the previous version, v14.39-17.9 (released February 13th 2024), so it does seem quite likely that this is the result of some code generation bug introduced in MSVC v14.40-17.10.

I suppose the appropriate thing to do would be to report this to Microsoft, but reducing this down to something reportable strikes me as a difficult task.

BlueCube3310 commented 3 weeks ago

It looks like this happens when SSE2 instructions are enabled. Are they enabled by default in the official builds?

https://github.com/godotengine/godot/blob/5241d30bfa223ed45784e32d8143d20a98a8d862/thirdparty/cvtt/ConvectionKernels_ParallelMath.h#L935-L945

mihe commented 3 weeks ago

Yes, SSE2 is explicitly enabled for x86 and already included (and thus implicitly enabled) in x86-64.

EDIT: Well, technically the /arch option is left at its default value for MSVC, which is /arch:SSE2, so perhaps not explicitly enabled with MSVC, but SSE2 is a baseline requirement for Godot.

BlueCube3310 commented 3 weeks ago

Could you modify the function to look like this and check the results?

static UInt15 RoundAndConvertToU15(const Float &v, const void* /*roundingMode*/) 
{ 
     __m128i lo = _mm_cvtps_epi32(v.m_values[0]); 
     __m128i hi = _mm_cvtps_epi32(v.m_values[1]); 

     UInt15 result; 
     result.m_value = _mm_packs_epi32(lo, hi); 
     return result; 
}

The crash might be happening due to the compiler getting confused about unused SIMD variables.

mihe commented 3 weeks ago

That did not change anything, unfortunately.

BlueCube3310 commented 2 weeks ago

Does this also happen with HDR textures (BC6), or only with LDR (BC7)?

mihe commented 2 weeks ago

How would I go about using BC6?

BlueCube3310 commented 2 weeks ago

How would I go about using BC6?

You can import a .hdr file (from polyhaven, for example) and enable VRAM compression.

mihe commented 1 week ago

No, a VRAM compressed .hdr file (this one to be specific) with compress/high_quality enabled does not result in a crash with MSVC v14.40-17.10, so I'm assuming it's specific to whatever code path (presumably as a result of inlining) that BC7 uses.