Nayaco / mahoshojos-large-model-support

mahoshojo's large model support
Apache License 2.0

Compilation failed. #2

Open wyq-carol opened 1 year ago

wyq-carol commented 1 year ago

I followed the instructions in README.txt and compiled PyTorch v1.12.0 successfully, but the build failed after I applied the patch. I got the following error message.

Traceback (most recent call last):
  File "/root/anaconda3/envs/env_gnn/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/env_gnn/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pytorch/torchgen/gen.py", line 2535, in <module>
    main()
  File "/home/pytorch/torchgen/gen.py", line 2393, in main
    parsed_yaml = parse_native_yaml(native_yaml_path, tags_yaml_path, ignore_keys)
  File "/home/pytorch/torchgen/gen.py", line 237, in parse_native_yaml
    _GLOBAL_PARSE_NATIVE_YAML_CACHE[path] = parse_native_yaml_struct(
  File "/home/pytorch/torchgen/gen.py", line 174, in parse_native_yaml_struct
    func, m = NativeFunction.from_yaml(e, loc, valid_tags, ignore_keys)
  File "/home/pytorch/torchgen/model.py", line 548, in from_yaml
    dispatch_key = DispatchKey.parse(k.strip())
  File "/home/pytorch/torchgen/model.py", line 142, in parse
    raise AssertionError(f"unknown dispatch key {value}")
AssertionError: unknown dispatch key Checkpoint
  in /home/pytorch/cmake/../aten/src/ATen/native/native_functions.yaml:476:
    add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
ninja: error: rebuilding 'build.ninja': subcommand failed
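
For context, the traceback suggests that torchgen checks every dispatch key name in native_functions.yaml against a fixed set of known keys, so a key that appears in the YAML but not in that set is rejected. Below is a minimal sketch of that kind of check. The `DispatchKey` enum here is a simplified, hypothetical stand-in with only three members, and `check_dispatch_keys` is an illustrative helper; neither is the actual content of torchgen/model.py.

```python
from enum import Enum, auto


class DispatchKey(Enum):
    # Hypothetical subset of keys; the real list in torchgen is much longer.
    CPU = auto()
    CUDA = auto()
    CompositeImplicitAutograd = auto()

    @staticmethod
    def parse(value: str) -> "DispatchKey":
        # Look the name up among the known members and reject anything else.
        for name, key in DispatchKey.__members__.items():
            if name == value:
                return key
        raise AssertionError(f"unknown dispatch key {value}")


def check_dispatch_keys(names):
    """Return the (stripped) names that the parser would reject."""
    unknown = []
    for name in names:
        try:
            DispatchKey.parse(name.strip())
        except AssertionError:
            unknown.append(name.strip())
    return unknown


if __name__ == "__main__":
    # "Checkpoint" is not a member of the stand-in enum, so it is reported.
    print(check_dispatch_keys(["CPU", "CUDA", "Checkpoint"]))
```

If the real check works this way, `Checkpoint` would also have to be registered wherever torchgen defines its dispatch keys. Whether the patch is supposed to do that cannot be told from the error alone.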

I tried hard to fix these errors, but the project is very complex. In the end I got the following error message.

FAILED: bin/torch_shm_manager 
: && /usr/bin/c++ -I/usr/local/cuda-10.2/include -I/usr/local/cuda-10.2/include -I/usr/local/cuda/include -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX2_CPU_DEFINITION -g -fno-omit-frame-pointer -O0 -rdynamic    -rdynamic caffe2/torch/lib/libshm/CMakeFiles/torch_shm_manager.dir/manager.cpp.o -o bin/torch_shm_manager  -Wl,-rpath,/home/pytorch/build/lib:/root/anaconda3/envs/env_gnn/lib:/usr/lib/x86_64-linux-gnu/lib64:  lib/libshm.so  -lrt  lib/libtorch.so  -Wl,--no-as-needed,"/home/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed  lib/libprotobufd.a  -pthread  /root/anaconda3/envs/env_gnn/lib/libmkl_intel_lp64.so  /root/anaconda3/envs/env_gnn/lib/libmkl_gnu_thread.so  /root/anaconda3/envs/env_gnn/lib/libmkl_core.so  -fopenmp  -lpthread  -lm  -ldl  -Wl,--no-as-needed,"/home/pytorch/build/lib/libtorch_cuda.so" -Wl,--as-needed  lib/libc10_cuda.so  -lcudart  -lnvToolsExt  -lcufft  /usr/local/cuda-10.2/lib64/libcurand.so  -lcublas  /usr/lib/x86_64-linux-gnu/lib64/libcudnn.so  lib/libc10.so && :
/home/pytorch/build/lib/libtorch_cpu.so: undefined reference to `c10::StorageImpl::pagein_manual()'
/home/pytorch/build/lib/libtorch_cpu.so: undefined reference to `c10::StorageImpl::release_resources()'
/home/pytorch/build/lib/libtorch_cpu.so: undefined reference to `c10::StorageImpl::need_prefetch()'
/home/pytorch/build/lib/libtorch_cpu.so: undefined reference to `vtable for c10::StorageImpl'
/home/pytorch/build/lib/libtorch_cpu.so: undefined reference to `c10::StorageImpl::~StorageImpl()'
/home/pytorch/build/lib/libtorch_cpu.so: undefined reference to `c10::StorageImpl::pageout_manual()'
/home/pytorch/build/lib/libtorch_cpu.so: undefined reference to `c10::StorageImpl::~StorageImpl()'
/home/pytorch/build/lib/libtorch_cpu.so: undefined reference to `typeinfo for c10::StorageImpl'
collect2: error: ld returned 1 exit status
[1983/1985] Linking CXX shared library lib/libtorch_python.so
ninja: build stopped: subcommand failed.
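
Every undefined symbol in this log belongs to `c10::StorageImpl`, including its vtable and typeinfo. In general that pattern points to out-of-line virtual member definitions that were not compiled into any linked library, although that is a general linker observation and not a confirmed diagnosis of this patch. To make such logs easier to read, here is a small sketch that groups undefined references by the object that needs them. The function name `undefined_symbols` and the regular expression are illustrative assumptions based on the GNU ld message format shown above.

```python
import re
from collections import defaultdict

# Matches lines like:  <object>: undefined reference to `<symbol>'
UNDEFINED_RE = re.compile(
    r"^(?P<obj>.+?): undefined reference to `(?P<sym>.+)'\s*$"
)


def undefined_symbols(log_text):
    """Map each object or library to the sorted, de-duplicated symbols it is missing."""
    missing = defaultdict(set)
    for line in log_text.splitlines():
        match = UNDEFINED_RE.match(line.strip())
        if match:
            missing[match.group("obj")].add(match.group("sym"))
    return {obj: sorted(syms) for obj, syms in missing.items()}


if __name__ == "__main__":
    lib = "/home/pytorch/build/lib/libtorch_cpu.so"
    sample = "\n".join([
        lib + ": undefined reference to `c10::StorageImpl::pagein_manual()'",
        lib + ": undefined reference to `vtable for c10::StorageImpl'",
        lib + ": undefined reference to `c10::StorageImpl::pagein_manual()'",
        "collect2: error: ld returned 1 exit status",
    ])
    for obj, symbols in undefined_symbols(sample).items():
        print(obj)
        for symbol in symbols:
            print("  " + symbol)
```

Running it on the full linker output gives one de-duplicated list per library, which makes it easier to see which source file's definitions are missing from the link.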

I would really appreciate it if there were a version of the code that compiles properly.

JaneEyreliu commented 6 months ago

Hi, I also encountered the same problem. Have you solved it?