dotnet / ai-samples

MIT License

Error on Phi - Unhandled exception. System.ArgumentOutOfRangeException: Length of JSON exceeded int.MaxValue, not supported yet (Parameter 'length') #59

Closed sanme98 closed 3 weeks ago

sanme98 commented 1 month ago

Hi, I have this error while running the local-models/Phi, may I know how to resolve it?

My system is Ubuntu 22.04, running on CPU. I have tried updating to the latest TorchSharp (0.102.4), but the error was the same.

Loading Phi2 from huggingface model weight folder
Unhandled exception. System.ArgumentOutOfRangeException: Length of JSON exceeded int.MaxValue, not supported yet (Parameter 'length')
   at TorchSharp.PyBridge.Safetensors.LoadIndex(Stream stream)
   at TorchSharp.PyBridge.PyBridgeModuleExtensions.load_safetensors(Module module, Stream stream, Boolean strict, IList`1 skip, Dictionary`2 loadedParameters, Boolean leaveOpen)
   at TorchSharp.PyBridge.PyBridgeModuleExtensions.load_safetensors(Module module, String location, Boolean strict, IList`1 skip, Dictionary`2 loadedParameters)
   at TorchSharp.PyBridge.PyBridgeModuleExtensions.load_checkpoint(Module module, String path, String checkpointName, Boolean strict, IList`1 skip, Dictionary`2 loadedParameters)
   at PhiForCasualLM.FromPretrained(String modelFolder, String configName, String checkPointName, ScalarType defaultDType, String device) in /home/xxxx/ai-samples/src/local-models/Phi/Phi.cs:line 45
   at Program.<Main>$(String[] args) in /home/xxxx/ai-samples/src/local-models/Phi/Program.cs:line 32

Thank you.

LittleLittleCloud commented 1 month ago

@sanme98 Which Phi weight are you downloading?

sanme98 commented 3 weeks ago

@LittleLittleCloud, thanks for the info. After checking the Phi weights, I found that model-00001-of-00002.safetensors and model-00002-of-00002.safetensors were smaller than expected. After re-cloning the repository, the error above disappeared.

Btw, can local-models/Phi support CPU? I tried running on CPU, but it fails with the error below, while CUDA has no issues. From some searching, it seems Phi2 does not support ScalarType.Float16 on CPU: https://huggingface.co/microsoft/phi-2/discussions/14

Unhandled exception. System.Runtime.InteropServices.ExternalException (0x80004005): "LayerNormKernelImpl" not implemented for 'Half' Exception raised from operator() at /pytorch/aten/src/ATen/native/cpu/layer_norm_kernel.cpp:187 (most recent call first):

LittleLittleCloud commented 3 weeks ago

@sanme98 Yes, you can run inference on CPU. The inference speed will be dramatically slower though (about 15 s per token on my dev box).

I tried running on CPU, but it fails with the error below, while CUDA has no issues. From some searching, it seems Phi2 does not support ScalarType.Float16

Some torch operators don't support half-precision tensors on CPU, so you need to set the default dtype to Float32 in Program.cs if you want to run inference on CPU.
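For reference, a minimal sketch of what that change could look like. The `PhiForCasualLM.FromPretrained` signature is taken from the stack traces in this thread; the `weightFolder` variable and the device check are illustrative, not the sample's exact code:

```csharp
// Illustrative sketch: choose the default dtype based on the target device,
// since CPU kernels such as LayerNorm have no Half implementation.
var device = "cpu"; // or "cuda"
var defaultDType = device == "cuda"
    ? TorchSharp.torch.ScalarType.Float16  // Half is fine on CUDA
    : TorchSharp.torch.ScalarType.Float32; // CPU needs full precision

// FromPretrained(String modelFolder, String configName, String checkPointName,
//                ScalarType defaultDType, String device) -- per the stack trace above
var model = PhiForCasualLM.FromPretrained(weightFolder, defaultDType: defaultDType, device: device);
```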

sanme98 commented 3 weeks ago

@LittleLittleCloud I get the error below if I change defaultType to Float32. Please note that I have commented out the two lines under "Comment out the following two lines if your machine supports Cuda 12", since I read in the docs that torch can be bundled inside the NuGet package.

dotnet run
Loading Phi2 from huggingface model weight folder
Unhandled exception. System.ArgumentException: Mismatched data sizes in SetBytes().
   at TorchSharp.torch.Tensor.set_bytes(Span`1 value)
   at TorchSharp.torch.nn.Module.load_state_dict(Dictionary`2 source, Boolean strict, IList`1 skip)
   at TorchSharp.PyBridge.PyBridgeModuleExtensions.load_safetensors(Module module, Stream stream, Boolean strict, IList`1 skip, Dictionary`2 loadedParameters, Boolean leaveOpen)
   at TorchSharp.PyBridge.PyBridgeModuleExtensions.load_safetensors(Module module, String location, Boolean strict, IList`1 skip, Dictionary`2 loadedParameters)
   at TorchSharp.PyBridge.PyBridgeModuleExtensions.load_checkpoint(Module module, String path, String checkpointName, Boolean strict, IList`1 skip, Dictionary`2 loadedParameters)
   at PhiForCasualLM.FromPretrained(String modelFolder, String configName, String checkPointName, ScalarType defaultDType, String device) in /home/xxxx/ai-samples/src/local-models/Phi/Phi.cs:line 45
   at Program.<Main>$(String[] args) in /home/xxxx/ai-samples/src/local-models/Phi/Program.cs:line 32
sanme98 commented 3 weeks ago

Hi, the error above can be resolved by upgrading the TorchSharp packages; Phi2 inference now works.

    <PackageReference Include="TorchSharp" Version="0.102.5" />
    <PackageReference Include="TorchSharp-cpu" Version="0.102.5" />
    <PackageReference Include="TorchSharp.PyBridge" Version="1.4.1" />

Thank you @LittleLittleCloud for your help.