ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Possible regression on master #89

Closed BartlomiejLewandowski closed 1 year ago

BartlomiejLewandowski commented 1 year ago

Hi,

I see that interactive mode has been merged in. I was trying to test the repository on a larger set of weights and found that there is no output anymore. When running in interactive mode the code works, so something might be going on there. I haven't had time to look into it yet.

The code reports the number of tokens per second at the end, so it seems the tokens are just not being sent to the console.

Cheers

BartlomiejLewandowski commented 1 year ago

commit hash

1808ee0500ea674b4bc2911acd0489ee5cbcef87

jyomu commented 1 year ago

I don't know whether this is the right place to report it, but I am experiencing a similar issue.

I built it on Windows using gcc. It works with the 7B and 13B models, but exception c0000005 occurs with 30B.

PS E:\AI\llm\llama.cpp> ./main -m ./models/30B/ggml-model-q4_0.bin -t 8 -i -r "User:" -p "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
>> 
>> User: Hello, Bob.
>> Bob: Hello. How may I help you today?
>> User: Please tell me the largest city in Europe.
>> Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
>> User:"
main: seed = 1678708966
llama_model_load: loading model from './models/30B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 6656
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 52
llama_model_load: n_layer = 60
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 17920
llama_model_load: n_parts = 4
llama_model_load: ggml ctx size = 20951.50 MB
PS E:\AI\llm\llama.cpp> $LASTEXITCODE
-1073741819

The following is the contents of Windows Error Reporting.

Version=1
EventType=APPCRASH
EventTime=133231813194556600
ReportType=2
Consent=1
UploadTime=133231813195576600
ReportStatus=268435456
ReportIdentifier=c9723dd0-cc3d-4c39-8bf9-a4a6766ef853
IntegratorReportIdentifier=7fe63951-de98-4d0d-be1f-d08b02eca86f
Wow64Host=34404
NsAppName=main.exe
AppSessionGuid=00002c28-0001-0028-497f-28d2a055d901
TargetAppId=W:00060318d4310657e8368557f183e15c47cd0000ffff!0000bad3aec1e66df23f486e98be59d3d1ae85a83bd9!main.exe
TargetAppVer=2023//03//13:03:27:36!388547!main.exe
BootId=4294967295
ServiceSplit=54251704
TargetAsId=871
IsFatal=1
EtwNonCollectReason=4
Response.BucketId=123ff91fc2a3e430af2ac54681b707a7
Response.BucketTable=4
Response.LegacyBucketId=2245824270812252071
Response.type=4
Sig[0].Name=Application Name
Sig[0].Value=main.exe
Sig[1].Name=Application Version
Sig[1].Value=0.0.0.0
Sig[2].Name=Application Timestamp
Sig[2].Value=640e9828
Sig[3].Name=Fault Module Name
Sig[3].Value=main.exe
Sig[4].Name=Fault Module Version
Sig[4].Value=0.0.0.0
Sig[5].Name=Fault Module Timestamp
Sig[5].Value=640e9828
Sig[6].Name=Exception Code
Sig[6].Value=c0000005
Sig[7].Name=Exception Offset
Sig[7].Value=0000000000011dc3
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=10.0.22000.2.0.0.256.48
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=1041
DynamicSig[22].Name=Additional Information 1
DynamicSig[22].Value=30be
DynamicSig[23].Name=Additional Information 2
DynamicSig[23].Value=30be38a838bafed693f82c59319f0599
DynamicSig[24].Name=Additional Information 3
DynamicSig[24].Value=318b
DynamicSig[25].Name=Additional Information 4
DynamicSig[25].Value=318bea61f3ce38372a9e5e439565af39
UI[2]=E:\AI\llm\llama.cpp\main.exe
LoadedModule[0]=E:\AI\llm\llama.cpp\main.exe
LoadedModule[1]=C:\WINDOWS\SYSTEM32\ntdll.dll
LoadedModule[2]=C:\WINDOWS\System32\KERNEL32.DLL
LoadedModule[3]=C:\WINDOWS\System32\KERNELBASE.dll
LoadedModule[4]=C:\WINDOWS\System32\msvcrt.dll
State[0].Key=Transport.DoneStage1
State[0].Value=1
OsInfo[0].Key=vermaj
OsInfo[0].Value=10
OsInfo[1].Key=vermin
OsInfo[1].Value=0
OsInfo[2].Key=verbld
OsInfo[2].Value=22000
OsInfo[3].Key=ubr
OsInfo[3].Value=1641
OsInfo[4].Key=versp
OsInfo[4].Value=0
OsInfo[5].Key=arch
OsInfo[5].Value=9
OsInfo[6].Key=lcid
OsInfo[6].Value=1041
OsInfo[7].Key=geoid
OsInfo[7].Value=122
OsInfo[8].Key=sku
OsInfo[8].Value=48
OsInfo[9].Key=domain
OsInfo[9].Value=0
OsInfo[10].Key=prodsuite
OsInfo[10].Value=256
OsInfo[11].Key=ntprodtype
OsInfo[11].Value=1
OsInfo[12].Key=platid
OsInfo[12].Value=10
OsInfo[13].Key=sr
OsInfo[13].Value=0
OsInfo[14].Key=tmsi
OsInfo[14].Value=222050226
OsInfo[15].Key=osinsty
OsInfo[15].Value=3
OsInfo[16].Key=iever
OsInfo[16].Value=11.1.22000.0-11.0.1000
OsInfo[17].Key=portos
OsInfo[17].Value=0
OsInfo[18].Key=ram
OsInfo[18].Value=32721
OsInfo[19].Key=svolsz
OsInfo[19].Value=930
OsInfo[20].Key=wimbt
OsInfo[20].Value=0
OsInfo[21].Key=blddt
OsInfo[21].Value=210604
OsInfo[22].Key=bldtm
OsInfo[22].Value=1628
OsInfo[23].Key=bldbrch
OsInfo[23].Value=co_release
OsInfo[24].Key=bldchk
OsInfo[24].Value=0
OsInfo[25].Key=wpvermaj
OsInfo[25].Value=0
OsInfo[26].Key=wpvermin
OsInfo[26].Value=0
OsInfo[27].Key=wpbuildmaj
OsInfo[27].Value=0
OsInfo[28].Key=wpbuildmin
OsInfo[28].Value=0
OsInfo[29].Key=osver
OsInfo[29].Value=10.0.22000.1641.amd64fre.co_release.210604-1628
OsInfo[30].Key=buildflightid
OsInfo[30].Value=4319975B-C3DE-4D41-B26A-6F1EE3800828.1
OsInfo[31].Key=edition
OsInfo[31].Value=Professional
OsInfo[32].Key=ring
OsInfo[32].Value=Retail
OsInfo[33].Key=expid
OsInfo[33].Value=RS:D674,FX:117B97D4,FX:117B9872,FX:118B0639,FX:119E26AD,FX:11A8C293,FX:11A8C2FE,FX:11D898D7,FX:11DB147C,FX:11DE505A,FX:11E11E97,FX:11E3E2BA,FX:11E50151,FX:11E9EE98
OsInfo[34].Key=fconid
OsInfo[34].Value=18299130,0,2,1;19638787,0,2,1;25704915,1,2,1;34508795,0,2,1;35825666,0,1,1;37474955,0,1,0;37609034,0,1,0;37801696,0,1,1;37837060,0,1,1;37926348,0,1,0;38266310,0,2,1;38406949,0,1,0;38651681,0,1,1;38961938,0,2,1;39070303,0,2,1;39263329,1,2,1;39319950,0,2,0;39545181,1,2,0;39645403,0,1,1;39909691,0,2,1;40041209,0,2,1;40041494,0,2,1;40325239,0,1,1;40741162,0,2,1;40984526,0,1,1;40997837,0,1,1;41172767,0,1,1;41217122,0,2,1;41296716,0,2,1;41296903,0,2,1;41309695,0,2,1;41310287,0,2,1;41326452,0,2,1;41349904,0,2,1;41390425,0,2,1;41410354,0,2,1;41534063,0,2,1;41646676,0,2,1;41738344,0,2,1;41801248,0,2,1;41859555,0,2,1;42137249,0,2,1;42442583,0,2,1
OsInfo[35].Key=containerid
OsInfo[36].Key=containertype
OsInfo[37].Key=edu
OsInfo[37].Value=0
OsInfo[38].Key=servicinginprogress
OsInfo[38].Value=0
FriendlyEventName=Stopped working
ConsentKey=APPCRASH
AppName=main.exe
AppPath=E:\AI\llm\llama.cpp\main.exe
NsPartner=windows
NsGroup=windows8
ApplicationIdentity=911A6264BDE8642697EB772EE4DB073A
MetadataHash=-676135523
blackhole89 commented 1 year ago

c0000005 (STATUS_ACCESS_VIOLATION) is the Windows equivalent of a segmentation fault, if I recall correctly. I haven't tested with >13B myself, but it would be quite surprising if something broke for that case only.

Is there any chance you could build with debug info (add -g to CFLAGS, CXXFLAGS, LDFLAGS) and run in a debugger to figure out where exactly the crash occurs?

jyomu commented 1 year ago

Is this okay? [screenshot of the debugger session attached]

blackhole89 commented 1 year ago

Yes, that's quite useful, thanks.

I notice that mem_buffer and several other locals derived from ctx are shown as NULL on the left. Given the call stack, it looks quite likely that the allocation of mem_buffer simply failed at ggml.c:2422, which is not too surprising if it is requesting on the order of 20 GB. If I'm reading the code correctly, there is no check on the outcome of that allocation anywhere between llama_model_load's ggml_init call on line 230 and the ggml_new_tensor_2d on line 248, so a failed malloc that left ctx.mem_buffer as NULL would fail in exactly the way you observe.

How much RAM do you have in total?

jyomu commented 1 year ago

I have a total of 32 GB of RAM. I added 4 GB of virtual memory and it works fine!