Closed: djFatNerd closed this issue 6 months ago
I'm also having the same problem. Did you find a fix?
Could you interrupt the program and then paste the resulting stack trace here?
Sure, this is the stack trace after I pause the program:
Thank you!
Turning off multiprocessing makes the process continue, but the `results` variable, which contains the object assignments, is always empty.
> I'm also having the same problem. Did you find a fix?
No, I didn't. I suspect this might be an error in the multiprocessing.
This is the "results":
and the following is the error trace:
The .json file for the generated input string can't be loaded.
I also met the same issue. Not sure how to fix this.
The error message above shows that this error comes from the GPT-4 side: GPT-4 did not follow the instructions and returned JSON in the wrong format. Rerunning the code could resolve the issue. By the way, which version of GPT-4 are you using?
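As a sketch of the rerun workaround (the function names here are illustrative, not from the Holodeck codebase), the model call can be wrapped in a small retry loop that re-queries whenever the reply fails to parse as JSON:

```python
import json

def query_with_retries(ask_model, prompt, max_retries=3):
    """Retry a model call until its reply parses as JSON.

    ask_model: callable taking a prompt string and returning the raw
    model reply as a string (hypothetical stand-in for the GPT-4 call).
    """
    last_error = None
    for _ in range(max_retries):
        reply = ask_model(prompt)
        try:
            return json.loads(reply)
        except json.JSONDecodeError as err:
            last_error = err  # malformed JSON this time; ask again
    raise ValueError(f"no valid JSON after {max_retries} attempts") from last_error
```

This only automates the "rerun the code" advice above; it does not change what the model is asked, so a model that consistently ignores the format instructions (as gpt-3.5 reportedly does) will still exhaust the retries.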
Hi, I am trying the inference code and the process gets stuck here forever. May I ask what the possible cause might be?
For this problem, depending on the room size, this step can take several minutes.
> Turning off multiprocessing makes the process continue, but the "results" variable which contains objects assignments is always empty.
I met this issue, and a simple rerun solved my issue.
> The error message above shows that this error is from the GPT-4 side. GPT-4 did not follow the instructions and returned a JSON with the wrong format. Rerun the code could resolve the issue, and btw, which version of GPT-4 are you using?
Hi, I am using both `gpt-4-1106-preview` and `gpt-3.5-turbo`, as I noticed they are the defaults in holodeck.py.
I initially only used gpt-3.5 since it's more cost-friendly. I noticed gpt-3.5 almost never gives a response in the correct format, while gpt-4 has a much higher success rate.
However, the stuck problem still exists when `multi_processing = True` in object_selector.py: the process just hangs forever. If I set it to False, the code proceeds.
Currently, after proceeding with gpt-4 and `multi_processing = False`, I am having the following issue:
I am using Ubuntu 18.04.
Please use `gpt-4-1106-preview` for all the modules, since `gpt-3.5-turbo` cannot follow the prompt well and will consistently make mistakes.
For running holodeck on a headless server, please refer to this: https://github.com/allenai/Holodeck/issues/6
Same issue encountered, no code changes made.
Which issue? Could you rerun the code?
Still unresolved, @djFatNerd , have you resolved this issue? My situation is the same as yours.
@YueYANG1996 I am encountering the same issue here. Following the responses above and adding logging, it seems the blockage occurs in the `encode_text` method of CLIP, called by the `retrieve` function in ObjaverseRetriever. Strangely, in previous runs this function could be invoked successfully and returned the expected results. Here are the inputs for the successful calls and for the last one, which resulted in the blockage. I haven't made any code modifications, and I am currently puzzled about how to resolve this issue.
The following image provides the context for the encountered issue
Hi, which issue are you referring to?
The stuck problem exists when `multi_processing = True` in object_selector.py: the process hangs forever. When `multi_processing = False`, I am having the same dependency issue as yours.
Yes, I can't solve the stuck problem with `multi_processing = True` either.
For the dependency issue, I think it's a problem with libc6 on Ubuntu 18.04; I haven't found any solution either.
When I set multiprocessing to false on an Ubuntu machine, I get a `Tensor expected on one processor however tensors found on multiple processors GPU:0 and CPU` error.
Yes, this happened to me. You need to manually move the tensors to the same device during the similarity calculations.
Can you share how you did that? I tried to move them manually but it did not work for me.
Sure.
In objaverse_retriever.py, just move the variables `clip_similarities` and `sbert_similarities` to the device of the variable `query_feature_sbert`.
The following code works for me:
```python
clip_similarities = clip_similarities.to(query_feature_sbert.device)
sbert_similarities = query_feature_sbert @ self.sbert_features.T.to(query_feature_sbert.device)
```
Over the past few days I researched and saw lots of discussions around libc6 on Ubuntu 18.04, but none of the fixes I tried solved the problem.
I had to solve this by switching to Ubuntu 20.04.
Hi @djFatNerd, does everything work for you after switching to Ubuntu 20.04? I'm getting `RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method` when using Ubuntu 20.04 with multiprocessing set to false and the tensor-device fix applied. Thanks
I am using Ubuntu 22.04 and getting the same CUDA error now after moving the tensors. I have set multiprocessing to false.
> Hi @djFatNerd, does everything work for you after switching to Ubuntu 20.04? I'm getting `RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method` when using Ubuntu 20.04, setting multiprocessing to false, and fixing the tensor device. Thanks
Yes, everything works for me now.
This is another issue; you can solve it by adding this line of code at the beginning of main.py:
```python
torch.multiprocessing.set_start_method('spawn')
```
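For context, `torch.multiprocessing` is a drop-in wrapper around the standard library's `multiprocessing` module, so the start-method API can be sketched without torch (the helper names below are illustrative, not from Holodeck):

```python
import multiprocessing as mp

def use_spawn_globally():
    """Process-wide equivalent of torch.multiprocessing.set_start_method('spawn').

    Must run before any worker process is created, which is why the fix
    goes at the beginning of main.py. Tolerates being called after a
    start method has already been fixed for this process.
    """
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
        pass  # a start method was already chosen; it can only be set once

def spawn_context():
    """Alternative that leaves the global default alone: a context object
    pinned to 'spawn', usable wherever a Pool or Process is constructed."""
    return mp.get_context("spawn")
```

The context-object form is often safer in library code, since `set_start_method` may only be called once per process; either way, `spawn` starts workers from a fresh interpreter instead of forking, which is what lets CUDA initialize cleanly in the children.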