Open rasbt opened 1 week ago
Hi! Thanks for your contribution! Great first issue!
Thank you, @rasbt, for bringing up this issue.
Interestingly, I was able to reproduce a similar problem as reported by ByteBrigand (in GitHub Codespace), but couldn't reproduce the issue with deleted chunks. I'll test on other devices to see if I can replicate it.
Thanks for sharing. May I ask which version you are using? Is this with the latest main branch or the latest stable release?
Sure, @rasbt. I tested with both the latest main branch and v0.2.26, and both lead to the same error.
Thanks, and that's so weird.
@rasbt, consider moving the optimize function call into the main block:
if __name__ == "__main__":
    optimize(
        fn=tokenize,
        inputs=train_files,
        output_dir="temp",
        num_workers=1,
        chunk_bytes="50MB",
    )
And, for me, it worked perfectly fine.
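A possible reason the main-block guard helps, sketched under the assumption that the workers are started with Python's multiprocessing "spawn" method (the default on macOS and Windows): each spawned worker re-imports the script, so any unguarded top-level call would run again inside every worker. The names square and main below are hypothetical and only illustrate the pattern; they are not part of litdata.

```python
import multiprocessing as mp


def square(x):
    # Defined at module level so that spawned workers can import it.
    return x * x


def main():
    # "spawn" starts each worker in a fresh interpreter that re-imports this module.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Without this guard, the re-import in each worker would call main() again.
    print(main())
```

On Linux the default start method has traditionally been "fork", which does not re-import the script, so the guard alone may not change the behavior there.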
That said, we are aware of the following error:
RuntimeError: All the chunks should have been deleted. Found ['chunk-x-x.bin']
AFAIK, this is a non-deterministic bug. It has occurred in multiple CI tests. Even re-running the code will do the trick, but we will permanently fix this weird bug very soon.
Probably, there is some issue in the way the uploader queue passes chunk files to the remover queue.
And, if you're on a Mac, consider upgrading litdata to the latest version.
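Until a permanent fix lands, the "re-running the code" workaround could be automated with a small retry wrapper. This is only a sketch: run_with_retries and CHUNK_ERROR are hypothetical names, not part of litdata, and it assumes that deleting the partially written output directory before a retry is safe for your job.

```python
import shutil
from pathlib import Path

# Substring of the error message reported in this thread.
CHUNK_ERROR = "All the chunks should have been deleted"


def run_with_retries(job, output_dir, max_attempts=3):
    """Call job() until it succeeds or max_attempts is reached.

    Only the chunk-cleanup RuntimeError is retried; any other error propagates.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except RuntimeError as err:
            if CHUNK_ERROR not in str(err) or attempt == max_attempts:
                raise
            # Assumption: partial output can be discarded before the next attempt.
            shutil.rmtree(Path(output_dir), ignore_errors=True)
```

Inside the main block, the optimize call could then be passed in as job, for example as a lambda that wraps the call shown above, with output_dir set to "temp". This will not help on machines where the error occurs on every run.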
Thanks for the response, @deependujha. After updating the code as you suggested, the example now works on my MacBook. On the Linux machine, however, I still hit the same bug, even with the latest LitData version (I also tried the latest version from the main branch).
> AFAIK, this is a non-deterministic bug. It has occurred in multiple CI tests. Even re-running the code will do the trick, but we will permanently fix this weird bug very soon.
Actually, on that Linux machine, it happens every single time. So weird.
Hi, any resolution to this on linux systems?
Unfortunately, no. It seems that the issue still persists on some Linux machines. Maybe the best solution for now is to use an older litdata version (assuming this error doesn't exist in older versions) on those machines.
> Unfortunately, no. It seems that the issue still persists on some Linux machines. Maybe the best solution for now is to use an older litdata version (assuming this error doesn't exist in older versions) on those machines.
Sorry for the delay. I have been tied up with other work. I'll start working on this immediately.
No worries, @deependujha. I totally understand that there are other priorities and commitments at the moment, so please don't get yourself in trouble working on it. But in case you have some time and are able to, that'd be super appreciated. Also, let us know if you have any ideas we could test out (since our machines seem to reproduce the error deterministically, we could help test potential solutions).
🐛 Bug
When using LitData on non-Studio machines, I am getting the following error:
RuntimeError: All the chunks should have been deleted. Found ['chunk-0-0.bin']
To Reproduce
This error occurs when running the following example from the LitGPT README:
I made a simpler example to reproduce the issue with a standalone code snippet.
1) Download sample data
2) Run the following code
This results in the following on a Studio machine
However, on a non-Studio machine, I am getting:
Environment