Closed by anarkiwi 1 month ago
Same error here. I'll add that re-running the generation command doesn't resume from where it failed: it skips the dataset entirely if its folder already exists.
I think this error is related to the number of workers in the generation process. It looks like OP is using 16 workers and they failed 72% of the way through generating the impaired dataset. I was originally using 24 workers and mine failed twice at 48% through the same dataset. I tried again using 1 worker and it made it all the way through (but took forever). I would guess it probably works with 8 or fewer workers? I can't say why using more workers causes the error, though. I'm also not sure why it only fails on this specific dataset generation.
Also, yeah, it's frustrating that it skips generation if the folder exists. It would be nice if it verified that the data is complete when the folder already exists.
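To make that suggestion concrete, here is a minimal sketch of what such a check could look like. It is not TorchSig's code: the function names and the expected_entries argument are hypothetical, and it assumes one key per record in the main LMDB database. If the writer uses named sub-databases, the count would have to be taken per sub-database instead.

```python
from pathlib import Path


def count_lmdb_entries(path: str) -> int:
    """Return the number of keys in the main database of the LMDB at `path`."""
    # Third-party py-lmdb; imported lazily so the decision helper below
    # can be used (and tested) without it.
    import lmdb

    env = lmdb.open(path, readonly=True, lock=False)
    try:
        return env.stat()["entries"]
    finally:
        env.close()


def should_regenerate(path: str, expected_entries: int, count=count_lmdb_entries) -> bool:
    """True when the folder is missing, or exists but holds too few entries.

    `expected_entries` is a hypothetical value the caller must supply;
    it is an assumption of this sketch, not something TorchSig reports.
    """
    if not Path(path).is_dir():
        return True  # nothing on disk yet
    return count(path) < expected_entries  # folder exists but is incomplete
```

A generation script could call should_regenerate before deciding to skip a dataset, rather than testing only whether the folder exists.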
Changing the number of workers didn't change anything for me (for the record, I have 40 workers and it always fails at 57%).
From the error, and from some SO posts, I tried changing the map_size value on line 89 of the writer from int(4e12) to int(8e12), and that worked. I tried 8e13 but it errored out saying it couldn't allocate enough space for the map. So this should ideally be a number which is at least as big as the dataset. I'm not sure how 4e12 worked for anyone.
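As a rough sanity check on those numbers, the sketch below extrapolates the map size a full run would need from the point where a run filled the map. It assumes the map fills roughly linearly with progress and that the run in the original report (failed at 237169 of 331250) also used map_size=int(4e12). Neither is confirmed in this thread, and the helper name and the 25% headroom are made up for illustration.

```python
def estimate_map_size(map_size_at_failure: float, done: int, total: int,
                      headroom: float = 1.25) -> int:
    """Extrapolate the LMDB map size needed for a complete run.

    Assumes storage grows linearly with the number of items written,
    which is only an approximation.
    """
    if not 0 < done <= total:
        raise ValueError("done must be in the range 1..total")
    projected = map_size_at_failure * total / done  # bytes for 100% of the run
    return int(projected * headroom)                # add a safety margin


# Figures taken from the traceback in the original report; the 4e12
# starting value is an assumption (see above).
needed = estimate_map_size(4e12, done=237169, total=331250)
print(f"suggested map_size: {needed:.3e} bytes")
```

The result would be passed as the map_size argument, for example lmdb.open(path, map_size=needed). Under these assumptions the estimate falls below 8e12, which fits the report that int(8e12) worked. The failures at 48% and 57% reported by others suggest the fill rate varies between setups, so this is only a ballpark figure.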
This issue is 1 year old. The codebase, including dataset generation, has changed significantly since then, so it is unclear whether this is still a problem. Closing for now.
What sort of resources are required to generate the dataset (or is there something else going on)? The torchsig repo and "examples" directory are the only things on /local.
Per the README, I tried this on main (host is Ubuntu 22.04, 384GB RAM, 16T disk, ext4):
josh@worker01:/local$ python3 torchsig/scripts/generate_sig53.py --root=torchsig/examples --all=True
100%|██████████| 66250/66250 [59:50<00:00, 18.45it/s]
100%|██████████| 6625/6625 [04:08<00:00, 26.69it/s]
 72%|███████▉  | 237169/331250 [2:44:20<1:05:11, 24.05it/s]
Traceback (most recent call last):
  File "/local/torchsig/scripts/generate_sig53.py", line 73, in <module>
    main()
  File "/home/josh/.local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/josh/.local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/josh/.local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/josh/.local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/local/torchsig/scripts/generate_sig53.py", line 62, in main
    generate(root, configs[:4])
  File "/local/torchsig/scripts/generate_sig53.py", line 31, in generate
    creator.create()
  File "/home/josh/.local/lib/python3.10/site-packages/torchsig/utils/writer.py", line 158, in create
    self.writer.write(batch)
  File "/home/josh/.local/lib/python3.10/site-packages/torchsig/utils/writer.py", line 118, in write
    txn.put(
lmdb.MapFullError: mdb_put: MDB_MAP_FULL: Environment mapsize limit reached
josh@worker01:/local$
josh@worker01:/local$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        18T  4.8T   12T  29% /local
josh@worker01:/local$ free
               total        used        free      shared  buff/cache   available
Mem:       396137076     2059208    31594060       22688   362483808   391419648
Swap:              0           0           0
josh@worker01:/local$ uname -a
Linux worker01 6.2.0-33-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 7 10:33:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux