elix-tech / kmol

kMoL is a machine learning library for drug discovery and life sciences, with federated learning capabilities.
MIT License

An unhandled exception occurred in plb model #13

Open wdm2 opened 9 months ago

wdm2 commented 9 months ago

When I run the following command, an error occurs and the calculation stops.

Command: kmol train data/configs/model/plb/fingerprint_ligand+bow_protein.json

Error:

0:06:13 | 2023-12-06 20:57:42 | [ ERROR | logger.py:79] > An unhandled exception occured: unable to mmap 8192 bytes from file </torch_1029999_3155664760_61209>: Cannot allocate memory (12). Traceback:
  File "/home/userA/miniconda3/envs/kmol/bin/kmol", line 8, in <module>
    sys.exit(main())
  File "/home/userA/kmol/src/kmol/run.py", line 624, in main
    Executor(config=Config.from_file(args.config, args.job), config_path=args.config).run(args.job)
  File "/home/userA/kmol/src/mila/factories.py", line 135, in run
    getattr(self, job)()
  File "/home/userA/kmol/src/kmol/run.py", line 90, in train
    streamer = GeneralStreamer(config=self._config)
  File "/home/userA/kmol/src/kmol/data/streamers.py", line 41, in __init__
    self._dataset = self._preprocessor._load_dataset()
  File "/home/userA/kmol/src/kmol/data/preprocessor.py", line 232, in _load_dataset
    dataset = self._cache_manager.execute_cached_operation(
  File "/home/userA/kmol/src/kmol/core/helpers.py", line 210, in execute_cached_operation
    content = processor(**arguments)
  File "/home/userA/kmol/src/kmol/data/preprocessor.py", line 249, in _prepare_dataset
    dataset = self._run_parrallel(self._prepare_chunk, loader, self._use_disk)
  File "/home/userA/kmol/src/kmol/data/preprocessor.py", line 113, in _run_parrallel
    dataset = list(itertools.chain.from_iterable([future.result() for future in futures]))
  File "/home/userA/kmol/src/kmol/data/preprocessor.py", line 113, in <listcomp>
    dataset = list(itertools.chain.from_iterable([future.result() for future in futures]))
  File "/home/userA/miniconda3/envs/kmol/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/userA/miniconda3/envs/kmol/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception

I have tried reducing the batch size and increasing the allocated memory, but the same error occurs. Is there any solution to this?
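One thing that may be worth checking, offered as a diagnostic sketch and not as a confirmed cause: error code 12 in the message above is ENOMEM, and on Linux mmap can return ENOMEM not only when memory is exhausted but also when the process would exceed its maximum number of memory mappings (vm.max_map_count). The helper names below (count_mappings, mapping_headroom) are hypothetical and not part of kMoL; the sketch only reads the standard /proc files and returns None where they are unavailable.

```python
import errno
import os
from pathlib import Path


def count_mappings(maps_text: str) -> int:
    """Count mappings in the text of a /proc/<pid>/maps file (one mapping per non-empty line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mapping_headroom(pid="self"):
    """Return (current_mappings, max_map_count), or None if the /proc files cannot be read."""
    maps_path = Path(f"/proc/{pid}/maps")
    limit_path = Path("/proc/sys/vm/max_map_count")
    try:
        current = count_mappings(maps_path.read_text())
        limit = int(limit_path.read_text().strip())
    except (OSError, ValueError):
        # Not Linux, or /proc is restricted in this environment.
        return None
    return current, limit


if __name__ == "__main__":
    # Show the symbolic name and message for the error code seen in the log.
    print(errno.errorcode[12], "-", os.strerror(12))
    print("mappings (current, limit):", mapping_headroom())
```

To inspect the failing job instead of the current process, pass the PID of the running kmol process (or one of its workers) to mapping_headroom while featurization is in progress. If the current count approaches the limit, the mapping limit is a plausible suspect; if not, plain memory exhaustion is more likely.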

wdm2 commented 9 months ago

I apologize for the multiple messages. I reduced the size of the input file using the following command: $ head -n 100000 chembl.csv > chemble-100k.csv
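For reference, the same truncation can be sketched in Python. Like head, it counts physical lines and not CSV records, so if chembl.csv starts with a header row (an assumption; the file is not shown here), the output holds one fewer data row than the line count, and a quoted field containing a newline could be cut in half. The function name head_csv is hypothetical.

```python
import itertools


def head_csv(src_path, dst_path, n_lines=100_000):
    """Copy the first n_lines physical lines of src_path to dst_path.

    Returns the number of lines actually written (fewer if the source is shorter).
    """
    written = 0
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in itertools.islice(src, n_lines):
            dst.write(line)
            written += 1
    return written
```

Calling head_csv("chembl.csv", "chemble-100k.csv") would mirror the shell command above; the return value makes it easy to confirm how many lines ended up in the reduced file.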

I then updated the input file setting in 'data/configs/model/plb/fingerprint_ligand+bow_protein.json' to point to the reduced file and ran the following command: $ kmol train data/configs/model/plb/fingerprint_ligand+bow_protein.json

As a result, the error has changed to the following:

0:00:01 | 2023-12-07 13:25:50 | [ INFO | helpers.py:205] > Dataset Cache Key: 1e08131ac02a9f1ab2aae7d91f9d679d
0:00:02 | 2023-12-07 13:25:50 | [ INFO | preprocessor.py:248] > Starting featurization...
All jobs progress: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5/5 100% 0:00:00 0:01:05
0:01:07 | 2023-12-07 13:26:56 | [ ERROR | logger.py:79] > An unhandled exception occured: A process in the process pool was terminated abruptly while the future was running or pending.. Traceback:
  File "/home/userA/miniconda3/envs/kmol/bin/kmol", line 8, in <module>
    sys.exit(main())
  File "/home/userA/kmol/src/kmol/run.py", line 624, in main
    Executor(config=Config.from_file(args.config, args.job), config_path=args.config).run(args.job)
  File "/userA/kmol/src/mila/factories.py", line 135, in run
    getattr(self, job)()
  File "/home/userA/kmol/src/kmol/run.py", line 90, in train
    streamer = GeneralStreamer(config=self._config)
  File "/home/userA/kmol/src/kmol/data/streamers.py", line 41, in __init__
    self._dataset = self._preprocessor._load_dataset()
  File "/home/userA/kmol/src/kmol/data/preprocessor.py", line 232, in _load_dataset
    dataset = self._cache_manager.execute_cached_operation(
  File "/home/userA/kmol/src/kmol/core/helpers.py", line 210, in execute_cached_operation
    content = processor(**arguments)
  File "/home/userA/kmol/src/kmol/data/preprocessor.py", line 249, in _prepare_dataset
    dataset = self._run_parrallel(self._prepare_chunk, loader, self._use_disk)
  File "/home/userA/kmol/src/kmol/data/preprocessor.py", line 113, in _run_parrallel
    dataset = list(itertools.chain.from_iterable([future.result() for future in futures]))
  File "/home/userA/kmol/src/kmol/data/preprocessor.py", line 113, in <listcomp>
    dataset = list(itertools.chain.from_iterable([future.result() for future in futures]))
  File "/home/userA/miniconda3/envs/kmol/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/userA/miniconda3/envs/kmol/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception

Machine Environment: