brightway-lca / brightway2

Metapackage for brightway2 imports and documentation
https://brightway.dev/
BSD 3-Clause "New" or "Revised" License

Error in process parallelization while importing ecoinvent database #17

aleksandra-kim closed this issue 5 years ago

aleksandra-kim commented 5 years ago

Original report by Anonymous.


I just installed brightway2 and I am trying to import the ecoinvent 3.5 database. I'm using the lines from the "how to get started" notebook:

from brightway2 import *

# create project
projects.set_current("ecoinvent-import")

# load ecoinvent db
ei35default = SingleOutputEcospold2Importer(r"C:\Users\name\Desktop\LCA\resources\ecoinvent v3.5\APOS\datasets", "ecoinvent 3.5 APOS")

The code starts and shows a message:

Extracting XML data from 16045 datasets

Shortly after the following error pops up:

  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

The code keeps running and shows the same error over and over again. I let it run for an hour, then aborted it. There seems to be a problem with the parallelization. Any ideas on how to fix it?

I am using Python 3.6 in the PyCharm IDE on Windows.
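The error message above refers to a standard idiom. On Windows, `multiprocessing` starts workers with the "spawn" method. Each worker is a fresh interpreter that re-imports the main script, so any top-level code that creates a pool runs again inside every child. The stdlib-only sketch below shows the guarded layout. `parse_dataset`, `main`, and the file names are hypothetical stand-ins for per-dataset XML parsing, not brightway2 code. Presumably the same layout applies to the import script above, with the `projects.set_current` and `SingleOutputEcospold2Importer` lines moved inside `main()`.

```python
import multiprocessing


def parse_dataset(name: str) -> int:
    # Hypothetical stand-in for per-file work (e.g. parsing one XML dataset).
    # It is defined at module level so spawned children can import it by name.
    return len(name)


def main() -> None:
    filenames = ["a.spold", "bb.spold", "ccc.spold"]  # hypothetical file names
    # Creating the pool is what launches child processes. Keeping it inside
    # main() means a child that re-imports this module does not run it again.
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(parse_dataset, filenames))


if __name__ == "__main__":
    # Only the process started by the user passes this check; spawned children
    # re-import the script under the name "__mp_main__" instead.
    multiprocessing.freeze_support()  # no effect unless frozen into a Windows executable
    main()
```

When the script is run directly, only the parent process passes the `__name__` check, so the pool is created exactly once.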

aleksandra-kim commented 5 years ago

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).


There have been problems with multiprocessing on Windows before, and I thought the extractor defaulted to not using it on Windows, but I can't find that code anymore.

For the time being, you can do this:

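#!python

from bw2io import SingleOutputEcospold2Importer
from bw2io.extractors.ecospold2 import Ecospold2DataExtractor

class FixedExtractor:
    """Drop-in extractor that always disables multiprocessing."""
    @classmethod
    def extract(cls, dirpath, db_name):
        # Delegate to the stock extractor, but run it in a single process
        return Ecospold2DataExtractor.extract(dirpath, db_name, use_mp=False)

# Pass the wrapper in place of the default extractor
ei = SingleOutputEcospold2Importer(
    "/Users/cmutel/Sync/3.5/cutoff/datasets",
    "something",
    extractor=FixedExtractor
)
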
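The Windows-specific default mentioned in this comment could look roughly like the sketch below. `default_use_mp` is a hypothetical helper, not part of bw2io. Its result would be passed as the `use_mp` argument of `Ecospold2DataExtractor.extract`, the same keyword the workaround sets to `False`.

```python
import sys


def default_use_mp(platform: str = sys.platform) -> bool:
    """Hypothetical helper: enable multiprocessing everywhere except Windows."""
    # sys.platform is "win32" on Windows, for both 32-bit and 64-bit builds.
    return not platform.startswith("win")


print(default_use_mp())  # False on Windows, True on Linux and macOS
```

A platform check like this trades speed for robustness: it disables multiprocessing on every Windows setup, including ones where it would have worked.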
aleksandra-kim commented 5 years ago

Original comment by Benjamin W. Portner (Bitbucket: pommespapst, GitHub: pommespapst).


That fixed it. Thank you very much!

aleksandra-kim commented 5 years ago

Original comment by Adrian Haas (Bitbucket: haasad, GitHub: haasad).


I did some quick checks, and it appears that the problem is specific to PyCharm on Windows. From the console, IPython, or Jupyter, Windows uses multiprocessing successfully (all cores busy). PyCharm on Linux also works fine.
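A small diagnostic can show which conditions apply in a given IDE or console. The sketch below is illustrative and does not come from this thread. It reports the platform, the `multiprocessing` start method, and whether the main module was loaded from a file. Under "spawn", a main module loaded from a script file is usually re-imported by every worker, while interactive consoles and notebooks typically have no `__file__` for it. Comparing the output from PyCharm and from a plain console is one way to narrow down the difference. The sketch makes no assumption about how PyCharm launches scripts.

```python
import multiprocessing
import sys


def describe_environment() -> dict:
    """Collect facts that decide whether spawned workers re-run the main script."""
    main_module = sys.modules["__main__"]
    return {
        "platform": sys.platform,
        # Typically "spawn" on Windows and recent macOS; "fork" or "forkserver"
        # on Linux, depending on the Python version.
        "start_method": multiprocessing.get_start_method(),
        # A file path here usually means spawned workers will re-import this
        # script; None is typical of interactive consoles and notebooks.
        "main_file": getattr(main_module, "__file__", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```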

aleksandra-kim commented 5 years ago

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).


Should be fixed in https://bitbucket.org/cmutel/brightway2-io/commits/5ad0c27a9616eceeb9dceeb51afe1bfcbb027b1d. We can't control how PyCharm launches Python, but we can catch the error and make it possible to turn multiprocessing off.
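The approach described in this comment is to catch the error and offer a switch that turns multiprocessing off. It can be sketched generically as follows. This is not the code from the linked commit: `map_datasets` and `MultiprocessingUnavailable` are hypothetical names, and the per-item function stands in for parsing one dataset. With `use_mp=False`, no child processes are started at all. With `use_mp=True`, a `RuntimeError` raised while starting the pool is re-raised with a hint to retry serially. In this issue's scenario, that error surfaces inside a child process that is re-running an unguarded script.

```python
import multiprocessing
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class MultiprocessingUnavailable(Exception):
    """Hypothetical error type: worker processes could not be started."""


def map_datasets(func: Callable[[T], R], items: Iterable[T], use_mp: bool = True) -> List[R]:
    """Apply ``func`` to every item, in parallel unless ``use_mp`` is False."""
    batch = list(items)
    if not use_mp:
        # Serial path: no child processes, so no bootstrapping problems.
        return [func(item) for item in batch]
    try:
        with multiprocessing.Pool() as pool:
            return pool.map(func, batch)
    except RuntimeError as exc:
        # Pool creation raises RuntimeError when called during a child's
        # bootstrapping phase (e.g. an unguarded main script re-run under
        # "spawn"); point the user at the serial option instead.
        raise MultiprocessingUnavailable(
            "Could not start worker processes; re-run with use_mp=False"
        ) from exc
```

A clearer message alone is unlikely to stop failing workers from being restarted. The serial path is what sidesteps the problem.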