brightway-lca / brightway2-io

Importing and exporting for the Brightway LCA framework
BSD 3-Clause "New" or "Revised" License

Importing ecospold 1 (like the full ecoinvent 2.2) produces datasets with integer ids. #30

Closed aleksandra-kim closed 8 years ago

aleksandra-kim commented 8 years ago

Original report by Tomas Navarrete Gutierrez (Bitbucket: tomas_navarrete).


I think the expected behavior is for activities to be identified by hash ids, not integer ids. Here is how to reproduce with Python 3.5 and the following installed packages:

appdirs==1.4.0
asteval==0.9.7
brightway2==2.0.2
bw2analyzer==0.9.1
bw2calc==1.4
bw2data==2.3.2
bw2io==0.5.3
bw2parameters==0.5.2
bw2speedups==2.1
click==6.6
decorator==4.0.10
docopt==0.6.2
eight==0.3.3
fasteners==0.14.1
Flask==0.11.1
future==0.15.2
ipython==5.1.0
ipython-genutils==0.1.0
itsdangerous==0.24
Jinja2==2.8
lxml==3.6.4
MarkupSafe==0.23
monotonic==1.2
nose==1.3.7
numpy==1.11.1
peewee==2.8.3
pexpect==4.2.1
pickleshare==0.7.4
powerline-status==2.5
prompt-toolkit==1.0.7
psutil==4.3.0
ptyprocess==0.5.1
Pygments==2.1.3
PyPrind==2.9.8
requests==2.11.1
scipy==0.18.0
simplegeneric==0.8.1
six==1.10.0
stats-arrays==0.4.1
traitlets==4.2.2
unicodecsv==0.14.1
Unidecode==0.4.19
voluptuous==0.9.3
wcwidth==0.1.7
Werkzeug==0.11.10
Whoosh==2.7.4
wrapt==1.10.8
xlrd==1.0.0
XlsxWriter==0.9.3
import brightway2 as bw2
Applying automatic update: 2.3 processed data format

bw2.projects.set_current("ei22")
bw2.bw2setup()

Creating default biosphere

Applying strategy: normalize_units
Applying strategy: drop_unspecified_subcategories
Applied 2 strategies in 0.01 seconds
Writing activities to SQLite3 database:
0%                          100%
[##############################] | ETA: 00:00:00
Total time elapsed: 00:00:01
Title: Writing activities to SQLite3 database:
  Started: 08/30/2016 08:14:39
  Finished: 08/30/2016 08:14:41
  Total time elapsed: 00:00:01
  CPU %: 64.90
  Memory %: 1.57
Created database: biosphere3
Creating default LCIA methods

Applying strategy: normalize_units
Applying strategy: set_biosphere_type
Applying strategy: drop_unspecified_subcategories
Applying strategy: link_iterable_by_fields
Applied 4 strategies in 1.92 seconds
Wrote 665 LCIA methods with 169551 characterization factors
Creating core data migrations

ei22importer = bw2.SingleOutputEcospold1Importer('./2.2/', 'ecoinvent22')
Extracting XML data from 4087 datasets
Extracted 4087 datasets in 5.24 seconds

ei22importer.statistics()
4087 datasets
135892 exchanges
135892 unlinked exchanges
  Type biosphere: 1613 unique unlinked exchanges
  Type production: 4087 unique unlinked exchanges
  Type technosphere: 3015 unique unlinked exchanges

 ⓔ  ei22ecospold1  In [9]   ei22importer.strategies
                    Out[9]   
[<function bw2io.strategies.generic.normalize_units>,
 <function bw2io.strategies.generic.assign_only_product_as_production>,
 <function bw2io.strategies.ecospold1_allocation.clean_integer_codes>,
 <function bw2io.strategies.biosphere.drop_unspecified_subcategories>,
 <function bw2io.strategies.biosphere.normalize_biosphere_categories>,
 <function bw2io.strategies.biosphere.normalize_biosphere_names>,
 <function bw2io.strategies.biosphere.strip_biosphere_exc_locations>,
 <function bw2io.strategies.generic.set_code_by_activity_hash>,
 functools.partial(<function link_iterable_by_fields at 0x7f5eac298b70>, other=Brightway2 SQLiteBackend: biosphere3, kind='biosphere'),
 <function bw2io.strategies.generic.link_technosphere_by_activity_hash>]

 ⓔ  ei22ecospold1  In [10]   ei22importer.apply_strategies()
Applying strategy: normalize_units
Applying strategy: assign_only_product_as_production
Applying strategy: clean_integer_codes
Applying strategy: drop_unspecified_subcategories
Applying strategy: normalize_biosphere_categories
Applying strategy: normalize_biosphere_names
Applying strategy: strip_biosphere_exc_locations
Applying strategy: set_code_by_activity_hash
Applying strategy: link_iterable_by_fields
Applying strategy: link_technosphere_by_activity_hash
Applied 10 strategies in 4.46 seconds

 ⓔ  ei22ecospold1  In [11]   ei22importer.statistics()
4087 datasets
135892 exchanges
0 unlinked exchanges

                    Out[11]   (4087, 135892, 0)

 ⓔ  ei22ecospold1  In [12]   ei22importer.write_database()
Writing activities to SQLite3 database:
0%                          100%
[##############################] | ETA: 00:00:00
Total time elapsed: 00:01:01
Title: Writing activities to SQLite3 database:
  Started: 08/30/2016 08:19:46
  Finished: 08/30/2016 08:20:47
  Total time elapsed: 00:01:01
  CPU %: 60.60
  Memory %: 4.40
Created database: ecoinvent22
                    Out[12]   Brightway2 SQLiteBackend: ecoinvent22

 ⓔ  ei22ecospold1  In [13]   rds = bw2.Database('ecoinvent22').random()

 ⓔ  ei22ecospold1  In [14]   rds
                    Out[14]   'laser machining, metal, with YAG-laser, 500W power' (hour, RER, ['metals', 'chipless shaping'])

 ⓔ  ei22ecospold1  In [15]   rds.as_dict()
                    Out[15]   
{'authors': [{'address': 'Kanzleistrasse 4, 8610 Uster',
   'company': 'ESU',
   'country': 'CH',
   'email': 'esu-services@ecoinvent.org',
   'name': 'Roland Steiner'}],
 'categories': ['metals', 'chipless shaping'],
 'code': '10137',
 'comment': "The reference for laser machining is its operation at 100% power for 1 hour. It does not include the input of the material processed. This need to be added separately. The dataset can be used when metals are treated with a YAG laser of the capacity indicated. Factory infrastructure needs to be added. Data are based on manufacturers' data (weight and power consumption) and literature (air emissions).\nThis dataset includes work piece feeder, laser system, cooling and control system. Any additional equipment such as possibly necessary ventilation or additional security installations are not included. It includes the input of energy, of cooling water (where needed) and of the laser equipment. Further factory infrastructure (halls, buildings) are not included. The dataset includes process specific air emissions.\nLocation:  Geographical coverage encompasses the industrialised countries.\nTechnology:  HL series of YAG Lasers and Lasma 584R processing machine\nProduction volume:  unknown\nSampling:  unknown\nExtrapolations:  none\nUncertainty:  none",
 'database': 'ecoinvent22',
 'filename': '/opt/db/2.2/Process_infra_roh/10137.XML',
 'location': 'RER',
 'name': 'laser machining, metal, with YAG-laser, 500W power',
 'production amount': 1.0,
 'type': 'process',
 'unit': 'hour'}
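
To see how many datasets are affected, instead of sampling one at random, a small check along these lines could help. It is only a sketch: `integer_like_codes` is a hypothetical helper, not part of bw2io, and it assumes datasets are plain dicts shaped like the `as_dict()` output above. The second sample dataset and its code are made up for illustration.

```python
def integer_like_codes(datasets):
    """Return every code that is an int or a string made only of digits."""
    found = []
    for ds in datasets:
        code = ds.get("code")
        if isinstance(code, int) or (isinstance(code, str) and code.isdigit()):
            found.append(code)
    return found


sample = [
    # Code taken from the dataset shown in the report
    {"name": "laser machining, metal, with YAG-laser, 500W power", "code": "10137"},
    # Hypothetical dataset with a hash-style code, for contrast
    {"name": "hypothetical dataset", "code": "3f2a9c0d5b7e4a1f8c6d2e0b9a7f1c34"},
]

print(integer_like_codes(sample))
```

On the importer itself, the same helper could presumably be called on `ei22importer.data` after `apply_strategies()`.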
aleksandra-kim commented 8 years ago

Original comment by Tomas Navarrete Gutierrez (Bitbucket: tomas_navarrete).


A quick fix for the user would be:

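    from brightway2 import bw2setup, SingleOutputEcospold1Importer

    location_ei22 = './2.2/'  # path used in the report above; adjust as needed

    bw2setup()
    ei22 = SingleOutputEcospold1Importer(location_ei22, "ecoinvent22")
    # Apply only the first two strategies
    ei22.apply_strategies(ei22.strategies[:2])

    # Drop the integer codes so that hash codes can be assigned instead
    for ds in ei22.data:
        del ds['code']

    # strategies[7] is set_code_by_activity_hash in the list shown above
    ei22.apply_strategy(ei22.strategies[7])
    ei22.apply_strategies()
    ei22.statistics()
    ei22.write_database()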
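
A plausible reading of why deleting `code` matters, assuming the hashing strategy only fills in codes that are missing (this has not been checked against the bw2io source): as long as a dataset already carries its integer code, the hash is never assigned. The toy version below illustrates that assumption with plain dicts. `toy_hash` and `set_code_if_missing` are hypothetical stand-ins, not the real `activity_hash` or `set_code_by_activity_hash`.

```python
import hashlib


def toy_hash(ds):
    """Stand-in for an activity hash: digest of a few identifying fields."""
    key = "|".join(str(ds.get(field, "")).lower() for field in ("name", "unit", "location"))
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def set_code_if_missing(datasets):
    """Assumed behaviour: assign a hash code only where no code exists yet."""
    for ds in datasets:
        if "code" not in ds:
            ds["code"] = toy_hash(ds)
    return datasets


ds = {
    "name": "laser machining, metal, with YAG-laser, 500W power",
    "unit": "hour",
    "location": "RER",
    "code": "10137",
}

set_code_if_missing([ds])
kept = ds["code"]        # the existing integer-like code survives

del ds["code"]           # the step from the quick fix
set_code_if_missing([ds])
replaced = ds["code"]    # now a 32-character hex digest
```

If the assumption holds, it would also explain why the default strategy list leaves `'code': '10137'` in place.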
aleksandra-kim commented 8 years ago

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).


Thanks Tomas-

You raise some interesting questions, and clearly both integer and hash codes should be possible. I will come back to this after the summer school next week.

aleksandra-kim commented 8 years ago

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).


Add NoIntegerCodesEcospold1Importer. Fixes #30

aleksandra-kim commented 8 years ago

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).


Note that there shouldn't be a problem with integer (as string) codes. We don't trust the integer values in exchanges, only those in the list of datasets, which are unique in data produced by ecoinvent and exported by SimaPro.
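
The policy described here can be sketched roughly as follows. This is an illustration under assumptions, not the actual `clean_integer_codes` implementation: dataset-level integer codes are kept but stored as strings, while integer codes on exchanges are dropped so that linking has to rely on other fields. The function name `clean_codes` and the sample data are hypothetical.

```python
def clean_codes(datasets):
    """Keep dataset integer codes (as strings); drop integer codes on exchanges."""
    for ds in datasets:
        if isinstance(ds.get("code"), int):
            ds["code"] = str(ds["code"])
        for exc in ds.get("exchanges", []):
            if isinstance(exc.get("code"), int):
                del exc["code"]
    return datasets


data = [
    {
        "name": "laser machining, metal, with YAG-laser, 500W power",
        "code": 10137,
        "exchanges": [
            {"name": "hypothetical input", "code": 42},
            {"name": "hypothetical input without a code"},
        ],
    }
]

cleaned = clean_codes(data)
print(cleaned[0]["code"])
print(cleaned[0]["exchanges"])
```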