BONSAMURAIS / bonsai

Open source software for product footprinting.
https://bonsai.uno/
BSD 3-Clause "New" or "Revised" License

Text mining #15

Open romainsacchi opened 5 years ago

romainsacchi commented 5 years ago

Estimated person-hours: unknown

Volunteer(s)/Candidate(s): unknown

Task Description: Text mining of specific data sources, such as corporate sustainability reports and academic journal articles in PDF format. Text mining could also help structure the raw text already contained in LCA data.

On another note, a game-based approach could be considered, in which a broader community manually extracts and parses data.

Technical specifications:

Opportunities for machine learning and the use of (semi-)automated procedures to replace activities that currently require human intervention. A practical example could be implementing text mining of specific data sources, e.g. corporate sustainability reports.
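As a rough illustration of what semi-automated extraction from a sustainability report could look like, here is a minimal sketch that pulls numeric quantities and their units out of raw text with a regular expression. The sample text, the list of units, and the function name `extract_quantities` are all assumptions made for this example; a real report would need PDF-to-text conversion first and a much broader unit vocabulary.

```python
import re

# Hypothetical excerpt standing in for text extracted from a report.
SAMPLE_TEXT = (
    "In 2019, our Scope 1 emissions were 12,450 tonnes CO2e. "
    "Total electricity consumption reached 3.2 GWh, and we "
    "withdrew 85,000 m3 of water."
)

# A number (with optional thousands separators and decimals)
# followed by one of a few illustrative units.
QUANTITY = re.compile(
    r"(?P<value>\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)\s*"
    r"(?P<unit>tonnes CO2e|GWh|MWh|kWh|m3)\b"
)


def extract_quantities(text):
    """Return a list of (value, unit) tuples found in the text."""
    results = []
    for match in QUANTITY.finditer(text):
        # Drop thousands separators before converting to a float.
        value = float(match.group("value").replace(",", ""))
        results.append((value, match.group("unit")))
    return results


quantities = extract_quantities(SAMPLE_TEXT)
for value, unit in quantities:
    print(value, unit)
```

The year "2019" is not picked up because it is not followed by a recognised unit. A rule-based pass like this one could serve as a baseline before trying machine learning approaches.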

Data updates and adding new data points: there is potential to assign these tasks to Master's students, as group work or classroom projects. One of the flow-property layers has to be defined as the natural unit for each product. The natural unit is the one that the product cannot lose without losing its meaning.

romainsacchi commented 5 years ago

Can we make these two sentences clearer: "One of the flow-property layers has to be defined as the natural unit for each product. The natural unit is the one that the product cannot lose without losing its meaning."?

romainsacchi commented 5 years ago

Python libraries like beautifulsoup4 could be considered. For example, you will find here a script I wrote that scrapes data from globalenergyobservatory.org and maps all the coal power plants in the world, along with their capacity, the type of coal used, etc.
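To give an idea of the approach, here is a minimal sketch of parsing a table of power plants with beautifulsoup4. It is not the script mentioned above: the HTML fragment, the table id `plants`, the column layout, and the function name `parse_plants` are hypothetical, and the real markup of globalenergyobservatory.org will differ. The HTML is inlined so that the example runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a scraped page.
SAMPLE_HTML = """
<table id="plants">
  <tr><th>Name</th><th>Capacity (MW)</th><th>Coal type</th></tr>
  <tr><td>Plant A</td><td>1,200</td><td>Bituminous</td></tr>
  <tr><td>Plant B</td><td>640</td><td>Lignite</td></tr>
</table>
"""


def parse_plants(html):
    """Turn each data row of the (assumed) plants table into a dict."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="plants")
    plants = []
    # Skip the first row, which holds the column headers.
    for row in table.find_all("tr")[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != 3:
            continue  # ignore rows that do not match the expected layout
        name, capacity, coal_type = cells
        plants.append(
            {
                "name": name,
                "capacity_mw": float(capacity.replace(",", "")),
                "coal_type": coal_type,
            }
        )
    return plants


plants = parse_plants(SAMPLE_HTML)
for plant in plants:
    print(plant)
```

In a real scraper the HTML would come from an HTTP request, and the selectors would have to be adapted to the page structure.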