Conversion scripts should normalize data filenames

fjuniorr / flowmapper-ci

Bot for running flowmapper


Conversion scripts should normalize data filenames #4

Closed: cmutel closed this issue 11 months ago

cmutel commented 11 months ago

Our pattern is database-version-qualifier, so ElementaryExchanges-3.6.xml should be ecoinvent-3.6-biosphere.json.

We don't have a precise version for simapro-flows.json, so this should be simapro-unknown-biosphere.json.
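A minimal sketch of the database-version-qualifier naming rule, assuming a hypothetical helper `biosphere_filename` (not part of the repository) that substitutes the literal `unknown` when no precise version is available:

```python
from typing import Optional


def biosphere_filename(database: str, version: Optional[str] = None,
                       qualifier: str = "biosphere") -> str:
    """Build a filename following the database-version-qualifier pattern.

    Hypothetical helper: if no version is given, "unknown" takes its place,
    as proposed for the SimaPro flow list.
    """
    return f"{database}-{version or 'unknown'}-{qualifier}.json"


print(biosphere_filename("ecoinvent", "3.6"))  # ecoinvent-3.6-biosphere.json
print(biosphere_filename("simapro"))           # simapro-unknown-biosphere.json
```

Keeping the rule in one function means every conversion script produces names the same way, rather than each script formatting its own.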

fjuniorr commented 11 months ago

@cmutel agribalyse-3.1.1-biosphere.json, industry-2.0-biosphere.json and simapro-flows.json all have consistent schemas[^1].

Should I merge them into a single simapro-unknown-biosphere.json?

[^1]: data/database-1.json has a context key instead of categories.
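One way to check a claim like "consistent schemas" is to compare the set of keys used in each file. This is a sketch only, assuming each file holds a JSON array of flow objects; `load_flows` and `schema_differences` are hypothetical names:

```python
import json
from pathlib import Path


def load_flows(path: Path) -> list[dict]:
    """Load a flow list; assumes the file holds a JSON array of objects."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def schema_keys(flows: list[dict]) -> frozenset[str]:
    """Union of all keys used by the flow records of one dataset."""
    keys: set[str] = set()
    for flow in flows:
        keys.update(flow)
    return frozenset(keys)


def schema_differences(datasets: dict[str, list[dict]]) -> dict[str, tuple[set[str], set[str]]]:
    """Compare every dataset's keys with the first one.

    Returns {name: (missing_keys, extra_keys)} for datasets that differ;
    an empty dict means all schemas match.
    """
    names = list(datasets)
    reference = schema_keys(datasets[names[0]])
    diffs = {}
    for name in names[1:]:
        keys = schema_keys(datasets[name])
        if keys != reference:
            diffs[name] = (set(reference - keys), set(keys - reference))
    return diffs
```

With the files loaded via `load_flows`, a file that uses `context` where the others use `categories` would show up as `({"categories"}, {"context"})`.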

cmutel commented 11 months ago

> data/database-1.json has a context key instead of categories

This is an artificial distinction - the source data is actually in CSV, without column labels (...), so I added both the context and categories labels.
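Because the labels were assigned by hand, a record can be normalized so that it always carries both. A minimal sketch, assuming both keys hold the same compartment information; `with_both_labels` is a hypothetical helper:

```python
def with_both_labels(flow: dict) -> dict:
    """Return a shallow copy of a flow record carrying both labels.

    Assumption: `context` and `categories` describe the same compartment,
    so a record with only one of them gets the other copied from it.
    The input dict itself is not modified.
    """
    record = dict(flow)
    if "categories" not in record and "context" in record:
        record["categories"] = record["context"]
    elif "context" not in record and "categories" in record:
        record["context"] = record["categories"]
    return record


print(with_both_labels({"name": "Carbon dioxide", "context": ["air"]}))
```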

> Should I merge them into a single simapro-unknown-biosphere.json?

In theory, yes, they should be one unified list. Let's at least try this, but keep the separate files as well. It can be helpful to know the statistics for them individually.

fjuniorr commented 11 months ago

I've created a new make rule for data/simapro-all-biosphere.json[^1] using jq, but kept the individual flow files.

[^1]: Not sure whether data/simapro-unknown-biosphere-all.json would follow the database-version-qualifier pattern more closely.
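The make rule itself is not shown here. A common jq idiom for this kind of merge is `jq -s 'add' a.json b.json > out.json`, which slurps the input arrays into one array of arrays and concatenates them; the actual rule may differ. An equivalent Python sketch, assuming each input is a JSON array of flow objects (`merge_flow_files` is a hypothetical name), which leaves the inputs untouched and reports per-file counts so the individual statistics remain available:

```python
import json
from pathlib import Path


def merge_flow_files(inputs: list[Path], output: Path) -> dict[str, int]:
    """Concatenate several JSON flow lists into a single output file.

    Inputs are only read, never modified or deleted, so they can still be
    analysed separately. Returns the number of records read from each input.
    """
    merged: list[dict] = []
    counts: dict[str, int] = {}
    for path in inputs:
        with path.open(encoding="utf-8") as f:
            flows = json.load(f)  # assumed: a JSON array of flow objects
        counts[path.name] = len(flows)
        merged.extend(flows)
    with output.open("w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False, indent=2)
    return counts
```

For example, `merge_flow_files([Path("data/agribalyse-3.1.1-biosphere.json"), Path("data/industry-2.0-biosphere.json"), Path("data/simapro-flows.json")], Path("data/simapro-all-biosphere.json"))`. Like the jq idiom, this is a plain concatenation; it does not deduplicate flows that appear in more than one source.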

@cmutel, let me know if you want to store the original CSV files from SimaPro in data-raw and add the munging scripts to the project.

cmutel commented 11 months ago

Great, thanks!

Yes, I would go with simapro-unknown-biosphere-all.json, but this is not worth changing 😄

> let me know if you want to store in data-raw the original csv

We can't upload the whole CSV files, both for space and confidentiality reasons.

fjuniorr commented 11 months ago

> Yes, I would go with simapro-unknown-biosphere-all.json, but this is not worth changing 😄

For sure it is!

> We can't upload the whole CSV files, both for space and confidentiality reasons.

Got it.