PascalLesage / presamples

Package to write, load, manage and verify numerical arrays, called presamples.
BSD 3-Clause "New" or "Revised" License
14 stars 11 forks source link

Collapse matrix data to keep only unique references to matrix cells. #62

Closed PascalLesage closed 4 years ago

PascalLesage commented 4 years ago

Sums samples for given matrix cells. If there are mixed types (technosphere and production, which occurs when there is a loss), technosphere samples are subtracted from production samples and the indices assumes a production exchange. This is only done for individual matrix_data (sample, indices, kind) tuples.

Note: the failing test also fails for master, as discussed here

cmutel commented 4 years ago

It doesn't look to me like the failing tests are related to #61, but rather to changed behaviour now that duplicate elements are summed.

cmutel commented 4 years ago

I'm a bit hesitant about this approach.

  1. The current system of magic numbers requiring/prohibiting a sign flip is brittle, both in general and here specifically. For example, substitution exchanges should be treated the same as production exchanges, but aren't. Should probably import TYPE_DICTIONARY from bw2data.utils, but still need to check for:

BW3 handles this by a) requiring separate arrays for technosphere and biosphere, and b) adding a new boolean field, flip, for data whose sign should be flipped when entering into a matrix.

It seems to me that values should be consolidated for everything regardless of kind field.

  1. Values aren't consolidated across separate arrays, and this means that people still have to think about or at least be conscious of the limitations here. One could make an argument that, given this limitation, data generators should do the consolidation themselves. Maybe a middle ground is to add this function as a utility, but not include it in the default writing of presample packages?
PascalLesage commented 4 years ago

It doesn't look to me like the failing tests are related to #61, but rather to changed behaviour now that duplicate elements are summed.

See https://github.com/PascalLesage/presamples/issues/61#issuecomment-557149249

PascalLesage commented 4 years ago
  1. The current system of magic numbers requiring/prohibiting a sign flip is brittle, both in general and here specifically. For example, substitution exchanges should be treated the same as production exchanges, but aren't. Should probably import TYPE_DICTIONARY from bw2data.utils, but still need to check for:
  • production
  • substitution
  • generic production

generic production not part of the TYPE_DICTIONARY. "Substitution" can easily be added to present code.

BW3 handles this by a) requiring separate arrays for technosphere and biosphere, and b) adding a new boolean field, flip, for data whose sign should be flipped when entering into a matrix.

This will be awesome, but I fail to see what to do with this information for the given problem. Are you suggesting we add this flip boolean in presamples now?

It seems to me that values should be consolidated for everything regardless of kind field.

Everything is consolidated regardless of kind field. However, if kind=='technosphere', then extra care needs to be taken to consider sign flips.

  1. Values aren't consolidated across separate arrays, and this means that people still have to think about or at least be conscious of the limitations here.

This is expected behaviour: data in earlier arrays should always be overwritten by data in later arrays - this is the basis for data refinement.

One could make an argument that, given this limitation, data generators should do the consolidation themselves.

By data generator, you mean whatever upstream piece of code will generate he indices and samples that are then passed to create_presamples_package? This was going to be my initial approach, but the advantage of calling it within create_presamples_package is that the indices are already in an array.

Maybe a middle ground is to add this function as a utility, but not include it in the default writing of presample packages?

Would you agree with a collapse_matrix_indices=False argument in create_presamples_package?
Perhaps we can even issue a warning when this argument was False but there were repeated indices in supplied data.

cmutel commented 4 years ago

Can you push a dummy commit to get the CI to rebuild?

cmutel commented 4 years ago

This will be awesome, but I fail to see what to do with this information for the given problem. Are you suggesting we add this flip boolean in presamples now?

No, just being grumpy.

This is expected behaviour: data in earlier arrays should always be overwritten by data in later arrays - this is the basis for data refinement.

Right, sorry.

Would you agree with a collapse_matrix_indices=False argument in create_presamples_package? Perhaps we can even issue a warning when this argument was False but there were repeated indices in supplied data.

+1. Default should be to consolidate, but can be turned off, with a warning.

PascalLesage commented 4 years ago

OK, will implement as indicated above, and include 'substitution'. Also, two tests now not working (but working on my machine) - these are directly related to creating presamples, though, so obviously new code broke something. I will fix myself. Let me know if you want to review when it is done.