agilescientific / striplog

Lithology and stratigraphic logs for wells or outcrop.
https://code.agilescientific.com/striplog
Apache License 2.0

Load from CSV #128

Open kwinkunks opened 3 years ago

kwinkunks commented 3 years ago

Needs improving. For example:

mtb-za commented 3 years ago

There are a number of cases that should probably be handled:

  1. Only tops are given - bases are inferred to be the next top.
  2. Only bases are given - tops are inferred to be the previous base.
  3. Both bases and tops are given
  4. Either bases or tops are given along with a thickness - the missing value is calculated using the thickness.

Currently we can handle the first and third cases. The second should not be too difficult, and once that is working both variants of the fourth case become essentially analogous. This should probably happen when we build the list of intervals, rather than being something that the from_csv method handles specially. That will let other from_* methods do the same.
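To make the four cases concrete, here is a minimal sketch of how the missing values could be filled in before any intervals are built. It is not striplog code: the function name `complete_intervals` and the `start` and `stop` parameters are hypothetical, and it assumes depths increase downward and rows are already sorted.

```python
def complete_intervals(tops=None, bases=None, thicknesses=None,
                       start=0.0, stop=None):
    """Return a list of (top, base) pairs, inferring whatever is missing.

    Hypothetical helper, not part of striplog. `start` is the assumed top
    of the first interval when only bases are given; `stop` is the assumed
    base of the last interval when only tops are given.
    """
    if tops is not None and bases is not None:
        # Case 3: both given, nothing to infer.
        return list(zip(tops, bases))

    if thicknesses is not None:
        # Case 4: one boundary plus a thickness.
        if tops is not None:
            return [(t, t + h) for t, h in zip(tops, thicknesses)]
        if bases is not None:
            return [(b - h, b) for b, h in zip(bases, thicknesses)]
        raise ValueError("Thickness needs tops or bases to go with it.")

    if tops is not None:
        # Case 1: each base is the next top; the last base is `stop`.
        tops = list(tops)
        return list(zip(tops, tops[1:] + [stop]))

    if bases is not None:
        # Case 2: each top is the previous base; the first top is `start`.
        bases = list(bases)
        return list(zip([start] + bases[:-1], bases))

    raise ValueError("Need at least one of tops or bases.")
```

For example, `complete_intervals(tops=[0, 10, 25], stop=40)` gives `[(0, 10), (10, 25), (25, 40)]`. If `stop` is left as `None`, the last base stays unknown and the caller has to decide what to do with it.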

One major change that I am making is to explicitly require a top, base and/or thickness column to be specified, unless names=True is passed, in which case it should find them automatically. We are still assuming that a component column, or something similar, exists and can be used to define the components for the interval.
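A rough sketch of that column lookup, to show the behaviour I have in mind. The function `find_depth_columns` and its signature are hypothetical, and matching headers case-insensitively against the literal names "top", "base" and "thickness" is an assumption, not something that has been decided.

```python
def find_depth_columns(header, top=None, base=None, thickness=None,
                       names=False):
    """Map the roles 'top', 'base', 'thickness' to column names.

    Hypothetical helper. Explicit arguments always win; with names=True
    any role not given explicitly is looked up in the header.
    """
    explicit = {"top": top, "base": base, "thickness": thickness}

    # Case-insensitive lookup from normalised header name to original name.
    lookup = {h.strip().lower(): h for h in header}

    found = {}
    for role, column in explicit.items():
        if column is not None:
            found[role] = column
        elif names and role in lookup:
            found[role] = lookup[role]

    if not found:
        raise ValueError(
            "Specify a top, base and/or thickness column, or pass names=True."
        )

    missing = [c for c in found.values() if c not in header]
    if missing:
        raise ValueError(f"Columns not in header: {missing}")

    return found
```

With a header of `["Top", "Base", "Component"]` and `names=True`, this returns `{"top": "Top", "base": "Base"}`. With no explicit columns and `names=False` it raises, which is the "explicitly require" part.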

We still need to think about the possible things in an Interval object, to decide what we are going to give to the Interval constructor:

mtb-za commented 3 years ago

https://gist.github.com/mtb-za/3f94ffc426e804e7b2c778c2f0c6f051 has a couple of approaches, one using np.genfromtxt and one using csv.DictReader. Not sure if one appeals to you more than the other.

If we want to get input as something other than strings using either base approach, we need to cast the values. We can start with the most specific type: int, then try float, and finally leave them as str. We might be able to handle other types, but it could be tricky to decide cleanly what those should be. genfromtxt allows for a sequence of dtypes; doing the same with csv.DictReader is probably possible, but more difficult.
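For the csv.DictReader route, the int, then float, then str fallback could look something like this. It is only a sketch: the `cast` helper and the sample column names are made up for illustration.

```python
import csv
import io


def cast(value):
    """Try int, then float; otherwise return the value unchanged."""
    for converter in (int, float):
        try:
            return converter(value)
        except (ValueError, TypeError):
            # TypeError covers None, which DictReader uses for short rows.
            continue
    return value


# Made-up sample data, standing in for a real CSV file.
text = "top,base,component\n0,10.5,sandstone\n10.5,20,shale\n"

rows = [
    {key: cast(value) for key, value in row.items()}
    for row in csv.DictReader(io.StringIO(text))
]
```

Here `rows[0]` comes out as `{'top': 0, 'base': 10.5, 'component': 'sandstone'}`. One thing to note is that casting cell by cell means a single column can end up with mixed types (`top` is an int in the first row and a float in the second), so we may want to settle on one type per column afterwards.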