agilescientific / welly

Welly helps with well loading, wireline logs, log quality, data science
https://code.agilescientific.com/welly
Apache License 2.0
311 stars 132 forks source link

`Header` is messy #184

Open kwinkunks opened 2 years ago

kwinkunks commented 2 years ago

The new Pandas approach just build a dataframe from a LAS file -- unless you build the well yourself, in which case it's a dictionary object. The Header class is still there in header.py but I don't think it's used for anything. So it's polymorphic, and confusing.

Key philosophy: welly is not a representation of LAS files. That's what lasio is for.

Proposal: welly should regain its Header object, even if it simply stores everything in a DataFrame. The header object should have good LAS -> Header and Header -> LAS actions, but also be constructable and serializable by other means.

kwinkunks commented 2 years ago

I had another go at fixing this yesterday, and wanted to capture some notes. Dealing with the current well header tricky for a few reasons:

One thing I tried, which I thought would help, was using section and mnemonic as a MultiIndex in the header DataFrame (reasoning: the integer index is totally arbitray and not useful to users), but I couldn't quite get the various bits of machinery working with it. This would at least make the view of the header more useful.

I'd be interested to know what you think about this, @patrick-reinhard. I'm torn on whether the header needs to be useful in v0.5.0, but I think it is something I need to fix soon, because it's going to have consequences for plugging in dlisio.

patrick-reinhard commented 2 years ago

Hi Matt,

Thanks for putting some thought into this. Sharing my thoughts:

  • There's no Header object to take care of the header interface, so you have to know how the header is structured.

I agree there should be interfacing possible for the header attribute of the well. You can now utils.get_header_item() and well.add_header_item(). These should be put into one place. I would propose placing it in header.py and adding wrapper methods on Well.

  • The header is a DataFrame with no useful index, so you have to make complex queries (eg to find records matching a given section and mnemonic)

Agreed, to me it makes sense to set mnemonic and/or section as the index.

  • The header is basically a LAS file header, breaking the model of welly as a format-free representation. (lasio is a LAS file representation, but welly is supposed to be format-independent.)

Header is currently a tabular frame, as I think it should be represented, because that's how you would find it in well reports or well data. A frame is a generic format but I agree it is indeed somewhat tied to LAS format because we copy 1:1 the names of the lasio output.

['original_mnemonic', 'mnemonic', 'unit', 'value', 'descr', 'section']

Those column names to make make sense, and we could parse any other header data (from dlisio) to that same format.

  • The header code is spread out across well.py, utils.py and (maybe? not sure) header.py. The code in utils.py is accessed by curves and locations (I think), that's why it's in there (I think).
  1. All header related code is indeed in well.py, the utils.get_header_item() is in that place because utils.lasio_get() was there before. It makes sense to move it.
  2. header.py is currently not used, imported or called by any of the code. The class, from my view, can be removed. The header related methods could be placed here.

One thing I tried, which I thought would help, was using section and mnemonic as a MultiIndex in the header DataFrame (reasoning: the integer index is totally arbitray and not useful to users), but I couldn't quite get the various bits of machinery working with it. This would at least make the view of the header more useful.

Agreed. If you want to push your branch I can have a look at that multindexing. I think this could be a change we can put in 0.5.1 if we get it right.

I'd be interested to know what you think about this, @patrick-reinhard. I'm torn on whether the header needs to be useful in v0.5.0, but I think it is something I need to fix soon, because it's going to have consequences for plugging in dlisio.

One last thought what I think is missing from the design of Header, it just being a dictionary, is the import/export reproducibility. The Header constructor discards the unit, description and section upon loading, if you want to import a well, do things to it and export, your header will have lost useful information.