agilescientific / striplog

Lithology and stratigraphic logs for wells or outcrop.
https://code.agilescientific.com/striplog
Apache License 2.0

`Component.from_text` not capturing all parts of text #139

Open Zabamund opened 3 years ago

Zabamund commented 3 years ago

This method on Component seems to work in some cases but not in others. Here is an example:

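from striplog import Component, Lexicon

# Assumption: `lexicon` is striplog's default lexicon; the original snippet did not define it.
lexicon = Lexicon.default()

sample0 = Component.from_text('Grey fine sandstone.', lexicon)
sample1 = Component.from_text('Light blue marl with interbedded shale with good shows', lexicon)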

sample0 yields: image

while sample1 yields: image

kwinkunks commented 3 years ago

It just comes down to the lexicon. The text is parsed in a very naive way, and it's up to the user to compile an appropriate lexicon for their task.

That said, I think the default splitter 'with' should prevent components from getting mixed like this, so that is a bug.
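To illustrate what "mixed" can mean here, the following is a minimal pure-Python sketch, not striplog's actual implementation. The names `TOY_LEXICON`, `SPLITTERS`, `parse_naive` and `parse_split` are hypothetical, and the toy lexicon deliberately omits 'marl'. It shows how matching against the whole description can pair a colour from one part with a lithology from another, and how splitting on 'with' first keeps them apart.

```python
import re

# Hypothetical toy lexicon (category -> regex patterns). This is NOT
# striplog's default lexicon; 'marl' is left out on purpose.
TOY_LEXICON = {
    "lithology": [r"sandstone", r"shale"],
    "colour": [r"grey", r"blue"],
    "grainsize": [r"fine", r"coarse"],
}
SPLITTERS = [r"\bwith\b"]  # assumed splitter word, as mentioned in the thread


def parse_naive(text, lexicon):
    """Return the earliest whole-word match per category in the text."""
    found = {}
    for category, patterns in lexicon.items():
        matches = []
        for pattern in patterns:
            regex = r"\b" + pattern + r"\b"
            for m in re.finditer(regex, text, flags=re.IGNORECASE):
                matches.append((m.start(), m.group().lower()))
        if matches:
            found[category] = min(matches)[1]
    return found


def parse_split(text, lexicon, splitters):
    """Split on the splitter words first, then parse each part separately."""
    parts = re.split("|".join(splitters), text, flags=re.IGNORECASE)
    results = [parse_naive(part.strip(), lexicon) for part in parts]
    return [r for r in results if r]  # drop parts with no matches


text = "Light blue marl with interbedded shale with good shows"

# Whole-text matching pairs 'blue' with 'shale', although 'blue'
# describes the (unrecognised) marl.
print(parse_naive(text, TOY_LEXICON))
# {'lithology': 'shale', 'colour': 'blue'}

# Splitting first keeps the colour and the lithology in separate parts.
print(parse_split(text, TOY_LEXICON, SPLITTERS))
# [{'colour': 'blue'}, {'lithology': 'shale'}]
```

The first part still has no lithology because 'marl' is missing from the toy lexicon, which is a separate problem from the splitting itself.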

The other thing here is that 'marl' is not in the default lexicon, although 'mrl' is (as an abbreviation). If we compile a more comprehensive list for the 'lithology' part of the default lexicon, adding 'marl' is trivial. So that could be an enhancement.
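As a rough sketch of why the missing entry matters, here is a pure-Python stand-in that does not use striplog's API. The lists `lithologies` and `abbreviations` and both helper functions are hypothetical. A full word that is absent from the lithology list is not recognised even when its abbreviation is known, and appending the word is enough to fix that.

```python
import re

# Hypothetical stand-ins for two parts of a lexicon; not striplog's data.
lithologies = ["sandstone", "shale", "limestone"]
abbreviations = {"mrl": "marl", "sst": "sandstone", "sh": "shale"}


def expand_abbreviations(text, abbreviations):
    """Replace whole-word abbreviations with their full forms."""
    pattern = r"\b(" + "|".join(map(re.escape, abbreviations)) + r")\b"

    def repl(match):
        return abbreviations[match.group().lower()]

    return re.sub(pattern, repl, text, flags=re.IGNORECASE)


def find_lithology(text, lithologies):
    """Return the listed lithology that appears earliest in the text, or None."""
    hits = []
    for name in lithologies:
        m = re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE)
        if m:
            hits.append((m.start(), name))
    return min(hits)[1] if hits else None


# The abbreviation is known, so it expands to the full word.
print(expand_abbreviations("Light blue mrl", abbreviations))  # Light blue marl

# The full word is not in the lithology list, so nothing is found.
print(find_lithology("Light blue marl", lithologies))  # None

# Adding the word to the list is a one-line change.
lithologies.append("marl")
print(find_lithology("Light blue marl", lithologies))  # marl
```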