RADutchie / SA-exploration-topic-modelling


Stratigraphic Modelling #3

Open RichardScottOZ opened 3 years ago

RichardScottOZ commented 3 years ago

As mentioned on LinkedIn:

Can some sort of feature ensemble lead to this, combining NLP, geochemistry, and lithology logs (and associated data), depending on which you have?

RADutchie commented 3 years ago

I think so, but I guess it depends on what you're trying to do. I have been thinking about training an NER model that includes stratigraphic names as entities, so they can be discovered efficiently in text and the related contextual info extracted. This would be a great way to parse large volumes of text to find references to stratigraphic units and their descriptions etc.
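As a rough illustration of the "find unit names, then pull context" idea, here is a minimal sketch that uses a plain gazetteer lookup as a stand-in for a trained NER model. The unit list, the function name `find_strat_mentions`, and the `window` parameter are all assumptions made for the example; nothing here is from the repository.

```python
import re

# Hypothetical gazetteer for illustration; a real one would be loaded
# from a stratigraphic units database.
STRAT_UNITS = ["Wilpena Group", "Gawler Range Volcanics", "Hiltaba Suite"]


def find_strat_mentions(text, units=STRAT_UNITS, window=40):
    """Return (unit, start, end, context) for each gazetteer hit in text."""
    # Longest names first, so a longer name wins over a shorter one
    # that happens to be its prefix.
    ordered = sorted(units, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(u) for u in ordered), re.IGNORECASE)
    canonical = {u.lower(): u for u in units}

    mentions = []
    for match in pattern.finditer(text):
        lo = max(0, match.start() - window)
        hi = min(len(text), match.end() + window)
        unit = canonical[match.group(0).lower()]
        mentions.append((unit, match.start(), match.end(), text[lo:hi]))
    return mentions
```

Character spans like these could also serve as a first pass at annotations for training a proper NER model, though a gazetteer alone will miss misspellings, abbreviations, and informal names.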

If you wanted to try to predict strat from other data, say downhole data, I think we need to first generate a training set of data modalities that describes the scope of the variation within each unit and what is potentially representative. I've been thinking about this from a survey perspective to try to automate the generation of this sort of data set, but I'm not there yet. MinEx CRC has also now got an opportunity fund project looking into this idea.
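One simple way to describe "the scope of the variation within each unit" is a per-unit summary of each measured field. The sketch below assumes samples arrive as dictionaries with a `unit` key plus numeric fields; that record layout and the name `summarise_by_unit` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarise_by_unit(samples, fields):
    """Group samples by their 'unit' label and summarise each numeric field."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["unit"]].append(sample)

    summary = {}
    for unit, rows in grouped.items():
        summary[unit] = {}
        for field in fields:
            values = [row[field] for row in rows]
            summary[unit][field] = {
                "n": len(values),
                "mean": mean(values),
                "std": pstdev(values),  # population standard deviation
                "min": min(values),
                "max": max(values),
            }
    return summary
```

Summary statistics are only a starting point; whether they capture what is "potentially representative" of a unit depends on how the samples were chosen.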

RichardScottOZ commented 3 years ago

Yes, and if you have the first, then a simple 1/0 encoding is possible. But speaking of contextual: can you numerically vectorise how 'unity' a particular area is from text? Then it becomes another ML feature.
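A minimal sketch of both ideas, assuming text has already been grouped by area: a 1/0 presence flag per unit, plus mentions per 1000 words as one crude numeric take on how 'unity' an area is. The function name `unit_features` and the input layout are assumptions for the example, and a mention rate is just one possible score.

```python
import re


def unit_features(docs_by_area, units):
    """Map area -> {unit: (present_flag, mentions_per_1000_words)}."""
    features = {}
    for area, docs in docs_by_area.items():
        text = " ".join(docs).lower()
        n_words = max(1, len(text.split()))  # avoid division by zero
        row = {}
        for unit in units:
            count = len(re.findall(re.escape(unit.lower()), text))
            row[unit] = (1 if count else 0, 1000.0 * count / n_words)
        features[area] = row
    return features
```

Each tuple can be flattened into two columns of a feature table alongside geochemistry or lithology features.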

I don't know if you have government access to things like Elsevier etc., and text APIs for grabbing stuff from relevant journals/abstracts, à la Jose Padarian's soil work?

RichardScottOZ commented 3 years ago

There's a certain amount of open data available via the CORE API (core.ac.uk). I started tinkering with some deposit-style info from there; it turned out a bit ad hoc, but it is still test data.

RichardScottOZ commented 3 years ago

Is there any detail on the MinEx CRC work?

RADutchie commented 3 years ago

Nothing yet on the MinEx CRC, but I am involved in the project so will hopefully learn more soon. It's a Mark Jessell project. I do have access to journals etc. through my Uni Adelaide affiliate status, but I haven't looked into accessing training data that way yet. I did read Jose's paper on mapping borehole descriptions and that certainly got me thinking along these lines.

I think my biggest challenge is creating a suitable annotated training set, i.e. finding the time. One plus is that I currently have a project going to compile the 'best' available descriptions and info on SA's strat units to produce a digital explanatory notes system. I think this could be used as a starting point.

RichardScottOZ commented 3 years ago

Yeah, when I asked UniSA similarly, it was 'jump through bunches of hoops, maybe get approval'. As opposed to the DEM people, who know the State Library admins etc.

RichardScottOZ commented 3 years ago

Some base strat unit definitions would be useful, certainly - having pulled the countrywide stuff into a database last year.

RichardScottOZ commented 3 years ago

The old EarthArXiv had an API, too: https://eartharxiv.org/repository/list/

Haven't looked into the new version.

RichardScottOZ commented 3 years ago

This I have only ever done a simple test of:-

https://geodeepdive.org/

A base SA stratigraphy test set run might be an interesting use though.

RichardScottOZ commented 3 years ago

Time was an important element for the palaeo version, too:-

http://deepdive.stanford.edu/paleo

RichardScottOZ commented 3 years ago

https://github.com/UW-Macrostrat/Global-Geologic-Units