geological-survey-of-queensland / borehole-database

Borehole database specification for Geological Survey of Queensland
Creative Commons Attribution 4.0 International

issue dump #1

Open ajtroup opened 5 years ago

ajtroup commented 5 years ago

The way it is described here, I do not think this is the right way to approach this and I am very concerned that there is not enough data captured. Why are we at a stage where this is being defined in the weeks before the solution vendor is being brought on? Surely what we want should have been specified in the tender, otherwise how do we know that the vendor can actually deliver a suitable system. If they’re just proposing to replicate SARIG for Queensland, that’s a backwards step of about 10 years, not a forwards step.

Terminology

• While accurate, I have never seen anyone bother to describe a borehole as ‘a narrow shaft bored in[to] the ground’.
• Water is a liquid
• Petroleum covers both oil and gas
• There are other synonyms missing for borehole (e.g. corehole)

Background

• Actors?
• GSQ’s borehole dataset is the point of truth for QLD government; the others leverage off it

The borehole register

• From what is described overall in this document/page, the borehole register is not replacing MERLIN, it is only replacing a part of the MERLIN boreholes table and not even the minimum amount at that. It describes what amounts to the first two ‘screens’ in MERLIN borehole and has to expand beyond this. Otherwise, the solution is not fit for purpose.
• Linked datasets should be those where data comes in an additional defined format – e.g. las or dlis for wireline logs.
• Core and cuttings intervals are primary attributes of a borehole
• Sample intervals are primary attributes of a borehole that should then be linked to the results from the sampling/analysis

Simplified data model diagram

• Hylogger data is linked data, not result data
• Status is entity data (it’s the activity level of the borehole)
• Producing stats is probably more engineering data, maybe operational data
• Permit that the borehole is drilled on is primary metadata and can then be linked to further data available relating to the tenure. This should be attributed, rather than constantly derived, as you’ll have to take time as well as space into account.
• Reports contain data, they aren’t data themselves
• Cores and cuttings are intervals, not results
• Result includes what (if anything) it discovered
• Are intervals engineering intervals? Or production intervals? Or sample intervals? Lithological intervals? Orientation (azimuth and inclination) intervals? Casing intervals? This is really ambiguous
• From the general description, it seems that all you’re proposing is that the borehole register contains just the entity data – this isn’t enough.

Conceptual data model

• Purpose and sub-purpose would map to type and subtype and these don’t have a start and end date. Otherwise there are issues around confidentiality periods when they are defined by activity. The Status should have start and end dates (e.g. producing or not).
• Azimuth and inclination are likely to be a log in a petroleum or CSG well. Not sure if minerals and coal are running deviation logs, though I have seen some coal deviation logs. Single field not enough for storing this.

Borehole data elements

• Need to have eastings and northings and zone for the location information too
• depth_datum would make more sense as origin_datum – much easier to see the association
• Azimuth and inclination are likely to be an associated log for petroleum and csg wells
• Require rig release date for petroleum wells

Borehole data elements that are inferred

• Bhf_wireline_logs is not a good record of the wireline logs and should not be simply migrated
  o Wireline log information must be scraped from las files where possible and composite log scans where no las files are available
• Bhf_borehole_survey_plan is not up to date
• QDEX Reports number is not currently held in bhf_borehole_survey_plan – it should be in bibliography, but there’s a lot of mess in that space – had to be manual update. Link will need to come from QDEX system
• Results field is a pain and is in desperate need of an update. It’s not a sample or observation though. It’s an interpretation of what was found in a well and is not particularly useful as there is an economic implication in the interpretation

Other borehole data elements

• QWCRN – needed
• Rig release date – needed for petroleum wells
• 306 have it because it’s more a reflection of how many wells have been hylogged
• Total depth logger – needed
• Perforation – this is engineering/production information
• Comments – probably needs a lot of vigorous discussion and QA

dxwell commented 5 years ago

@ajtroup Thanks for the feedback. Please see the comments below and we can discuss further.

The way it is described here, I do not think this is the right way to approach this and I am very concerned that there is not enough data captured. Why are we at a stage where this is being defined in the weeks before the solution vendor is being brought on? Surely what we want should have been specified in the tender, otherwise how do we know that the vendor can actually deliver a suitable system. If they’re just proposing to replicate SARIG for Queensland, that’s a backwards step of about 10 years, not a forwards step.

This concept of borehole data management has been discussed with, developed by, and supported by the GDMP subject matter experts group for the last 18 months.

The tender documentation and the vendor briefings presented the borehole requirements as they are defined here.

The focus of the design is to capture the primary metadata (the stuff that people use to search with) in a structured database. All other data is still captured in a structured way that can be read, updated, harvested and reported on. It just uses modern techniques such as document databases and columnar databases instead of a relational database.

Terminology • While accurate, I have never seen anyone bother to describe a borehole as ‘a narrow shaft bored in[to] the ground’.

See wikipedia and GeoSciML

• Water is a liquid

What is this comment referring to?

• Petroleum covers both oil and gas

What is this comment referring to?

• There are other synonyms missing for borehole (e.g. corehole)

Can you please provide a reference that mentions corehole? I can only see it as a colloquial term, not something that is defined in a standard.

Background • Actors?

A term used in business analysis and software development: "specifies a role played by a user or any other system that interacts with the subject." See https://en.wikipedia.org/wiki/Actor_(UML)

• GSQ’s borehole dataset is the point of truth for QLD government; the others leverage off it

I don't think that is correct. e.g. water bores?
I understand that OGIA source their boreholes from multiple places and then perform QA that never makes it back to MERLIN.

The borehole register • From what is described overall in this document/page, the borehole register is not replacing MERLIN, it is only replacing a part of the MERLIN boreholes table and not even the minimum amount at that. It describes what amounts to the first two ‘screens’ in MERLIN borehole and has to expand beyond this. Otherwise, the solution is not fit for purpose.

See comments elsewhere in this response.

• Linked datasets should be those where data comes in an additional defined format – e.g. las or dlis for wireline logs.

Yes, e.g. a las file is a data resource that can be linked to.

• Core and cuttings intervals are primary attributes of a borehole

These intervals would be in the core and cuttings register as part of the samples database. Need to discuss this given what is captured in the reporting guidelines. This data has not been updated in MERLIN since August 2017.

• Sample intervals are primary attributes of a borehole that should then be linked to the results from the sampling/analysis

See above.

Simplified data model diagram • Hylogger data is linked data, not result data

In the samples ontology, hylogging is the observation activity, and the output of the hylogging is the result.
But yes, it will be recorded as linked data.

• Status is entity data (it’s the activity level of the borehole)

Yes, could be.

• Producing stats is probably more engineering data, maybe operational data

No, I think it is result data because it is the outcome of the well, not the inputs.

• Permit that the borehole is drilled on is primary metadata and can then be linked to further data available relating to the tenure. This should be attributed, rather than constantly derived, as you’ll have to take time as well as space into account.

Yes, agree. The model is intended to convey that, please let me know if it doesn't.

• Reports contain data, they aren’t data themselves

A report is considered a dataset. Where you see "data", it typically means "dataset".

• Cores and cuttings are intervals, not results

These are samples, the enduring result of the one-off drilling event.

• Result includes what (if anything) it discovered

MERLIN has a DRILL_RESULT_CODE but nothing is captured in the new reporting guidelines. Let's discuss further.

• Are intervals engineering intervals? Or production intervals? Or sample intervals? Lithological intervals? Orientation (azimuth and inclination) intervals? Casing intervals? This is really ambiguous

Agree - need to disambiguate.

• From the general description, it seems that all you’re proposing is that the borehole register contains just the entity data – this isn’t enough.

The new software design will feature a relational database for the primary metadata, with the remaining metadata and data being stored as key-value pairs. For an example of this, load this borehole data extract into the online tool http://jsoneditoronline.org/.

This method of software design is modern, effective and efficient to both implement and change.

We should not be losing any data - we can capture anything as a key-value pair, but there is no need to replicate the MERLIN database structure or create a complex database system with many Create-Read-Update-Delete forms.
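A minimal sketch of the split described above: primary metadata in relational columns, everything else kept as key-value pairs serialised to JSON. The table layout, the column names (`borehole_id`, `name`, `purpose`, `attributes`), the attribute keys and the sample rows are all hypothetical and are not taken from the specification.

```python
import json
import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE borehole (
        borehole_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,             -- primary metadata: searchable column
        purpose     TEXT,                      -- primary metadata: searchable column
        attributes  TEXT NOT NULL DEFAULT '{}' -- all other data as JSON key-value pairs
    )
    """
)


def add_borehole(name, purpose, **attributes):
    """Insert a borehole; any extra keyword arguments become key-value pairs."""
    cur = conn.execute(
        "INSERT INTO borehole (name, purpose, attributes) VALUES (?, ?, ?)",
        (name, purpose, json.dumps(attributes)),
    )
    return cur.lastrowid


def find_by_attribute(key, value):
    """Return names of boreholes whose key-value data holds key == value."""
    rows = conn.execute("SELECT name, attributes FROM borehole ORDER BY borehole_id")
    return [name for name, attrs in rows if json.loads(attrs).get(key) == value]


# Made-up sample rows.
add_borehole("Example 1", "petroleum", rig_release_date="2019-03-01", perforated=True)
add_borehole("Example 2", "coal", total_depth_logger_m=412.5)

print(find_by_attribute("perforated", True))
```

In this sketch nothing is lost, but a search on anything held in `attributes` has to scan or separately index the JSON, so which fields count as primary metadata decides what can be searched cheaply.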

Conceptual data model • Purpose and sub-purpose would map to type and subtype and these don’t have a start and end date.

A start and end date is recorded for purpose and sub-purpose to track any purpose changes over the life of a borehole, e.g. when it is converted to a water bore.

Otherwise there are issues around confidentiality periods when they are defined by activity.

What issues are there? Aren't the dates based on rig release date for P&G?

The Status should have start and end dates (e.g. producing or not).

Yes, see status_date in Status table in the conceptual model (this would be implemented as start date and end date).
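One way the start and end dates could be implemented, sketched under the assumption that only the date each status begins is stored and the end date is derived from the next status event. The `StatusEvent` type, the function name and the sample statuses and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StatusEvent:
    status: str   # e.g. "producing", "suspended", "abandoned"
    start: date   # the only date that is stored


def status_intervals(
    events: List[StatusEvent],
) -> List[Tuple[str, date, Optional[date]]]:
    """Return (status, start, end) tuples; end is the next event's start date.

    The most recent status has end=None because it is still current.
    """
    ordered = sorted(events, key=lambda e: e.start)
    intervals = []
    for i, event in enumerate(ordered):
        end = ordered[i + 1].start if i + 1 < len(ordered) else None
        intervals.append((event.status, event.start, end))
    return intervals


# Made-up history, deliberately out of order.
events = [
    StatusEvent("suspended", date(2015, 6, 1)),
    StatusEvent("producing", date(2012, 1, 15)),
    StatusEvent("abandoned", date(2018, 9, 30)),
]

for status, start, end in status_intervals(events):
    print(status, start, end)
```

Deriving the end date avoids storing two dates that can drift out of step, at the cost of needing the full event history to answer "what was the status on a given day".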

• Azimuth and inclination are likely to be a log in a petroleum or CSG well. Not sure if minerals and coal are running deviation logs, though I have seen some coal deviation logs.

Coal and mineral reporting templates capture both azimuth and inclination as:
Azimuth: The angle (in degrees) of clockwise departure from true north to the drillhole direction.
Inclination: The angle (in degrees) of drillhole deviation away from the vertical. 0 degrees inclination is horizontal and -90 degree inclination is vertical (downward).

Single field not enough for storing this.

I would think anything beyond the azimuth and inclination mentioned above would be in the geometry fields? What other data is there that requires something beyond a single field?

Borehole data elements • Need to have eastings and northings and zone for the location information too

Can do - however, Mineral reporting template does not capture zone (it's the only template that captures eastings and northings). Suggest we use the same input controls as were built for GEM.

• depth_datum would make more sense as origin_datum – much easier to see the association

Agree - does not align to standards, but they all vary anyway.

• Azimuth and inclination are likely to be an associated log for petroleum and csg wells

P&G Well Card has WELL_DESIGN only.

• Require rig release date for petroleum wells

Can add in.

Borehole data elements that are inferred

• Bhf_wireline_logs is not a good record of the wireline logs and should not be simply migrated o Wireline log information must be scraped from las files where possible and composite log scans where no las files are available

What is meant by "not a good record"?

• Bhf_borehole_survey_plan is not up to date

Noted. Last update was 16-Nov-16. Is there another up-to-date system of record?

• QDEX Reports number is not currently held in bhf_borehole_survey_plan – it should be in bibliography, but there’s a lot of mess in that space – had to be manual update. Link will need to come from QDEX system

bhf_borehole_survey_plan was a typo. QDEX comment noted. We can do a compare of data.

• Results field is a pain and is in desperate need of an update. It’s not a sample or observation though. It’s an interpretation of what was found in a well and is not particularly useful as there is an economic implication in the interpretation

This information is not being submitted in new reporting templates.

Other borehole data elements

• QWCRN – needed

Will someone search for this metadata?

• Rig release date – needed for petroleum wells

Can add in.

• 306 have it because it’s more a reflection of how many wells have been hylogged

Surprised that is such a low number.

• Total depth logger – needed

This is only being recorded for P&G in new reporting guidelines.

• Perforation – this is engineering/production information

This information is not being submitted in new reporting templates.

• Comments – probably needs a lot of vigorous discussion and QA

Remarks are being captured in the new reporting templates, so can be captured, but would not be considered primary metadata.

ajtroup commented 4 years ago

Forgive my cynicism – every time there's been an attempt to shift BOREHOLE or discuss updating the database it has fallen over because decisions have been made by people who aren’t familiar with using the data and simply think storing it is all the database needs to do. They’ve also all been unfamiliar with where the sticking points are - I note Liz pointed out some of the quality issues in her comments. So far, the broader stakeholder engagement that should have been done with GSQ staff in relation to developing this has not happened, which makes me really nervous given how close it is to attempting the actual build.

I’m all for a modern database, but make sure you’re bringing people along for the ride and effectively engaging with all stakeholders. There's a lot of consternation with respect to other projects, particularly the MERLIN Data Modernisation and EDC solution projects, in that they are both relying on this solution (and to be honest I'm somewhat shocked that the EDC core storage information isn't part of GDMP, given the physical location of the core is simply an attribute of the core), but there is no engagement.

I'm aware there's been what is now almost two years of discussion in this space. Apparently I'm one of the reporting SMEs, but I haven't really been engaged until the past couple of months. I should also be one of the borehole SMEs or 'senior users'. The question still stands: Why is this discussion happening at the eleventh hour?

Borehole definition

Wikipedia is not a suitable reference.
a. Here’s the oilfields reference https://www.glossary.oilfield.slb.com/Terms/b/borehole.aspx
b. Oxford dictionary https://www.lexico.com/en/definition/borehole
c. Macquarie dictionary https://www.macquariedictionary.com.au/features/word/search/?search_word_type=Dictionary&word=borehole

Borehole is a compound word where the simplest definition is a 'hole bored (or drilled) into the ground with the purpose of extracting or evaluating a resource'.

A ‘shaft’ in mining parlance is generally a vertical excavation large enough for a person or vehicle (e.g. a lift) to enter or in some situations to provide ventilation to underground workings (specified as a ventilation shaft). The horizontal component is a ‘drive’.

‘Water is a liquid’ would have referred to the introduction text, which seems to have changed in the last two weeks. I think it originally read ‘… water, liquids…’, with ‘liquids’ referring (I’m assuming) to liquid hydrocarbons.

Corehole is used in mining operations to distinguish between boreholes that are chipped vs boreholes that are cored. I’m not going searching through relinquishment reports to find a specific use of this, just understand that it is a term that is used. Just because something isn’t defined in a standard already doesn’t mean that we shouldn’t be using it or that we cannot introduce it as part of our standard.

Point of truth

Are we taking on water bores? We have them in our dataset for historical reasons. Have you engaged with the water group about what we will be taking on for what is currently in SPIN? Might be an idea to do that if you haven’t, but it’s very late in the game. GSQ also does not have the expertise or (more importantly) time to be managing a new dataset like that. For most of what OGIA might be doing with the borehole information, the fact it has not been brought back into MERLIN is most likely a reflection that there is nowhere to put it in there. Or that OGIA has never co-operated when they’ve been asked about it.

Primary attributes and searching

The broad intervals over which samples have been taken, as well as their sample frequency, are primary (e.g. cuttings collected at 5 m intervals from surface to 1000 m, at 2 m intervals from 1000 m to 1050 m). The core and cuttings register should be capturing the more detailed granularity, including result descriptions for each interval (i.e. core description or cuttings description including interval and lithological description/observation).

I am getting somewhat frustrated with having to continually point out that the reason the information has not been updated in MERLIN is that we stopped doing it because of time and staffing constraints. The date a particular table was last updated has to do with nothing more than this.

I search on wells that have intersected a particular formation. I search for wells that have a particular analysis result. I have had to search for wells with a particular type of wireline log.

Producing stats

I disagree. It’s not results data. It’s post-drilling, but still relates to either the engineering or operation of the well. Ideally it is probably linked data, as there will be a constant/regular update of this information going forward.

Hylogger data

It’s linked data because it’s stored in a separate file type. Same way that wireline log data is linked data not results data. You'd have to check with Suraj how many wells have been hylogged.

DRILL_RESULT_CODE

It’s captured in the ‘shows’ tab of the petroleum templates. The final result is an interpretation of the shows and is reported as part of well completion reports.

New software

Borehole data extract doesn’t parse into the JSON editor – gives the following error:

Error: Parse error on line 1: … Expecting 'EOF', '}', ',', ']', got '{'

There has been no demonstration to date that this hypothetical system is suitable for what is being proposed. I’ve looked, and I can’t find anywhere else that has tried to use this sort of database system to store this kind of data. Can you provide actual examples, not just hypothetical ones? There’s a petroleum guidelines-style set of tables for north-west Queensland that was collated as a test of the templates. Can you demonstrate how that would be used and stored by the proposed solution?
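On the parse error quoted above: a parser that reports an unexpected '{' where it would accept end of input is often looking at several JSON objects written back to back rather than a single document. Whether that is what happened with this extract is an assumption, not something confirmed in this thread. If it is the case, a sketch like the following reads such a file using only the Python standard library (`parse_concatenated` is a hypothetical helper name).

```python
import json


def parse_concatenated(text):
    """Parse zero or more JSON values written one after another in a string."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return values
        # raw_decode returns the parsed value and the index where it stopped.
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)


# Made-up sample: two objects with no enclosing array.
sample = '{"name": "Example 1"}\n{"name": "Example 2"}\n'
print(parse_concatenated(sample))
```

A strict single-document parser such as `json.loads` rejects the same sample, which matches the shape of the reported error.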

Start and end dates for purpose

The purpose of the well doesn’t change over time. The status of the well does. If the purpose and sub-purpose are mapping to type and sub-type, there are additional open-filing issues around this as the timing is tied to both type and rig release date: exploration and appraisal wells are two years six months from rig release date, development wells are five years six months. The rig release date is the trigger for a report to be due and then the determiner for the open filing, but the time periods are determined by the type.

An exploration well doesn’t suddenly become a development well because its status is producing – it’s still an exploration well, the company was just lucky enough to find a resource they can produce in that well. There have already been incidents of companies submitting erroneous types for the wells (whether by mistake or not) that have caused confusion.

‘Converted to a water bore’, ‘producing’, ‘abandoned’, ‘suspended’ etc would all be status_events, and only need to have the start date recorded – the end date becomes the subsequent start date.

Eastings and northings

The minerals templates capture zone and datum.
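Returning to the open-filing timing described under "Start and end dates for purpose": the rule as stated there depends only on the well type and the rig release date, never on status. A minimal sketch of that arithmetic follows. The function names and the type keys are hypothetical, and clamping to the last day of a shorter month is an assumption about how end-of-month dates would be treated.

```python
import calendar
from datetime import date

# Confidentiality periods in months by well type, as quoted in the comment above:
# exploration and appraisal 2 years 6 months, development 5 years 6 months.
CONFIDENTIALITY_MONTHS = {
    "exploration": 30,
    "appraisal": 30,
    "development": 66,
}


def add_months(start: date, months: int) -> date:
    """Add whole months, clamping the day to the length of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def open_file_date(well_type: str, rig_release: date) -> date:
    """Date the well data would become open file under the stated rule."""
    return add_months(rig_release, CONFIDENTIALITY_MONTHS[well_type])


# Made-up rig release date.
print(open_file_date("exploration", date(2018, 1, 15)))
print(open_file_date("development", date(2018, 1, 15)))
```

Because the period is keyed on type, an exploration well that later starts producing keeps its exploration period in this sketch, which is the behaviour the comment argues for.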

Azimuth and inclination

Well design isn’t azimuth and inclination. For petroleum wells, well design is the intentional design of the well - are they aiming to drill vertically or deviated? - while the azimuth and inclination data are collected to determine the final geometry of the well. This will also be depth related.

For petroleum wells (not all, but likely most going forward), the azimuth and inclination of the borehole are collected as part of the wireline log suite. For coal and minerals bores, sometimes they’ll angle the rig at the beginning and drill straight (when doing targeted work). How will this section deal with inclinations with no azimuth? Because that is something that also happens.
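Depth-related azimuth and inclination could be held as a list of survey stations rather than a single pair of fields. The sketch below assumes hypothetical names (`SurveyStation`, `depth_m`) and the coal and mineral template convention quoted earlier in this thread (inclination 0 is horizontal, -90 is vertical downward). The -90 to 90 range check is also an assumption. Azimuth is optional so that a hole reporting an inclination with no azimuth can still be stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SurveyStation:
    depth_m: float                       # measured depth of the station
    inclination_deg: float               # 0 = horizontal, -90 = vertical downward
    azimuth_deg: Optional[float] = None  # clockwise from true north; None if not recorded

    def __post_init__(self):
        if self.depth_m < 0:
            raise ValueError("depth_m must not be negative")
        if not -90.0 <= self.inclination_deg <= 90.0:
            raise ValueError("inclination_deg must be between -90 and 90")
        if self.azimuth_deg is not None and not 0.0 <= self.azimuth_deg < 360.0:
            raise ValueError("azimuth_deg must be in [0, 360)")


# Made-up survey: an angled hole with one station that has no azimuth.
survey = [
    SurveyStation(0.0, -60.0, 135.0),
    SurveyStation(100.0, -58.5, 137.0),
    SurveyStation(200.0, -90.0),
]

missing_azimuth = [s.depth_m for s in survey if s.azimuth_deg is None]
print(missing_azimuth)
```

A rig angled at the collar and drilled straight would be a single station at depth 0 in this layout, while a wireline deviation log would be many stations.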

BHF_WIRELINE_LOGS

“Not a good record” is exactly what it says on the tin. It’s not complete, it’s inaccurate and unreliable. The logs available for categorisation were never kept up to date and are general descriptors for what was done – e.g. it captures GR (Gamma Ray) logs, but not GRDE (Gamma ray collected alongside the density logs). It doesn’t capture contractor-specific mnemonics for logs (as they are captured in the wireline log files). Depth intervals are unreliable.
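To illustrate what scraping the mnemonics from the wireline log files could look like, here is a rough sketch that reads the curve section of a LAS 2.0 file using only the Python standard library. The function name `las_curves` and the sample file content are made up for illustration. A real migration would more likely use an established LAS reader and would need to handle wrapped files, older LAS versions and malformed headers.

```python
from typing import List, Tuple


def las_curves(text: str) -> List[Tuple[str, str, str]]:
    """Return (mnemonic, unit, description) for each line in the ~Curve section."""
    curves = []
    in_curve_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("~"):
            # Section headers start with "~"; the first letter identifies the section.
            in_curve_section = line[1:2].upper() == "C"
            continue
        if not in_curve_section or "." not in line:
            continue
        # Header line layout: MNEM.UNIT  DATA : DESCRIPTION
        mnemonic, rest = line.split(".", 1)
        if ":" in rest:
            before_colon, description = rest.rsplit(":", 1)
        else:
            before_colon, description = rest, ""
        unit = before_colon.split(" ", 1)[0]  # unit runs up to the first space
        curves.append((mnemonic.strip(), unit.strip(), description.strip()))
    return curves


# Made-up LAS content for illustration only.
sample = """~Version Information
 VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
~Curve Information
#MNEM.UNIT   API CODE : DESCRIPTION
 DEPT.M               : Depth
 GR  .GAPI            : Gamma ray
 GRDE.GAPI            : Gamma ray run with density
~ASCII
 100.0 45.2 44.9
"""

print([mnemonic for mnemonic, _, _ in las_curves(sample)])
```

Reading the mnemonics straight from the file keeps contractor-specific names such as GRDE, which a fixed category list would flatten to GR.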

Survey plans

You’ll need to engage with Mark Hartland as a stakeholder on this one. Probably also the MERLIN Modernisation project too.

Results

It should be. It’s an interpretation of the SHOWS table in the template.

QWCRN

Yes, people will search for it – it’s also the major link between our borehole records and SPIN.

Total depth logger

Still needed – can theoretically be taken out of the wireline log.

Perforation

It is being submitted in the new templates. There’s a whole tab for it.