geological-survey-of-queensland / industry-report-profile

A profile (domain conceptual model + implementation resources) for Industry Reports received by GSQ
Creative Commons Attribution 4.0 International

QDEX Mapping - preliminary review #2

Open KellyVance opened 5 years ago

KellyVance commented 5 years ago
| QDEX | New | Review - VK | Review - AI |
| --- | --- | --- | --- |
| Report Title | Title | | Makes sense |
| Report Type | Resources Industry Report Type | yes | So we're running with an entirely separate vocab for non-resource industry reports? I'm perfectly okay with keeping "Report Type" |
| Author Name | dct:creator | yes | yes |
| Lodger | Not required | yes | yes |
| Submitter | System recorded from logged-in user | yes | yes |
| Locality | dct:Location | yes | yes |
| Map References | Not required - can be derived using spatial intersect | yes | yes |
| Commodity | Not required | Infer from report content, but is essential info | Ideally derived from content, but heavily dependent on the capability of the final product. I'd be fine with it being a manually entered field. |
| Keywords | Not used | yes... but there are use cases where keywords are valuable, so this is dependent on what level of detail within the data can be interrogated during search | I can see a lot of uses for this where there aren't implicit or inherited connections. This cannot be a manually populated field, and the majority of the important information should be collected explicitly in other fields, but, maybe as a result of machine learning we populate a list of keywords? Is there something fancy that replaces this functionality that I'm not remembering/aware of? |
| Tenure | Queensland Mining Permit | yes | No. There's official language regarding this that we should toe the line on. |
| Tectonic | Not required - can be derived using spatial intersect | yes | yes |
| Stratigraphy | Not required - can be derived using spatial intersect | No. Not necessarily a spatial intersect. But should be captured in report data templates | Not necessarily spatial. At time of report submission they may in fact be defining new strat, or redefining old strat. Historical reports refer to strat which has in fact been 'moved' since writing. |
| Age | Not required - can be derived using spatial intersect | No, but can be linked to via stratigraphy | Inherited from strat/basin |
| Date of Report | time:ProperInterval | yes | yes |
| Date of Receipt | System recorded dct:created | yes | yes |
| Project Names | Not recorded | Probably useful data. Should be derivable from report content | Highly relevant for minerals |
| Mines/Prospect Names | Feature of Interest | yes? maybe? | I guess? We could probably roll up Project Names and Well Names into this tag, but it feels a bit grab-bag-y |
| Well Names | Not recorded | link to GSQ Borehole Profile via spatial intersect | Very relevant to non-industry reports, some industry reports |
| Seismic Survey Names | GSQ Survey Profile | yes | yes |

GSQ-AI commented 5 years ago
| Report Title | Title | Review |
| --- | --- | --- |
| Report Title | Title | Makes sense |
| Report Type | Resources Industry Report Type | So we're running with an entirely separate vocab for non-resource industry reports? I'm perfectly okay with keeping "Report Type" |
| Author Name | dct:creator | yes |
| Lodger | Not required | yes |
| Submitter | System recorded from logged-in user | yes |
| Locality | dct:Location | yes |
| Map References | Not required - can be derived using spatial intersect | yes |
| Commodity | Not required | Ideally derived from content, but heavily dependent on the capability of the final product. I'd be fine with it being a manually entered field. |
| Keywords | Not used | I can see a lot of uses for this where there aren't implicit or inherited connections. This cannot be a manually populated field, and the majority of the important information should be collected explicitly in other fields, but, maybe as a result of machine learning we populate a list of keywords? Is there something fancy that replaces this functionality that I'm not remembering/aware of? |
| Tenure | Queensland Mining Permit | No. There's official language regarding this that we should toe the line on. |
| Tectonic | Not required - can be derived using spatial intersect | yes |
| Stratigraphy | Not required - can be derived using spatial intersect | Not necessarily spatial. At time of report submission they may in fact be defining new strat, or redefining old strat. Historical reports refer to strat which has in fact been 'moved' since writing. |
| Age | Not required - can be derived using spatial intersect | Inherited from strat/basin |
| Date of Report | time:ProperInterval | yes |
| Date of Receipt | System recorded dct:created | yes |
| Project Names | Not recorded | Highly relevant for minerals |
| Mines/Prospect Names | Feature of Interest | I guess? We could probably roll up Project Names and Well Names into this tag, but it feels a bit grab-bag-y |
| Well Names | Not recorded | Very relevant to non-industry reports, some industry reports |
| Seismic Survey Names | GSQ Survey Profile | yes |

DavidCrosswellGSQ commented 5 years ago

Noted - will review and reply

dxwell commented 5 years ago

@KellyVance @GSQ-AI @johnkirsten - please review my comments below.

Commodity - Ok to put back into model.

Keywords - We can create a controlled list of keywords, e.g. by starting with the keywords currently populated in QDEX Reports. When I had a quick look at the database records yesterday, there appeared to be a lot of keyword stuffing and some of the keywords were duplicative of the report type. Also, if we capture 'Earth Science Data Categories' on the submission form, this may obviate the need for some keywords. Let's do some better analysis of the QDEX Reports keywords and look at their usefulness. Let's decide on what is suitable as a keyword and what should be core metadata for the report.
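
As a starting point for that analysis, here is a minimal sketch of how keyword frequency and overlap with report type could be measured. It assumes the QDEX Reports records can be exported as (report type, keyword list) pairs; the function name `analyse_keywords`, the record shape and the sample rows are all hypothetical, not taken from the real database.

```python
from collections import Counter


def analyse_keywords(records, report_types):
    """Count normalised keywords and flag those that merely repeat a report type.

    records: iterable of (report_type, [keywords]) pairs (hypothetical export shape).
    report_types: iterable of known report type labels.
    """
    known_types = {t.strip().lower() for t in report_types}
    counts = Counter()
    duplicates_of_type = Counter()
    for _report_type, keywords in records:
        for kw in keywords:
            # Lower-case and collapse whitespace so trivial variants are merged.
            norm = " ".join(kw.lower().split())
            if not norm:
                continue
            counts[norm] += 1
            if norm in known_types:
                duplicates_of_type[norm] += 1
    return counts, duplicates_of_type


# Made-up sample rows, for illustration only.
sample = [
    ("Well Completion Report", ["well completion report", "Core  Log", "core log"]),
    ("Annual Report", ["Gold", "annual report"]),
]
counts, dupes = analyse_keywords(sample, ["Well Completion Report", "Annual Report"])
print(counts.most_common(5))
print(dupes)
```

Keywords that score highly in `dupes` are candidates for removal, since the report type already carries that information. High-frequency keywords in `counts` are candidates for a controlled list or for promotion to core metadata.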

Tenure - @GSQ-AI can you please provide the document behind "There's official language regarding this that we should toe the line on"? When I spoke with Jodie Hendey, she said that they tried to bring in "Resource Authority" but it never stuck. She said to use "permit". Years ago, they dropped "tenure" as it implied ownership of the land (at the time of contention regarding CSG and farm land access). To my knowledge, there is no collective term in the legislation; it refers to the specific permits, leases, authorities, etc.

Stratigraphy - so, will this be a vocabulary? or is there already a trusted source for this data (e.g. in DMEGeo?). What is the cardinality between report and stratigraphy? Does there need to be functionality for the user to create new stratigraphy? Or is this controlled?

Project Names - to my understanding, there is no controlled list of project names (the Coal Hub manages their own list of project names). So, will this be free text? How do we get data integrity? Ideally, we would have linkage between projects and their permits.

Mines/Prospect Names - is this then two separate metadata fields? Are these controlled lists? I would have thought so for mines, but new prospect names could be created?

Well Names - is this for P&G only? Will these well names already be in our borehole register? If not, the user would free-text the entry. But I would think we would want to have integrity of the well - entering in the core metadata in our borehole data model. For P&G reports, the report itself will contain the well names.

dxwell commented 5 years ago

@GSQ-AI @KellyVance We need to decide on the geometry that we are going to capture for industry reports. The current form captures both locality and map sheet. When I had a brief look at the data yesterday, the locality was very broad (even "Queensland" was listed).

For the reports now in QDEX Reports that we will migrate to the new system, how about:

  1. If the report is a permit-based report, we create the geometry based on the permit shape at the date of lodgement of the permit.
  2. If the report is not a permit-based report, then we:
    a. If a shape file has been submitted with the report, then we use that. @johnkirsten can you please check to see if any have been submitted.
    b. Else, we create a shape based on the map sheet(s). @johnkirsten can you please check if there is a 1:1 or 1:* report to map sheet captured.
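
The fallback order above could be sketched roughly as follows. This only decides which source the geometry would come from; it does not build any shapes. The `Report` fields, the function name and the example values are hypothetical, and the last branch (no source at all) is an assumption added for completeness, since the proposal does not say what happens in that case.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Report:
    # All field names are illustrative; they are not the QDEX schema.
    permit_id: Optional[str] = None        # set for permit-based reports
    lodgement_date: Optional[str] = None   # date used to pick the permit shape
    submitted_shape: Optional[str] = None  # path to a shape file lodged with the report
    map_sheets: list = field(default_factory=list)


def choose_geometry_source(report):
    """Return a tuple naming the geometry source, following steps 1, 2a, 2b."""
    if report.permit_id is not None:
        # Step 1: permit shape as at the relevant date.
        return ("permit", report.permit_id, report.lodgement_date)
    if report.submitted_shape is not None:
        # Step 2a: use the shape file submitted with the report.
        return ("shapefile", report.submitted_shape)
    if report.map_sheets:
        # Step 2b: derive a shape from one or more map sheets.
        return ("map_sheets", tuple(report.map_sheets))
    # Not covered by the proposal: nothing to derive a geometry from.
    return ("none",)


print(choose_geometry_source(Report(permit_id="EPM 1234", lodgement_date="2001-05-01")))
print(choose_geometry_source(Report(map_sheets=["SHEET A", "SHEET B"])))
```

Returning a tuple of map sheets keeps the 1:* case open until the report-to-map-sheet cardinality has been checked.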

For new report lodgement, @KellyVance did you already have a plan to capture coverage of the report?

DavidCrosswellGSQ commented 5 years ago

@ajtroup Can you please review this issue - particularly looking at comments:

https://github.com/geological-survey-of-queensland/industry-report-profile/issues/2#issuecomment-529684523

and

https://github.com/geological-survey-of-queensland/industry-report-profile/issues/2#issuecomment-529691616

DavidCrosswellGSQ commented 5 years ago

@ajtroup This is in reference to the QDEX mapping table for the Industry Report Profile - see https://github.com/geological-survey-of-queensland/industry-report-profile

ajtroup commented 5 years ago

Here's some more thoughts to add to the headache...

| QDEX | New | Review AT |
| --- | --- | --- |
| Report Title | Title | Yes |
| Report Type | Resources Industry Report Type | Report type is fine, not all the reports are resources industry reports. |
| Author Name | dct:creator | See discussion in email regarding whether this should be author or company |
| Lodger | Not required | Should be recorded, but doesn't need to be searchable or displayed. Has proven useful where there have been issues noted in the past |
| Submitter | System recorded from logged-in user | So Lodger and Submitter are being merged? Currently submitter is the Company, where Lodger is the person physically (digitally) submitting the report |
| Locality | dct:Location | yes |
| Map References | Not required - can be derived using spatial intersect | Could be a useful QA for the intersect process |
| Commodity | Not required | Can be inferred from tenure, but not completely, and depends on the granularity. Inferred from content would be great, but difficult for scanned reports. |
| Keywords | Not used | Keywords are definitely used when searching |
| Tenure | Queensland Mining Permit | Isn't Queensland Mining Permit only one type of tenure? Or have they tried to make it one type of tenure? Will need translations between EPP, A-P, exploration permit, mining permit, mining tenure, mining lease, EPM, MDL, et al. There is, or should already be, a vocab we can use adapted from the current classification system. |
| Tectonic | Not required - can be derived using spatial intersect | Can be implied from spatial intersect, but spatial intersect doesn’t deal with depth relationship |
| Stratigraphy | Not required - can be derived using spatial intersect | Stratigraphy can't be derived from spatial intersect, but could be gathered from other parts of a report going forward. |
| Age | Not required - can be derived using spatial intersect | Can be derived from stratigraphy and should be associated with stratigraphy - keep in mind this could be a very wide range |
| Date of Report | time:ProperInterval | yes |
| Date of Receipt | System recorded dct:created | yes |
| Project Names | Not recorded | Well names important, not sure about seismic name, should be using it for minerals and coal and potentially for some of the CSG fields where naming conventions have shifted over time (e.g. Fairview to FV) |
| Mines/Prospect Names | Feature of Interest | |
| Well Names | Not recorded | This is the major link point to the well at the moment (it is used for the QDEX Data to QDEX Reports link, to the best of my knowledge). Currently useful as it is as close to searching by UWI as you can get in QDEX Reports, as the title is a string search and suffers from inconsistent naming. |
| Seismic Survey Names | GSQ Survey Profile | What is the survey profile? |

ajtroup commented 5 years ago

Keywords - The keywords are an opportunity to tag the report with broad content categories that are more granular than the report type. For example, I don’t want to search for all well completion reports, I want to find the ones with core logs, I would use the keywords. Challenge is sorting out how to apply relevant keywords without having to read through > 100,000 reports…

Tenure - The reason that terminology didn’t stick is that the project didn’t go ahead (for a variety of reasons). P&G companies definitely still use tenure. Personally not sure if minerals and coal do, but I wouldn’t be surprised. Permit is ok, but you’ll need to be able to translate it for use. Also see ATP, EPP, A-P, ML, PL, MDL, EPC, EPM, … I don’t like the use of ‘mining permit’, as it is restrictive to certain sections of industry as well as really only referring to one (maybe two) types of tenure - the ML and the MDL.

Stratigraphy - "so, will this be a vocabulary? or is there already a trusted source for this data (e.g. in DMEGeo?). What is the cardinality between report and stratigraphy?" @dxwell - what do you mean by this? A report will contain the section of stratigraphy that it has intersected. Stratigraphic units should be as per the ASUD (which I think was being used for the vocab?), but there should probably be some way for a company to propose new units, or report on subdivisions of a unit (for petroleum wells, a particular section of reservoir within a formation; for coal, I’m not sure if the coal seams are listed as official stratigraphic units). So, should be controlled, but with a function for a user to propose new stratigraphy.

Project Names - No reason why we can’t adopt the coal hub projects. Can see a use for this in grouping coal seam gas projects where well names have changed over time (e.g. Fairview to FV).

Well Names - "is this for P&G only?" Nope. Also strat, with potential for other commodities against sampling reports. They should already be in the borehole register. And yes, the report will contain the well name, but this is the report metadata and the well should be attributed to the report.
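
On the tenure translation point, a rough sketch of what a lookup could look like if 'Permit' is adopted as the collective term. The code-to-label table is an assumption to be checked against the official vocabulary. It is deliberately incomplete: EPP and A-P are left out rather than guessed at. The function name, return shape and example permit numbers are hypothetical.

```python
import re

# Assumed expansions of common Queensland permit codes; verify against the
# official classification before use. Incomplete by design.
PERMIT_TYPES = {
    "ATP": "Authority to Prospect",
    "PL": "Petroleum Lease",
    "ML": "Mining Lease",
    "MDL": "Mineral Development Licence",
    "EPC": "Exploration Permit for Coal",
    "EPM": "Exploration Permit for Minerals",
}

# Letters, optional space or hyphen, then the permit number.
_PERMIT_RE = re.compile(r"\s*([A-Za-z]+)\s*-?\s*(\d+)\s*$")


def parse_permit(text):
    """Split free text such as 'epm 12345' into type code and number.

    Returns None when the text does not match or the code is not in the table.
    """
    match = _PERMIT_RE.match(text)
    if match is None:
        return None
    code = match.group(1).upper()
    if code not in PERMIT_TYPES:
        return None
    return {
        "collective_term": "Permit",
        "type_code": code,
        "type_label": PERMIT_TYPES[code],
        "number": int(match.group(2)),
    }


print(parse_permit("epm 12345"))
print(parse_permit("A-P 100"))  # hyphenated codes are not handled by this sketch
```

Keeping the specific code alongside the collective term means a search on 'Permit' can still be narrowed to the terms industry actually uses.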

KellyVance commented 5 years ago

I was getting a bit lost with the thread. I've tabularised the comments so far and added a few thoughts @DavidCrosswellGSQ

| QDEX | New | Review - VK | Review - AI | Reply - DC | Review AT | Reply - VK |
| --- | --- | --- | --- | --- | --- | --- |
| Report Title | Title | | Makes sense | | Yes | YES |
| Report Type | Resources Industry Report Type | yes | So we're running with an entirely separate vocab for non-resource industry reports? I'm perfectly okay with keeping "Report Type" | | Report type is fine, not all the reports are resources industry reports. | YES - but change back to Report Type |
| Author Name | dct:creator | yes | yes | | See discussion in email regarding whether this should be author or company | Opinion - Who wrote or compiled Report |
| Lodger | Not required | yes | yes | | Should be recorded, but doesn't need to be searchable or displayed. Has proven useful where there have been issues noted in the past | Opinion - Who (person) lodged the report |
| Submitter | System recorded from logged-in user | yes | yes | | So Lodger and Submitter are being merged? Currently submitter is the Company, where Lodger is the person physically (digitally) submitting the report | Opinion - Company who submits report |
| Locality | dct:Location | yes | yes | We need to decide on the geometry that we are going to capture for industry reports. The current form captures both locality and map sheet. When I had a brief look at the data yesterday, the locality was very broad (even "Queensland" was listed). For the reports now in QDEX Reports that we will migrate to the new system, how about: 1. If the report is a permit-based report, we create the geometry based on the permit shape at the date of lodgement of the permit. 2. If the report is not a permit-based report, then we: a. If a shape file has been submitted with the report, then we use that. @johnkirsten can you please check to see if any have been submitted. b. Else, we create a shape based on the map sheet(s). @johnkirsten can you please check if there is a 1:1 or 1:* report to map sheet captured. For new report lodgement, @KellyVance did you already have a plan to capture coverage of the report? | yes | Reports by industry are always going to be associated with a specific activity or a group of activities on a permit. As far as their location represented at surface, i.e.: wells - point; 2D seismic - set of lines; 3D seismic - polygon; tenure-based report - polygon of specific permit at time of report. For non-industry reports we should have some kind of polygon or maximum extent of the activity or study being done. For GA/CSIRO/Academic reports they may cross state boundaries. If we have nothing we should do our best to assign either Queensland or Australia... but this should be a final resort. I'm not sure that anything other than the actual mapsheet itself should reference the mapsheet extent as its primary spatial representation; it should really just be an intersect. |
| Map References | Not required - can be derived using spatial intersect | yes | yes | | Could be a useful QA for the intersect process | See above. Just use an intersect. |
| Commodity | Not required | Infer from report content, but is essential info | Ideally derived from content, but heavily dependent on the capability of the final product. I'd be fine with it being a manually entered field. | Ok to put back into model. | Can be inferred from tenure, but not completely, and depends on the granularity. Inferred from content would be great, but difficult for scanned reports. | Reintroduce |
| Keywords | Not used | yes... but there are use cases where keywords are valuable, so this is dependent on what level of detail within the data can be interrogated during search | I can see a lot of uses for this where there aren't implicit or inherited connections. This cannot be a manually populated field, and the majority of the important information should be collected explicitly in other fields, but, maybe as a result of machine learning we populate a list of keywords? Is there something fancy that replaces this functionality that I'm not remembering/aware of? | We can create a controlled list of keywords, e.g. by starting with the keywords currently populated in QDEX Reports. When I had a quick look at the database records yesterday, there appeared to be a lot of keyword stuffing and some of the keywords were duplicative of the report type. Also, if we capture 'Earth Science Data Categories' on the submission form, this may obviate the need for some keywords. Let's do some better analysis of the QDEX Reports keywords and look at their usefulness. Let's decide on what is suitable as a keyword and what should be core metadata for the report. | Keywords are definitely used when searching. The keywords are an opportunity to tag the report with broad content categories that are more granular than the report type. For example, I don’t want to search for all well completion reports, I want to find the ones with core logs, I would use the keywords. Challenge is sorting out how to apply relevant keywords without having to read through > 100,000 reports… | Requires more discussion. Keywords may describe occurrences of material NOT captured as the commodity, so there is some use. But searching for reports with core (as an example) should directly look up a register of core rather than via a keyword. And within that register there should be a flag for whether that core has been geologically logged. |
| Tenure | Queensland Mining Permit | yes | No. There's official language regarding this that we should toe the line on. | Can you please provide the document behind "There's official language regarding this that we should toe the line on"? When I spoke with Jodie Hendey, she said that they tried to bring in "Resource Authority" but it never stuck. She said to use "permit". Years ago, they dropped "tenure" as it implied ownership of the land (at the time of contention regarding CSG and farm land access). To my knowledge, there is no collective term in the legislation; it refers to the specific permits, leases, authorities, etc. | Isn't Queensland Mining Permit only one type of tenure? Or have they tried to make it one type of tenure? Will need translations between EPP, A-P, exploration permit, mining permit, mining tenure, mining lease, EPM, MDL, et al. There is, or should already be, a vocab we can use adapted from the current classification system. The reason that terminology didn’t stick is that the project didn’t go ahead (for a variety of reasons). P&G companies definitely still use tenure. Personally not sure if minerals and coal do, but I wouldn’t be surprised. Permit is ok, but you’ll need to be able to translate it for use. Also see ATP, EPP, A-P, ML, PL, MDL, EPC, EPM, … I don’t like the use of ‘mining permit’, as it is restrictive to certain sections of industry as well as really only referring to one (maybe two) types of tenure - the ML and the MDL. | Opinion - change to just 'Permit' |
| Tectonic | Not required - can be derived using spatial intersect | yes | yes | | Can be implied from spatial intersect, but spatial intersect doesn’t deal with depth relationship | Spatial x (Age or Strat) should provide Tectonic in most cases |
| Stratigraphy | Not required - can be derived using spatial intersect | No. Not necessarily a spatial intersect. But should be captured in report data templates | Not necessarily spatial. At time of report submission they may in fact be defining new strat, or redefining old strat. Historical reports refer to strat which has in fact been 'moved' since writing. | So, will this be a vocabulary? Or is there already a trusted source for this data (e.g. in DMEGeo?). What is the cardinality between report and stratigraphy? Does there need to be functionality for the user to create new stratigraphy? Or is this controlled? | Stratigraphy can't be derived from spatial intersect, but could be gathered from other parts of a report going forward. A report will contain the section of stratigraphy that it has intersected. Stratigraphic units should be as per the ASUD (which I think was being used for the vocab?), but there should probably be some way for a company to propose new units, or report on subdivisions of a unit (for petroleum wells, a particular section of reservoir within a formation; for coal, I’m not sure if the coal seams are listed as official stratigraphic units). So, should be controlled, but with a function for a user to propose new stratigraphy. | Stratigraphy is a one-to-many. One report may have as many stratigraphic units as necessary. I would keep stratigraphy as only ASUD formal units. Another layer down may include reservoir/target/marker units that may be informal or locality/project specific names. |
| Age | Not required - can be derived using spatial intersect | No, but can be linked to via stratigraphy | Inherited from strat/basin | | Can be derived from stratigraphy and should be associated with stratigraphy - keep in mind this could be a very wide range | Derivable from strat, geochron etc. |
| Date of Report | time:ProperInterval | yes | yes | | yes | YES |
| Date of Receipt | System recorded dct:created | yes | yes | | yes | YES |
| Project Names | Not recorded | Probably useful data. Should be derivable from report content | Highly relevant for minerals | To my understanding, there is no controlled list of project names (the Coal Hub manages their own list of project names). So, will this be free text? How do we get data integrity? Ideally, we would have linkage between projects and their permits. | Well names important, not sure about seismic name, should be using it for minerals and coal and potentially for some of the CSG fields where naming conventions have shifted over time (e.g. Fairview to FV). No reason why we can’t adopt the coal hub projects. Can see a use for this in grouping coal seam gas projects where well names have changed over time (e.g. Fairview to FV). | Not sure how minerals works but my impression is that it is important information. Coal Hub appears to have a list, can we help them vocabularise this list? Petroleum I would derive from something like field/prospect/lead name as that is typically close to the intent here, e.g. Fairview, Arcadia, Spring Gully, Combabula, etc. |
| Mines/Prospect Names | Feature of Interest | yes? maybe? | I guess? We could probably roll up Project Names and Well Names into this tag, but it feels a bit grab-bag-y | Is this then two separate metadata fields? Are these controlled lists? I would have thought so for mines, but new prospect names could be created? | | By Nick's explanation this is definitely not a Feature of Interest by our definition. FOI would be a basin, or rock unit, etc. I'm not sure of the intent of this field and what is trying to be described. |
| Well Names | Not recorded | link to GSQ Borehole Profile via spatial intersect | Very relevant to non-industry reports, some industry reports | Is this for P&G only? Will these well names already be in our borehole register? If not, the user would free-text the entry. But I would think we would want to have integrity of the well - entering in the core metadata in our borehole data model. For P&G reports, the report itself will contain the well names. | This is the major link point to the well at the moment (it is used for the QDEX Data to QDEX Reports link, to the best of my knowledge). Currently useful as it is as close to searching by UWI as you can get in QDEX Reports, as the title is a string search and suffers from inconsistent naming. Not for P&G only: also strat, with potential for other commodities against sampling reports. They should already be in the borehole register. And yes, the report will contain the well name, but this is the report metadata and the well should be attributed to the report. | Reports should be linked directly with the well and borehole entities relevant to them. If this IS to be included, well names should be selectable from a list and NEVER input as free text. Free texting well names never ends well. |
| Seismic Survey Names | GSQ Survey Profile | yes | yes | | What is the survey profile? | See above. If a seismic survey is relevant to a report it should have a direct and explicit link to the actual survey. |