YourDataStories / ontology

The ontology that is used to describe YourDataStories datasets.
GNU General Public License v2.0

Pilot 3 model #45

Closed Polymathronic closed 8 years ago

Polymathronic commented 8 years ago

We currently have something along the lines of:

:Contract/123 elod:contractId "123";
              elod:documentUrl "http://example.org";
              elod:projectId "456";
              elod:startDate "2015-01-01T00:00:00" .

I believe it would make more sense to separate the two conceptually different types of entities, contracts and projects. That way:

1) You could describe an instance of a contract using the appropriate contract vocabulary and connect it to the related project, which would then be described using the common properties relevant to projects.
2) The elod properties would be used as intended (in line with their predefined domains and ranges).
3) The end user could search for, e.g., public projects (perhaps with a specific CPV code) and get results from both Ireland and Greece (which is exactly what we want).
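A minimal sketch of the proposed split, assuming a hypothetical flat input row with `contract_id`, `project_id` and `document_url` fields. Triples are plain Python tuples of prefixed names rather than objects from an RDF library, the `:Contract/<id>` and `:Project/<id>` IRI patterns are made up, `pc:Contract` is an assumed contract class (from the Public Contracts Ontology), and the link uses elod:hasRelatedContract, the property George suggests in this thread. Only a few properties are shown.

```python
# Each triple is a (subject, predicate, object) tuple of prefixed names or literals.
Triple = tuple[str, str, str]


def split_record(record: dict[str, str]) -> list[Triple]:
    """Turn one flat input row into a contract resource and a project resource."""
    # Hypothetical IRI patterns; the real ones depend on the pilot 3 mapping.
    contract = f":Contract/{record['contract_id']}"
    project = f":Project/{record['project_id']}"
    return [
        # Contract-level facts stay on the contract.
        (contract, "rdf:type", "pc:Contract"),  # assumed contract class
        (contract, "elod:contractId", record["contract_id"]),
        (contract, "elod:documentUrl", record["document_url"]),
        # Project-level facts move to the project, which links to the contract.
        (project, "rdf:type", "elod:PublicProject"),
        (project, "elod:projectId", record["project_id"]),
        (project, "elod:hasRelatedContract", contract),
    ]


triples = split_record(
    {"contract_id": "123", "project_id": "456", "document_url": "http://example.org"}
)
```

Each kind of entity then carries only the properties that belong to it, and a single link connects the two.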

@vafopoulos @giorgosvaf Please provide your thoughts if you disagree.

giorgosvaf commented 8 years ago

I agree with Uros's suggestion. In fact, you could use the elod:hasRelatedContract object property, with the project entity as its domain and the contract entity as its range.
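The domain/range convention described here can be checked with a small closed-world sketch. Note that in RDFS, `rdfs:domain` and `rdfs:range` license type inferences rather than reject data, so a check like this is closer to what a SHACL shape would do. The tuple-based triple format, the example IRIs and the `pc:Contract` class name are assumptions for illustration.

```python
Triple = tuple[str, str, str]

# Expected (domain, range) per property: a project links to a contract.
# "pc:Contract" is an assumed class name for contracts.
DOMAIN_RANGE = {"elod:hasRelatedContract": ("elod:PublicProject", "pc:Contract")}


def domain_range_violations(triples: list[Triple]) -> list[Triple]:
    """Return triples whose subject or object lacks the expected rdf:type."""
    # Collect the declared types of every resource.
    types: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    violations = []
    for s, p, o in triples:
        if p in DOMAIN_RANGE:
            domain, range_ = DOMAIN_RANGE[p]
            if domain not in types.get(s, set()) or range_ not in types.get(o, set()):
                violations.append((s, p, o))
    return violations


good = [
    (":Project/456", "rdf:type", "elod:PublicProject"),
    (":Contract/123", "rdf:type", "pc:Contract"),
    (":Project/456", "elod:hasRelatedContract", ":Contract/123"),
]
# The same link written the wrong way round (contract -> project).
reversed_link = good[:2] + [(":Contract/123", "elod:hasRelatedContract", ":Project/456")]
```

`domain_range_violations(good)` returns an empty list, while the reversed link is reported.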

niallob commented 8 years ago

Search by CPV code is exactly what is required. However, journalists cannot be expected to enter a CPV code such as 45233110-3; rather, they will search for 'Motorway construction works'.

The platform should present us with a list of CPV descriptions to choose from (perhaps a combo box).
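A minimal sketch of the lookup behind such a combo box, assuming the code-to-label pairs are already available (here, four entries copied from the CPV list in this comment). Matching is a plain case-insensitive substring test; a real portal might use prefix or full-text search instead. `search_cpv` is a hypothetical helper name.

```python
# Four entries taken from the CPV list in this comment.
CPV_LABELS = {
    "45233110-3": "Motorway construction works",
    "45233120-6": "Road construction works",
    "45233128-2": "Roundabout construction work",
    "45233140-2": "Roadworks",
}


def search_cpv(text: str, labels: dict[str, str]) -> list[tuple[str, str]]:
    """Return (code, label) pairs whose label contains `text`, ignoring case."""
    needle = text.casefold()
    return sorted(
        (code, label) for code, label in labels.items() if needle in label.casefold()
    )
```

Typing "road" offers "Road construction works" and "Roadworks"; the journalist picks a description and the platform queries by the matching code.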

Below is a list of the CPV codes we will be using for comparing road projects across the EU.

45233000-9 - Construction, foundation and surface works for highways, roads
45233100-0 - Construction work for highways, roads
45233110-3 - Motorway construction works
45233120-6 - Road construction works
45233121-3 - Main road construction works
45233122-0 - Ring road construction work
45233123-7 - Secondary road construction work
45233124-4 - Trunk road construction work
45233125-1 - Road junction construction work
45233126-8 - Grade-separated junction construction work
45233127-5 - T-junction construction work
45233128-2 - Roundabout construction work
45233129-9 - Crossroad construction work
45233130-9 - Construction work for highways
45233140-2 - Roadworks
45233150-5 - Traffic-calming works
45233160-8 - Paths and other metalled surfaces
45233200-1 - Various surface works
45233300-2 - Foundation work for highways, roads, streets and footpaths
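Because CPV codes are hierarchical, the list above can be matched by prefix rather than code by code. In the CPV structure the first two digits identify the division, the first three the group, the first four the class and the first five the category, with the remaining digits adding precision and the digit after the hyphen serving as a check digit. The sketch below treats an ancestor's trailing zeros as unspecified levels, which is a simplification, and does not validate the check digit; the helper names are hypothetical.

```python
def cpv_digits(code: str) -> str:
    """Return the eight code digits without the check digit, e.g. '45233110-3' -> '45233110'."""
    digits, _, _check = code.partition("-")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"not a CPV code: {code!r}")
    return digits


def is_under(code: str, ancestor: str) -> bool:
    """True if `code` equals `ancestor` or sits below it in the CPV hierarchy.

    Simplification: the ancestor's trailing zeros are treated as unspecified
    levels, so '45233000-9' matches every code starting with '45233'.
    """
    prefix = cpv_digits(ancestor).rstrip("0")
    return cpv_digits(code).startswith(prefix)
```

Every code in the list falls under 45233000-9, so a single category filter would cover the whole roads comparison.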

niallob commented 8 years ago

As to the substantive question: the relationship between contracts and projects is as follows. All projects have at least one contract assigned to them. However, there may be multiple contracts for any given project.

E.g. projects 1, 2 and 3 with contracts A, B, C, D and E, combined as 1A, 2B, 3C, 3D and 3E. In this example, projects 1 and 2 have a single contract each, while project 3 has three contracts associated with it.
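The 1:N relationship in this example can be sketched as a grouping of (project, contract) pairs that also rejects a contract assigned to two different projects. Plain strings stand in for project and contract resources; `contracts_by_project` is a hypothetical helper.

```python
from collections import defaultdict


def contracts_by_project(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (project, contract) pairs; each contract must belong to exactly one project."""
    owner: dict[str, str] = {}
    grouped: dict[str, list[str]] = defaultdict(list)
    for project, contract in pairs:
        if contract in owner:
            if owner[contract] != project:
                raise ValueError(
                    f"contract {contract} assigned to both {owner[contract]} and {project}"
                )
            continue  # duplicate pair, already recorded
        owner[contract] = project
        grouped[project].append(contract)
    return dict(grouped)


# Projects 1, 2, 3 and contracts A-E from the example: 1A, 2B, 3C, 3D, 3E.
example = [("1", "A"), ("2", "B"), ("3", "C"), ("3", "D"), ("3", "E")]
```

By construction every project in the result has at least one contract, matching the rule stated in this comment.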

So long as we have all the contracts listed, each associated with the relevant project, I am happy.

Polymathronic commented 8 years ago

CPV codes are linked data resources with their corresponding labels attached to them. We don't need to know which codes will be used in the data, nor what their labels are.
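To illustrate the point that labels travel with the CPV resources, here is a sketch that builds the code-to-label map from the data itself instead of a hardcoded list. It assumes labels are attached with `skos:prefLabel` as (text, language) literals, that code resources are typed `skos:Concept`, and it uses a made-up `cpv:` prefix; the actual CPV vocabulary used by YDS may differ.

```python
from typing import Union

# Objects are either resource names or (text, language tag) literals.
Obj = Union[str, tuple[str, str]]
Triple = tuple[str, str, Obj]


def labels_from_graph(triples: list[Triple], lang: str = "en") -> dict[str, str]:
    """Map each resource to its skos:prefLabel in the requested language."""
    labels: dict[str, str] = {}
    for s, p, o in triples:
        if p == "skos:prefLabel" and isinstance(o, tuple) and o[1] == lang:
            labels[s] = o[0]
    return labels


graph = [
    ("cpv:45233140", "rdf:type", "skos:Concept"),  # assumed typing of CPV codes
    ("cpv:45233140", "skos:prefLabel", ("Roadworks", "en")),
]
```

Whatever codes appear in the data, the combo box can be filled from their attached labels, in the user's language where a label exists.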

The number of projects/contracts doesn't limit the suggested modelling approach in any way.

Polymathronic commented 8 years ago

The project ID and start date were used above only as examples of project-related properties. There is more information that, I think, could/should be used to describe an individual project.

For instance, your price specifications are currently attached to the individual contracts, which is fine. However, the fact that you can have 1:N project-contract relationships implies that, unless you make the (committed and disbursed) amounts of individual projects explicit as well, that information will have to be inferred, which we said we were going to avoid.
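To make the concern concrete: with amounts stored only on contracts, a project's total has to be derived at query time along these lines, which is the inference step the thread wants to avoid. Storing the amount on the project turns it into a direct lookup. The amounts below are made up for illustration, and only one amount per contract is modelled (committed and disbursed amounts would each need the same treatment).

```python
def project_totals(
    contract_amounts: dict[str, float], contracts_of: dict[str, list[str]]
) -> dict[str, float]:
    """Derive each project's amount by summing the amounts of its contracts."""
    return {
        project: sum(contract_amounts[c] for c in contracts)
        for project, contracts in contracts_of.items()
    }


# Hypothetical amounts; project 3 has three contracts, as in the 1:N example.
amounts = {"A": 100.0, "B": 250.0, "C": 400000.0, "D": 350000.0, "E": 300000.0}
links = {"1": ["A"], "2": ["B"], "3": ["C", "D", "E"]}
totals = project_totals(amounts, links)
```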

That means that a non-expert user exploring the data on the YDS portal won't be able to, e.g., ask for "all public projects in Europe that cost >1M" and get the data from both pilots 1 and 3, as the two models will be essentially different. In other words, since both types of projects will be identified merely as elod:PublicProject instances in the common graph, @petasis and his team won't be able to differentiate between them.
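A sketch of why a shared project-level amount matters for a query like "all public projects that cost >1M": if one pilot states project amounts explicitly and the other leaves them implicit in its contracts, a simple filter over elod:PublicProject records silently misses the second set. The record shape, IRIs and amounts are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectRecord:
    iri: str
    pilot: str
    amount: Optional[float]  # explicit project-level amount; None if only contracts carry it


def projects_costing_more_than(records: list[ProjectRecord], threshold: float) -> list[str]:
    """Return IRIs of projects whose explicit amount exceeds the threshold."""
    return sorted(r.iri for r in records if r.amount is not None and r.amount > threshold)


records = [
    ProjectRecord(":Project/p1-001", "pilot 1", 2_500_000.0),
    ProjectRecord(":Project/p3-456", "pilot 3", None),  # cost only implicit in its contracts
]
```

The pilot 3 project never matches, even if its contracts add up to more than 1M.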

CPV codes are another piece of information relevant to the search scenario that I think should be attached to projects as well.

I think it would be best if @vafopoulos and @giorgosvaf supported you on modelling your projects the right way, so we bring the two pilots closer together. Ideally, once done, common user queries will automatically fetch data from both graphs (which, I believe, is the main goal of pilot 3).

Polymathronic commented 8 years ago

After taking a closer look at your mapping, it appears to me that there is no information about projects in your input data at all. Apparently, you map a field called document_id to elod:projectId, which looks wrong to me. Also, contract_contract_award_day is mapped to elod:startDate, which doesn't seem right either. These two made me think you had actual project data in there, which is why I suggested separating the two concepts in the first place.

@vafopoulos and @giorgosvaf will provide you with advice on how to better align with the pilot 1 tender data.

mogaio commented 8 years ago

We split them, and elod:hasRelatedContract is now used as Georgios mentioned before (with the project entity as the domain and the contract entity as the range).

Polymathronic commented 8 years ago

That is not what I'm saying. Please read the comment again.

giorgosvaf commented 8 years ago

Uros is saying that your data does not contain any public project entities, so you have to remove them from the data model.
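If project entities had already been generated, removing them could look like the sketch below, which drops every triple that mentions a resource typed elod:PublicProject, as subject or object. It works on plain tuples for illustration only; the IRIs and the `pc:Contract` class are assumptions, and the cleaner fix is to correct the mapping so the entities are never produced.

```python
Triple = tuple[str, str, str]


def drop_entities_of_type(triples: list[Triple], cls: str) -> list[Triple]:
    """Remove every triple whose subject or object is a resource typed `cls`."""
    doomed = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    return [(s, p, o) for s, p, o in triples if s not in doomed and o not in doomed]


data = [
    (":Contract/123", "rdf:type", "pc:Contract"),  # assumed contract class
    (":Contract/123", "elod:contractId", "123"),
    (":Project/456", "rdf:type", "elod:PublicProject"),
    (":Project/456", "elod:hasRelatedContract", ":Contract/123"),
]
cleaned = drop_entities_of_type(data, "elod:PublicProject")
```

Only the two contract triples remain.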

Uros, if I am wrong, please comment on this.

Polymathronic commented 8 years ago

Yes George, you are right. The mapping and the original modeling choice were wrong to begin with, which led me to believe there were public projects in there.

This is now also mentioned here: https://github.com/YourDataStories/ontology/issues/48

UPDATE: I will close the issue once you confirm we are on the same page, so there is no additional confusion.

mogaio commented 8 years ago

It's clear now, and we have updated the model according to your comments in the latest version of the RDF data.