TBFY / knowledge-graph-API

This repository is dedicated to the core API of the TheyBuyForYou project built on top of R4R (https://github.com/TBFY/r4r)

Awarded companies #38

Closed cmorenilla closed 4 years ago

cmorenilla commented 4 years ago

NEED We need to know the number of unique awarded enterprises in recent months (since the knowledge graph has had data) and, if possible, to get a list of them (name and CIF). We need this because we want to buy the "size of the companies" data (number of employees) for awarded enterprises only. With this, we can identify SMEs and relate them to the types of contracts they are most likely to win. The price depends on the number of enterprises. Ref.

PROPOSAL Pre-condition: Classify companies by number of employees. Example:

Create queries to provide information. Examples:

ocorcho commented 4 years ago

@fyedro Please create the corresponding query

elvesater commented 4 years ago

For some strange reason this was missing from the original euBusinessGraph ontology and thus from the corresponding mapping. I've now updated the ontology and the mapping rules. See https://github.com/TBFY/knowledge-graph/blob/master/rml-mappings/opencorporates_mapping.ttl and the predicate ebg:numberOfEmployees for details. This data will be onboarded (if it is available from OpenCorporates) in the next release of the TBFY knowledge graph. Note that I'm currently working on updating all the mapping rules.

cmorenilla commented 4 years ago

Many thanks. So, we will wait for the next release.

aferrariuliana commented 4 years ago

Hi all, thanks a lot for your answer. Regarding the classification of companies by number of employees, we would like to use the official classification of the Spanish Statistical Institute (INE). It would be the following:

aferrariuliana commented 4 years ago

Another question, please: can we generate the list of awarded companies over a period of time, for example since the KG has had data? The aim is to get an idea of how many companies there are, in order to estimate the cost of buying the company-size data. Buying the whole database for Spain would be very expensive, so paying per company would be cheaper, but I have no idea what the volume would be. With the list, I can in turn generate a "unique" list of companies by removing the repeated ones and get the number. Many thanks.

woodbine commented 4 years ago

Hi Annie,

My Spanish isn't great, but if I've understood your question correctly, you're looking for a unique list of companies that won contracts in Spain. The data we have is missing unique identifiers for the companies, but we can give you a list of unique company names and addresses. This will include some duplicates, e.g. one entry for "They Buy For You" and another for "TheyBuyForYou", but this is where the OpenCorporates reconciliation service comes in.

If that service is available for up-to-date company records in Spain, you can likely greatly reduce the number of suppliers you will need to reconcile.
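To illustrate the kind of duplicate mentioned above, here is a minimal Python sketch that groups name variants under a crude matching key (lower-cased, accents stripped, spaces and punctuation removed). The helper names `name_key` and `group_by_key` are hypothetical, and this is not how the OpenCorporates reconciliation service works; it only catches trivial spelling variants and can also merge genuinely different companies.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Build a crude matching key: no accents, no case, no spaces or punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]", "", without_accents.casefold())


def group_by_key(names):
    """Group the original spellings that share the same matching key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    sample = ["They Buy For You", "TheyBuyForYou", "Another Supplier S.L."]
    for key, variants in group_by_key(sample).items():
        print(key, variants)
```

The number of keys in the result gives a rough estimate of the number of distinct suppliers; anything beyond that still needs identifiers or a proper reconciliation step.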

Best,

Ian

On Wed, 27 Nov 2019 at 12:27, aferrariuliana notifications@github.com wrote:

Another question, please: can we generate the list of awarded companies over a period of time, for example since the KG has had data? The aim is to get an idea of how many companies there are, in order to estimate the cost of buying the company-size data. Buying the whole database for Spain would be very expensive, so paying per company would be cheaper, but I have no idea what the volume would be. With the list, I can in turn generate a "unique" list of companies by removing the repeated ones and get the number. Many thanks.


ocorcho commented 4 years ago

@fyedro will generate the SPARQL query on top of the knowledge graph, which should have the reconciliation to companies.
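As a rough idea of what such a query and its post-processing could look like, here is a sketch in Python. The `ex:` prefix and the predicates `ex:isSupplierFor` and `ex:legalName` are placeholders, not the knowledge graph's actual vocabulary, and would need to be replaced with the real terms. The only thing the code relies on is the standard SPARQL 1.1 JSON results layout (`results` → `bindings` → variable → `value`).

```python
# Placeholder query: the prefix and both predicates are hypothetical stand-ins.
SUPPLIER_QUERY = """
PREFIX ex: <http://example.org/placeholder#>
SELECT DISTINCT ?supplier ?name WHERE {
  ?supplier ex:isSupplierFor ?award ;
            ex:legalName ?name .
}
"""


def unique_suppliers(results: dict) -> dict:
    """Map each supplier IRI to the first name seen for it.

    `results` is a parsed SPARQL 1.1 JSON results document.
    """
    suppliers = {}
    for row in results["results"]["bindings"]:
        suppliers.setdefault(row["supplier"]["value"], row["name"]["value"])
    return suppliers


if __name__ == "__main__":
    # Hand-written sample response, not real knowledge graph data.
    sample = {
        "head": {"vars": ["supplier", "name"]},
        "results": {"bindings": [
            {"supplier": {"type": "uri", "value": "http://example.org/org/1"},
             "name": {"type": "literal", "value": "Example Supplier"}},
            {"supplier": {"type": "uri", "value": "http://example.org/org/1"},
             "name": {"type": "literal", "value": "Example Supplier S.L."}},
        ]},
    }
    print(len(unique_suppliers(sample)))
```

Counting by supplier IRI rather than by name means that name variants attached to the same reconciled entity are counted once.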

aferrariuliana commented 4 years ago

Hi Ian,

That list of unique awarded company names (and addresses) would be great for us. Thanks a lot.

We will work with it to remove the duplicates and get a reduced list.

It isn't clear to me whether the OpenCorporates reconciliation service can do this ... but now I see Oscar's comment.

aferrariuliana commented 4 years ago

Thanks a lot, Ian and Oscar. So Ian, for the moment, if you can give us the list so that we have an idea of the volume, we would appreciate it.

woodbine commented 4 years ago

Ok. I'll see what I can do.

You need to check with OpenCorporates to make sure that they have up-to-date Spanish data; otherwise the reconciliation might not be that useful. My understanding is that the last time the data was published openly was around 2015, but I'm not 100% sure.

Best,

Ian

On Wed, 27 Nov 2019 at 13:08, aferrariuliana notifications@github.com wrote:

Thanks a lot, Ian and Oscar. So Ian, for the moment, if you can give us the list so that we have an idea of the volume, we would appreciate it.


aferrariuliana commented 4 years ago

Thanks Ian. For now, with the list of awarded companies (or the list of unique "CIF" values) over a long period of time, we can eliminate duplicates and get an idea of how many there are and what it would cost to buy the "number of employees" data. To give you an idea, the cost for 1000 enterprises is 430 €; for 5000 enterprises it is 900 €. Best regards, Annie

| Number of records | Cost |
| --- | --- |
| 1,000 | 430.00 € |
| 5,000 | 900.00 € |
| 15,000 | 1,850.00 € |
| >50,001 (10% discount) | 0.09 €/record |

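The price list above can be turned into a quick estimate once the number of unique companies is known. The Python sketch below rests on two assumptions that the comment does not confirm: that each fixed price covers any count up to that tier's size, and that the 0.09 € per-record rate already includes the 10% discount. The list gives no price between 15,001 and 50,001 records, so the function returns `None` there instead of guessing.

```python
from typing import Optional

# (maximum number of records, price in EUR), copied from the price list.
TIERS = [(1_000, 430.0), (5_000, 900.0), (15_000, 1_850.0)]

# Above this count the list quotes a per-record rate instead of a fixed price.
PER_RECORD_THRESHOLD = 50_001
PER_RECORD_RATE = 0.09  # assumed to already include the 10% discount


def estimate_cost(n_records: int) -> Optional[float]:
    """Estimate the purchase cost in EUR, or None if the list gives no price."""
    if n_records < 0:
        raise ValueError("n_records must be non-negative")
    for limit, price in TIERS:
        if n_records <= limit:
            return price  # assumption: a tier covers every count up to its size
    if n_records > PER_RECORD_THRESHOLD:
        return round(n_records * PER_RECORD_RATE, 2)
    return None  # 15,001 to 50,001 records: not covered by the price list


if __name__ == "__main__":
    for n in (1_000, 3_200, 20_000, 60_000):
        print(n, estimate_cost(n))
```

Running the estimate on the deduplicated count rather than on the raw list matters here, because a count just above a tier boundary moves the purchase into the next price bracket.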
fyedro commented 4 years ago

Created a notebook with many of the requests: https://github.com/TBFY/knowledge-graph-API/blob/master/notebooks/Suppliers_v1.ipynb