DigitalCommons / open-data

Publish ICA Youth Network Data as LOD #11

Closed ColmMassey closed 4 years ago

ColmMassey commented 4 years ago

Create required scripts to process and publish a snapshot of the ICA Youth Network's member data on our dev server.

See: here for background.

Use project name ica-youth-network

ColmMassey commented 4 years ago

We need to define how to map 2 fields in the Youth data to our schema.

For this iteration, let's just do Organisational Structure. See https://vocabs.solidarityeconomy.coop/essglobal/V2a/html-content/essglobal.html#V2a

We can make the following straightforward mapping from Type to Organisational Structure:

Type -> Organisational Structure

Cooperativa de consumo / usuario final
Coopérative de consommateur.rice.s
Final consumer/user cooperative
    -> Consumer co-operative (OS80)

Cooperativa de múltiples actores
Coopérative pluri-acteurs
Multi-stakeholder cooperative
    -> Multi-stakeholder co-operative (OS100)

Cooperativa de producción
Coopérative de producteur.rice.s (dont agricole)
Producer cooperative
    -> Producer co-operative (OS90)

Cooperativa de trabajo y empleo
Cooperativa di lavoro
Work and employment cooperative
    -> Self Employed (OS150)

Coopérative de travailleur.se.s
    -> Workers co-operative (OS60)

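
The mapping above could be captured as a simple lookup table. This is only a sketch: the names `_TYPE_TO_ORG_STRUCTURE` and `org_structure_for` are hypothetical, and the case-insensitive matching is an assumption rather than something agreed in this thread.

```python
from typing import Optional

# Type values and OS codes copied from the mapping in this comment.
_TYPE_TO_ORG_STRUCTURE = {
    "Cooperativa de consumo / usuario final": "OS80",
    "Coopérative de consommateur.rice.s": "OS80",
    "Final consumer/user cooperative": "OS80",
    "Cooperativa de múltiples actores": "OS100",
    "Coopérative pluri-acteurs": "OS100",
    "Multi-stakeholder cooperative": "OS100",
    "Cooperativa de producción": "OS90",
    "Coopérative de producteur.rice.s (dont agricole)": "OS90",
    "Producer cooperative": "OS90",
    "Cooperativa de trabajo y empleo": "OS150",
    "Cooperativa di lavoro": "OS150",
    "Work and employment cooperative": "OS150",
    "Coopérative de travailleur.se.s": "OS60",
}

# Normalise keys once so lookups ignore case and surrounding whitespace
# (an assumption; the raw data may need stricter or looser matching).
_NORMALISED = {
    key.strip().casefold(): code for key, code in _TYPE_TO_ORG_STRUCTURE.items()
}


def org_structure_for(type_value: str) -> Optional[str]:
    """Return the OS code for a Type value, or None if it is unmapped."""
    return _NORMALISED.get(type_value.strip().casefold())
```

Returning `None` for unmapped values leaves the caller free to fall back to a generic structure or to skip the field.
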
sunnydean commented 4 years ago

Hi @ColmMassey Initial version deployed at: https://data1.solidarityeconomy.coop/ica-youth-network/

They're all only coops for now (organisational structure). We need to finish the mapping and implement it.

sunnydean commented 4 years ago

Btw @ColmMassey, how did you extract the youth data for the newest file in the Nextcloud? The CSV file is really messy (e.g. random quotation marks in places, missing fields, names in lat/lng fields).

sunnydean commented 4 years ago

it has some generic problems with the rows as well

For example, the header and the row below don't line up; two stray commas have been inserted:

Organization Type,Name,name,Region,Country,City,Latitude,Longitude,Size,Type,Sector,Address,Description,Additional Details,Website,Email,,

Youth-led Co-ops,Bukavu Youth Agripreneurs (BYA),Agriculture,Africa,Democratic Republic of Congo,Bukavu,-2.5123017,28.8480284,0-5,Producer cooperative,Agriculture,"Av. Du Plateau No45 A, 3�me niveau", Q. Nguba," C. Ibanda/Bukavu-South Kivu, Democratic Republic of Congo (DRC)","AgriTech is an organization whose central objective is to respond, in quantity and quality, to the needs of rural and urban people in food production, for sustainable improvement to their health and financial situation. AgriTech provides training to youth in IT and agriculture because we believe that youth are the next generation to make this world a safer and better place to live.",,www.agritech.online,info@agritech.online

or this line

Youth-led Co-ops,Campus Credit Multi-Purpose Cooperative Society,Banking / credit unions,Africa,Nigeria,Abraka,5.7894321,6.1023468,6-20,Final consumer/user cooperative,Banking / credit unions,"Post Graduate Class (Campus 1) Institute of Education, Delta State University, P .M. B 1, Abraka Delta State, Nigeria","CampusCredit Cooperative Society is a student-owned/driven consumer cooperative which originated from an idea by a group of post-graduate students in the Institute of Education, Delta State University, Abraka. The goal is to engineer trade systems on campuses that will enhance students financial well-being which in turn improves students academic performance.",Our strategy is simple: Harness students purchasing power through benefits associated with economies of scale", drive financial inclusion through promotion of the cooperative business enterprise," and enhance financial literacy amongst students through research-driven fInancial lIteracy counseling programs.,www.campuscredit.coop,team@campuscredit.coop

with a bunch of random quotation marks in the middle

sunnydean commented 4 years ago

Should we fix this here or create a new issue? The old file (the one uploaded 21 days ago) does not seem to have these issues, so I have uploaded that one as LOD for now.

we can either clean the data or just throw away bad entries

OR we can allow the LOD data to have small errors (e.g. some string in the lat lng fields) and make sure we account for that
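
As an illustration of the last option (tolerating bad values and accounting for them), a coordinate parser might simply return nothing when a lat/lng field holds text. This is a sketch only; `parse_coordinates` is a hypothetical helper, not code from the project.

```python
from typing import Optional, Tuple


def parse_coordinates(lat_text: str, lng_text: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) as floats, or None if either field is unusable."""
    try:
        lat = float(lat_text.strip())
        lng = float(lng_text.strip())
    except (ValueError, AttributeError):
        # Non-numeric text (e.g. a name that slipped into the column) or None.
        return None
    # Reject out-of-range values; NaN also fails these comparisons.
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        return None
    return lat, lng
```

A record with no usable coordinates could then still be published, just without a map position.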

sunnydean commented 4 years ago

updated with the new data and did the mapping for organisational structure at: https://dev.ica-youth-network.solidarityeconomy.coop/

I am doing 'some' cleaning on the data. Basically, I remove the first and last " symbols that surround each row, then I remove each stray " that sits between the "" pairs (which are the actual delimiter), and then some other ' symbols.

I also have some code for fixing the errors above and placing the " in the right places, but that messes up the good entries. The data as it stands is all right, and errors raised from bad fields are caught and handled.

Would you like me to clean up the data fully or just leave it as it is?

ColmMassey commented 4 years ago

This is the url to what seems to be the raw data? https://docs.google.com/spreadsheets/d/e/2PACX-1vRBT9x3W7Cw-7EEZczfTExYNrrO6yFfe7drhXiHTsRkSg7q2TR3r902ybpcOikqZ5-YCqz2T04wo4qU/pub?gid=573145819&single=true&output=csv

ColmMassey commented 4 years ago

Would you like me to clean up the data fully or just leave it as it is?

We shouldn't be doing any cleaning that can't be automatic when loading new versions, but it would be better to get cleaner raw data. Is it cleaner when you pull from this Google doc? There are two data sets, but let's just work on this one for now, the youth-led co-ops.

ColmMassey commented 4 years ago

it has some generic problems with the rows as well

I don't know what the issue was with the first data I downloaded, but the stuff I get now using that url looks clean. Do you concur @dtmakm27 ?

sunnydean commented 4 years ago

It is much better than the last, thanks for the fast response (saved me the effort of having to clean the previous one :) )

It also has minor problems (e.g. additional commas which mess up the fields, and random text in email/website fields), but these will be accounted for when automatically generating the data. It should be all right if we receive data in this state (from the link you provided).
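
One way to account for rows with additional commas automatically is to compare each row's field count with the header's. The sketch below uses Python's standard `csv` module; the function name `split_rows` and the sample data are made up for illustration and are not the project's actual loader.

```python
import csv
import io
from typing import List, Tuple


def split_rows(csv_text: str) -> Tuple[List[str], List[List[str]], List[List[str]]]:
    """Return (header, good_rows, bad_rows) based on field counts."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    good, bad = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # A stray unquoted comma shifts the fields and changes the count.
        (good if len(row) == len(header) else bad).append(row)
    return header, good, bad


sample = (
    "Name,City,Email\n"
    "A,Paris,a@example.coop\n"
    "B,Lyon, France,b@example.coop\n"
)
header, good, bad = split_rows(sample)
print(len(good), len(bad))
```

Rows in `bad` could be logged and skipped, or published with only the fields that precede the first suspect column.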

sunnydean commented 4 years ago

New data is published: https://data1.solidarityeconomy.coop/ica-youth-network/index.html

Map is also updated with the new data: https://dev.ica-youth-network.solidarityeconomy.coop/

sunnydean commented 4 years ago

The only thing left I think is that the address is one big string and is not separated in different segments. Should we do that or leave it as it is? @ColmMassey

ColmMassey commented 4 years ago

The only thing left I think is that the address is one big string and is not separated in different segments. Should we do that or leave it as it is? @ColmMassey

Let's leave the address as is for now. How is it decided what fields are listed in the dialog? For example in Oxford, the description is listed, but not in the ICA Youth data?

sunnydean commented 4 years ago

Ah, that is the SPARQL query. The email is not showing either. Be right back, I'll do it now.

ColmMassey commented 4 years ago

Note, I've created an Issue for generating a sameAs list linking dotcoop & Youth ICA. https://github.com/SolidarityEconomyAssociation/open-data/issues/15

sunnydean commented 4 years ago

https://dev.ica-youth-network.solidarityeconomy.coop/

added email and description

Note: in the original CSV file we have a field called description and a field called additional description. I am just appending additional description to description and putting them into one field.
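
The merge of the two description fields could look roughly like this. It is a sketch that assumes the column names `Description` and `Additional Details` from the CSV header quoted earlier in this thread; `merge_description` is a hypothetical helper.

```python
from typing import Dict


def merge_description(record: Dict[str, str]) -> Dict[str, str]:
    """Append 'Additional Details' to 'Description' and drop the extra field."""
    parts = [record.get("Description", ""), record.get("Additional Details", "")]
    # Keep only non-empty parts so we never add a dangling separator.
    merged = " ".join(p.strip() for p in parts if p and p.strip())
    out = {k: v for k, v in record.items() if k != "Additional Details"}
    out["Description"] = merged
    return out
```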

ColmMassey commented 4 years ago

added email and description

Let's leave out email for now as several of them are individuals. We would need permission to publish.

just appending additional description to description and putting them into one field

Makes sense.

sunnydean commented 4 years ago

Do you want to add more to-dos here or is it ready for review?

ColmMassey commented 4 years ago

Once the email is dropped, put in For Review.

sunnydean commented 4 years ago

Ah wait, so I should remove the email?

sunnydean commented 4 years ago

dropped the email https://dev.ica-youth-network.solidarityeconomy.coop/

wu-lee commented 4 years ago

@ColmMassey @dtmakm27

I notice the ICA youth data has two "name" fields, presumably a mistake because the second looks like some other sort of data.

The first line of the data, for example, makes me think it is a duplicate of "Type". As such I think we can ignore it, but it might be worth mentioning to ICA so they can correct it (and add any other field they may have intended.) Also, see my next comment.

wu-lee commented 4 years ago

Another point: identifiers are missing from this data.

We get by in the demo by inserting our own. Ours are just an incrementing integer, added to initiatives in the order they are seen in this file. However, that won't work in general, and trying to track our own index of IDs would be a headache which is probably entirely avoidable.
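
To illustrate why file-order IDs are fragile, the sketch below numbers names in the order they appear and shows that removing one row changes the ID of every row after it. The starting index of 0 and the function name are assumptions made for the example.

```python
from typing import List, Tuple


def assign_sequential_ids(names: List[str]) -> List[Tuple[int, str]]:
    """Pair each name with an incrementing integer in file order."""
    return list(enumerate(names))


first_snapshot = assign_sequential_ids(["Coop A", "Coop B", "Coop C"])
# The same data after "Coop B" is removed upstream:
second_snapshot = assign_sequential_ids(["Coop A", "Coop C"])

print(first_snapshot)
print(second_snapshot)
```

"Coop C" is identified as 2 in the first snapshot and 1 in the second, so any URI built from these integers would silently point at a different organisation.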

Can we ask them to provide the member ID or some other unique identifier?

ColmMassey commented 4 years ago

Can we ask them to provide the member ID or some other unique identifier?

I have made enquiries.

ColmMassey commented 4 years ago

There is someone currently cleaning up all the Youth Co-op data to be aligned with the regular ICA data format, so let's leave this Issue closed until they come back with the new format in July.