MusicConnectionMachine / StructuredData

In this project we will be scanning structured online resources such as DBPedia, Worldcat, MusicBrainz, IMSLP and other databases
GNU General Public License v3.0

Edit scraping scripts to comply with our DB structure #54

Closed: metavaults closed this 7 years ago

metavaults commented 7 years ago

Edit our scraping scripts so that their output complies with the format and structure of our Sequelize DB schema.

I'll take MusicBrainz.

TimHenkelmann commented 7 years ago

Done for dbpedia_Classical_musicians_by_instruments, dbpedia_Classical_musicians_by_instruments_and_nationality and dbpedia_Classical_musicians_by_nationality in #56.

metavaults commented 7 years ago

Done for MusicBrainz. The scripts should be run in this order:

1) scrapeIDArtists/server.js - gets the IDs of all artists (composers) from the specified era
2) scrapeArtists/server.js - scrapes the artists and information about them, using the IDs from step 1
3) scrapeReleases/server.js - scrapes releases using the artists' IDs from step 1
4) scrapeWorks/server.js - scrapes works using the artists' IDs from step 1
5) PutJSONTogether/app.js - builds the artists table according to the Sequelize schema
6) PutJSONTogether/worksAPI.js - builds the works table according to the Sequelize schema
7) PutJSONTogether/releasesAPI.js - builds the releases table according to the Sequelize schema
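The run order can be scripted. What follows is a minimal Node.js sketch, not part of the repository. It rests on three assumptions: each script is started with `node`, each one exits when its work is done (despite the `server.js` name), and each signals failure with a non-zero exit code. The names `PIPELINE`, `runWithNode` and `runPipeline` are hypothetical.

```javascript
const { spawnSync } = require("child_process");
const path = require("path");

// Execution order from the list above, relative to the shared scripts folder.
const PIPELINE = [
  "scrapeIDArtists/server.js",      // 1) IDs of all composers from the era
  "scrapeArtists/server.js",        // 2) artist details, uses IDs from step 1
  "scrapeReleases/server.js",       // 3) releases, uses IDs from step 1
  "scrapeWorks/server.js",          // 4) works, uses IDs from step 1
  "PutJSONTogether/app.js",         // 5) builds the artists table
  "PutJSONTogether/worksAPI.js",    // 6) builds the works table
  "PutJSONTogether/releasesAPI.js", // 7) builds the releases table
];

// Default runner (assumption: scripts terminate on their own). Starts the
// script in a child Node process, waits for it, and returns its exit code
// (null if it was killed by a signal).
function runWithNode(scriptPath) {
  const result = spawnSync(process.execPath, [scriptPath], { stdio: "inherit" });
  return result.status;
}

// Runs every step in order and stops at the first failing step, because
// later steps read JSON written by earlier ones.
function runPipeline(baseDir, run = runWithNode) {
  for (const relPath of PIPELINE) {
    const status = run(path.join(baseDir, relPath));
    if (status !== 0) {
      throw new Error(`Step failed (exit ${status}): ${relPath}`);
    }
  }
}

// Hypothetical usage from an entry script in the shared folder:
// runPipeline(__dirname);
```

Steps 2 to 4 depend only on step 1, so they could in principle run in parallel. The sketch keeps the sequential order given in the comment because that order is known to work.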

At the end, three .json files are created, one per entity (BrainzReleasesSequelize.json, BrainzArtistsSequelize.json and BrainzWorksSequelize.json). These files should then be loaded into our DB.
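Here is a minimal sketch of the final loading step. It assumes each of the three files holds a JSON array of row objects that already match the Sequelize model, and that the models are passed in under the hypothetical names `Artist`, `Work` and `Release`. `Model.bulkCreate` is Sequelize's standard bulk-insert method. The helpers `readOutputs` and `populate` are illustrative, not taken from the repository.

```javascript
const fs = require("fs");
const path = require("path");

// The three output files, keyed by the (hypothetical) model each one populates.
const OUTPUT_FILES = {
  Artist: "BrainzArtistsSequelize.json",
  Work: "BrainzWorksSequelize.json",
  Release: "BrainzReleasesSequelize.json",
};

// Reads and parses every output file. Fails early if a file is missing
// or does not contain a JSON array (assumed row format).
function readOutputs(outputDir) {
  const rows = {};
  for (const [model, file] of Object.entries(OUTPUT_FILES)) {
    const data = JSON.parse(fs.readFileSync(path.join(outputDir, file), "utf8"));
    if (!Array.isArray(data)) {
      throw new Error(`${file} should contain a JSON array`);
    }
    rows[model] = data;
  }
  return rows;
}

// Inserts the rows with Sequelize's Model.bulkCreate. Artists go first on the
// assumption that works and releases reference them through foreign keys.
async function populate(outputDir, models) {
  const rows = readOutputs(outputDir);
  for (const model of ["Artist", "Work", "Release"]) {
    await models[model].bulkCreate(rows[model]);
  }
}
```

Parsing all three files before inserting anything means a malformed file is reported before the DB is partially populated.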

All scraping scripts should be placed in one folder (that's the initial idea). Inside that folder, create a subfolder called "scrapedoutput". All .json files, both input and output, are placed there during scraping.
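The folder convention can be wrapped in a few helpers so every script resolves the same location. This is a sketch with hypothetical helper names (`ensureOutputDir`, `writeOutput`, `readOutput`). Only the folder name `scrapedoutput` comes from the comment. A script that lives in a subfolder such as `scrapeArtists/` would pass the shared root, for example `path.join(__dirname, "..")`.

```javascript
const fs = require("fs");
const path = require("path");

// Name of the shared input/output folder.
const OUTPUT_DIR_NAME = "scrapedoutput";

// Returns the path of the output folder inside the scripts root,
// creating it if it does not exist yet (no error if it already does).
function ensureOutputDir(scriptsRoot) {
  const dir = path.join(scriptsRoot, OUTPUT_DIR_NAME);
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

// Writes one result as pretty-printed JSON into the output folder
// and returns the full path of the written file.
function writeOutput(scriptsRoot, fileName, data) {
  const file = path.join(ensureOutputDir(scriptsRoot), fileName);
  fs.writeFileSync(file, JSON.stringify(data, null, 2));
  return file;
}

// Reads and parses a JSON file that an earlier step wrote to the output folder.
function readOutput(scriptsRoot, fileName) {
  const file = path.join(scriptsRoot, OUTPUT_DIR_NAME, fileName);
  return JSON.parse(fs.readFileSync(file, "utf8"));
}
```

Centralizing the path in one place means a rename of the folder only has to happen once, rather than in every scraper.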

metavaults commented 7 years ago

Done with all sources

kordianbruck commented 7 years ago

Great! Thanks, Lukas!