Edit scraping scripts to comply with our DB structure

metavaults commented 7 years ago

Edit our scraping scripts so that they comply with the format and structure of our DB (Sequelize).

Put process.send("done") at the end of every script.
We will have one folder for all input/output files, e.g. "./scrapedoutput/artists/dbpedia_test.json".
All scripts should work on their own: each should produce its results and write them to a .json file that any other script can read.
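Here is a minimal sketch of how a script could end under these conventions, assuming Node.js. The helpers `writeOutput` and `signalDone` are hypothetical names and are not part of the existing scripts. `process.send` only exists when the script was started with an IPC channel (for example via `child_process.fork`). Checking for it lets the same script also run standalone with `node`.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: write `data` as pretty-printed JSON under the shared
// output folder, creating intermediate folders (e.g. "artists/") as needed.
function writeOutput(relPath, data, baseDir = "./scrapedoutput") {
  const target = path.join(baseDir, relPath);
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, JSON.stringify(data, null, 2));
  return target;
}

// Hypothetical helper: notify a parent process (if any) that this script finished.
// process.send is undefined when the script is started directly with `node`,
// so the guard keeps standalone runs working. Returns whether a message was sent.
function signalDone() {
  if (typeof process.send === "function") {
    process.send("done");
    return true;
  }
  return false;
}

// At the very end of a scraping script (the record is placeholder data):
// writeOutput("artists/dbpedia_test.json", [{ name: "Example Artist" }]);
// signalDone();
```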

I'll take MusicBrainz.

TimHenkelmann commented 7 years ago

Done for dbpedia_Classical_musicians_by_instruments, dbpedia_Classical_musicians_by_instruments_and_nationality and dbpedia_Classical_musicians_by_nationality in #56.

metavaults commented 7 years ago

Done for MusicBrainz. The scripts should be run in this order:

1) scrapeIDArtists/server.js - gets the IDs of all artists (composers) from the specified era
2) scrapeArtists/server.js - scrapes the artists and information about them, using the IDs from step 1
3) scrapeReleases/server.js - scrapes releases using the artists' IDs from step 1
4) scrapeWorks/server.js - scrapes works using the artists' IDs from step 1
5) PutJSONTogether/app.js - builds the artists table according to the Sequelize schema
6) PutJSONTogether/worksAPI.js - builds the works table according to the Sequelize schema
7) PutJSONTogether/releasesAPI.js - builds the releases table according to the Sequelize schema
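A small runner could automate the order above. It starts each script as a child process and waits for its "done" message before it starts the next one. This is a sketch that assumes Node.js and assumes each script calls `process.send("done")` and then exits normally. `MUSICBRAINZ_PIPELINE`, `forkStep` and `runPipeline` are hypothetical names. The step runner is passed in as a parameter, so you can try the sequencing without the real scripts.

```javascript
const { fork } = require("child_process");
const path = require("path");

// The MusicBrainz steps in the order listed above, relative to the folder
// that holds all scraping scripts.
const MUSICBRAINZ_PIPELINE = [
  "scrapeIDArtists/server.js",
  "scrapeArtists/server.js",
  "scrapeReleases/server.js",
  "scrapeWorks/server.js",
  "PutJSONTogether/app.js",
  "PutJSONTogether/worksAPI.js",
  "PutJSONTogether/releasesAPI.js",
];

// Run one script as a child process with the scripts' root folder as cwd,
// so "./scrapedoutput" resolves to the shared folder. Succeed only if the
// script sent "done" and then exited with code 0.
function forkStep(rootDir, relPath) {
  return new Promise((resolve, reject) => {
    const child = fork(path.resolve(rootDir, relPath), [], { cwd: rootDir });
    let sawDone = false;
    child.on("message", (msg) => {
      if (msg === "done") sawDone = true;
    });
    child.on("error", reject);
    child.on("close", (code) => {
      if (code === 0 && sawDone) resolve();
      else reject(new Error(`${relPath} failed (exit code ${code}, done=${sawDone})`));
    });
  });
}

// Run the steps strictly one after another; a failing step stops the pipeline.
async function runPipeline(steps, runStep) {
  const completed = [];
  for (const step of steps) {
    await runStep(step);
    completed.push(step);
  }
  return completed;
}

// Usage from a hypothetical runner file placed in the scripts' root folder:
// runPipeline(MUSICBRAINZ_PIPELINE, (step) => forkStep(__dirname, step))
//   .then(() => console.log("all steps done"))
//   .catch((err) => { console.error(err); process.exitCode = 1; });
```

Running the steps strictly one after another respects the stated dependency: steps 2 to 4 need the IDs from step 1.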

At the end, three .json files are created, one for each of the three entities: BrainzReleasesSequelize.json, BrainzArtistsSequelize.json and BrainzWorksSequelize.json. These files should then be imported into our DB.
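Here is one way to populate the DB from these files, sketched under several assumptions. Each file is assumed to hold a JSON array of rows matching a Sequelize model. The models are assumed to be called `Artist`, `Work` and `Release` (hypothetical names). Artists are inserted first because works and releases presumably reference them. `Model.bulkCreate` is the standard Sequelize call for inserting many rows. The models are passed in as a parameter, so you can exercise the function without a database.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed mapping from output file to the (hypothetical) model it fills,
// with artists first on the assumption that the other tables reference them.
const IMPORT_ORDER = [
  ["BrainzArtistsSequelize.json", "Artist"],
  ["BrainzWorksSequelize.json", "Work"],
  ["BrainzReleasesSequelize.json", "Release"],
];

// Load each file (assumed to hold a JSON array of rows) and insert it with
// the model's bulkCreate method. Returns the number of rows per model.
async function importBrainzFiles(outputDir, models) {
  const counts = {};
  for (const [file, modelName] of IMPORT_ORDER) {
    const rows = JSON.parse(fs.readFileSync(path.join(outputDir, file), "utf8"));
    if (!Array.isArray(rows)) throw new Error(`${file} does not contain an array`);
    await models[modelName].bulkCreate(rows);
    counts[modelName] = rows.length;
  }
  return counts;
}
```

With real Sequelize models, you would call it as `importBrainzFiles("./scrapedoutput", { Artist, Work, Release })` after all seven steps have finished.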

All scraping scripts should be placed in one folder (that's the initial idea). Inside that folder, create a subfolder called "scrapedoutput". During scraping, all .json files, both input and output, are placed there.
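Each script lives in its own subfolder (e.g. `scrapeArtists/server.js`), while "scrapedoutput" sits next to those subfolders. A relative path like "./scrapedoutput/..." therefore only works if `node` is started from the root folder. The minimal sketch below assumes that one-level-deep layout and resolves the shared folder from the script's own location instead. `scrapedOutputDir`, `readInput` and the file `musicbrainz/artistIDs.json` are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Each script is assumed to live one level below the root folder, and
// "scrapedoutput" sits in the root. Resolving from the script's own folder
// keeps the path correct no matter where `node` was started from.
function scrapedOutputDir(scriptDir) {
  return path.resolve(scriptDir, "..", "scrapedoutput");
}

// Read a previous step's JSON output, failing with a clear message if that
// step has not been run yet.
function readInput(scriptDir, relPath) {
  const file = path.join(scrapedOutputDir(scriptDir), relPath);
  if (!fs.existsSync(file)) {
    throw new Error(`Missing input ${file}; run the step that produces it first`);
  }
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Inside scrapeArtists/server.js (the input file name is hypothetical):
// const artistIds = readInput(__dirname, "musicbrainz/artistIDs.json");
```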

metavaults commented 7 years ago

Done with all sources

kordianbruck commented 7 years ago

Great! Thanks, Lukas.

MusicConnectionMachine / StructuredData, issue #54