gaurav / extraction-framework

The software used to extract structured data from Wikipedia

Add dc:format to FileTypeExtractor containing MIME types #11

Closed: gaurav closed this issue 10 years ago

gaurav commented 10 years ago

The MIME types could just be hard-coded into the FileTypeExtractor for now.

gaurav commented 10 years ago

Actually, hard-code them in the mappings config package instead: https://github.com/jimkont/extraction-framework/tree/server_test_extraction/core/src/main/scala/org/dbpedia/extraction/config/mappings
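
A minimal sketch of what such a hard-coded table could look like. The object name `FileTypeMimeTypes`, the method `mimeTypeFor`, and the particular extensions listed are assumptions for illustration, not the code that was actually committed:

```scala
import java.util.Locale

// Hypothetical config object: names and entries are illustrative only.
object FileTypeMimeTypes {
  // Hard-coded mapping from lowercase file extension to MIME type.
  private val mimeTypesByExtension: Map[String, String] = Map(
    "jpg"  -> "image/jpeg",
    "jpeg" -> "image/jpeg",
    "png"  -> "image/png",
    "gif"  -> "image/gif",
    "svg"  -> "image/svg+xml",
    "pdf"  -> "application/pdf"
  )

  // Returns the MIME type for a file name, or None when the name has
  // no extension or the extension is not in the table.
  def mimeTypeFor(fileName: String): Option[String] = {
    val dot = fileName.lastIndexOf('.')
    if (dot < 0 || dot == fileName.length - 1) None
    else mimeTypesByExtension.get(
      fileName.substring(dot + 1).toLowerCase(Locale.ROOT))
  }
}
```

An extractor could then call something like `FileTypeMimeTypes.mimeTypeFor("Example.JPG")` and emit a dc:format value only when the result is defined.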

gaurav commented 10 years ago

Done in @e1efa66. You can see the result at https://raw.githubusercontent.com/gaurav/commons-extraction/master/commonswiki/20140101/commonswiki-20140101-file-information.tql

gaurav commented 10 years ago

This is probably as far as we can get with WikiPage alone: anything more sophisticated will require making sense of the file metadata dump. I also need to use the code at https://github.com/jimkont/extraction-framework/blob/server_test_extraction/core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala#L116 to add related classes, so that we end up emitting RDF from:

But that will depend on #10.

gaurav commented 10 years ago

This is done in @f512a9d6cd, including relatedClasses.