dbpedia / mappings-tracker

This project is used for tracking mapping issues in mappings.dbpedia.org

map {{cite web}}, {{cite book}}, etc. #66

Open VladimirAlexiev opened 9 years ago

VladimirAlexiev commented 9 years ago

Adding references (sources) to things is all the rage on Wikidata now.

I made a class Reference and a few props for this purpose:

And I'll now map http://mappings.dbpedia.org/index.php?title=Mapping_bg:Цитат_уеб&action=edit, similar to http://mappings.dbpedia.org/index.php?title=Mapping_en:Listen&action=edit (soundRecording -> Sound)

But why don't any {{cite *}} templates appear for en.dbpedia at http://mappings.dbpedia.org/server/statistics/en/?show=100000? en.wikipedia is chock full of them.

jimregan commented 9 years ago

They don't appear because they're specifically ignored. I'd assume they're ignored because it's difficult to determine if the citation is actually related to the topic of the article, though it's quite possible that they were ignored for technical reasons that are no longer relevant.

VladimirAlexiev commented 9 years ago

Citations specifically relate to facts in the article. From that point of view they ARE related to the article, though the relation may be tenuous. E.g., the article Sofia cites a source for its slogan "Grows but does not grow old": that IS an interesting source about Sofia.

Similarly, I've mapped Listen to soundRecording -> Sound. It could be a speech (of Kennedy) or a famous quote (Armstrong's "One small step for man") or a song recording (of a movie or musical work). No matter which, it is relevant.

It's quite difficult to tie the citations to the specific facts. But I still think providing the citations as structured data can be very useful. Providing sources is all the rage in Wikidata nowadays, and is viewed as a precondition for using any Freebase data.

There's a huge number of citations in WP (I'd guess 3-4 times the number of articles), so enabling these templates will cause system issues. But it's worth tackling.

jimkont commented 9 years ago

Hi Vladimir, please remove these mappings; I'll come back with a detailed explanation when I'm at my desktop.

Sent from my mobile, please excuse my brevity. On Aug 6, 2015 7:56 PM, "Vladimir Alexiev" notifications@github.com wrote:

VladimirAlexiev commented 9 years ago

Hi Jim! Цитат_уеб (Bulgarian for "Cite web", #70) is excluded in https://github.com/dbpedia/extraction-framework/blob/master/server/src/main/statistics/ignorelist_bg.txt, so it doesn't matter whether it's defined or removed. This issue remains open for a longer-term decision: do we want to (and can we) emit WP citations in a structured format?

jimkont commented 9 years ago

Long story short: we definitely should, but we should not use the mappings for that; a separate extractor would be more appropriate.

Sent from my mobile, please excuse my brevity. On Aug 7, 2015 11:12, "Vladimir Alexiev" notifications@github.com wrote:
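To illustrate what a separate extractor (as opposed to a template mapping) would do, here is a minimal sketch in Python. It scans an article's wikitext for {{cite ...}} templates and turns each one into a record holding the article name, the citation kind, and the template's named parameters. This is not the DBpedia CitationExtractor. The function name `extract_citations` and the record layout are hypothetical, and the regex assumes templates contain no nested templates.

```python
import re

# Matches "{{cite <kind> | ... }}" when the body contains no nested braces.
# Hypothetical simplification: real wikitext can nest templates in parameters.
CITE_RE = re.compile(r"\{\{\s*[Cc]ite\s+([^|{}]+?)\s*\|([^{}]*)\}\}")


def extract_citations(article: str, wikitext: str) -> list[dict[str, str]]:
    """Return one dict per {{cite ...}} template found in wikitext.

    Each dict records the source article, the template kind
    (e.g. "web", "book") and the template's named parameters.
    """
    citations = []
    for match in CITE_RE.finditer(wikitext):
        record = {"article": article, "kind": match.group(1).strip().lower()}
        for part in match.group(2).split("|"):
            # Named parameters look like "key = value"; split on the first "="
            # so URLs containing "=" keep their query strings intact.
            key, sep, value = part.partition("=")
            if sep and key.strip() and value.strip():
                record[key.strip()] = value.strip()
        citations.append(record)
    return citations


if __name__ == "__main__":
    text = (
        "Sofia's motto is 'Grows but does not grow old'."
        "<ref>{{cite web |url=http://example.org/sofia |title=Sofia motto}}</ref>"
        " More text.<ref>{{Cite book | title = A History | year = 1990 }}</ref>"
    )
    for c in extract_citations("Sofia", text):
        print(c)
```

Because each record keeps the article it came from, citations stay linked to the article even though, as noted earlier in the thread, tying them to the specific fact they support is harder. A real extractor would also need to handle nested templates and parameterless calls, and to map the keys onto ontology properties.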

jimkont commented 8 years ago

We now have the citation extractor, active for en starting with the 2015-10 release: http://downloads.dbpedia.org/2015-10/core-i18n/en/

VladimirAlexiev commented 8 years ago

Wow (I mean great!), several hundred megs in 2 files! Is there a preview of these files like there was for earlier releases?

jimkont commented 8 years ago

Yup, there are preview URLs on the download page: http://wiki.dbpedia.org/Downloads2015-10

jimkont commented 8 years ago

@VladimirAlexiev We want to make this work for all languages. Can you take a shot at bg? That will make it easier for others to follow.

VladimirAlexiev commented 8 years ago

Thanks, that preview page is really awesome!

jimkont commented 8 years ago

Cool! The code is here: https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/CitationExtractor.scala

Regarding #79: sure, that would be nice. The existing code was more of a proof of concept.