Wrong data in the json source

ryson282 commented 9 years ago

For example, card 06053 has count = 53, which would mean you can add up to 53 copies of that card to a deck.

For this particular example, a sanity check on the count value would do the trick: if count is above 3, set it to 3.

However, in the long term, I think a way to correct erroneous data and distribute the corrected data through the API will be needed.

gereons commented 9 years ago

Good catch; it's pretty obviously a data-entry typo (number:53 and quantity:53). However, quantity is "how many are in this pack", not "how many can I have in a deck"; that's what maxperdeck is for.

ryson282 commented 9 years ago

Yes, indeed, you are right. But in some deckbuilders, that value is used as the maximum number of copies per deck. (That is how I spotted it :o).) And usually a multiplier is applied to the Core Set quantity depending on the number of Core Sets you own.
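To make the two fields concrete, here is a minimal sketch of how a deckbuilder might combine them. It assumes each card is a dict with `quantity` (copies per pack) and `maxperdeck` (deck limit) as described above; the function name and the `packs_owned` parameter are hypothetical.

```python
def copies_available(card: dict, packs_owned: int = 1) -> int:
    """Copies of a card a player can actually put in one deck.

    Assumes `quantity` is copies per pack and `maxperdeck` is the deck
    limit; `packs_owned` is a hypothetical multiplier (e.g. Core Sets owned).
    """
    owned = card["quantity"] * packs_owned
    # Never allow more than the deck limit, however many copies are owned.
    return min(owned, card["maxperdeck"])
```

With this reading, a bad `quantity` of 53 is masked by `maxperdeck`, but a deckbuilder that uses `quantity` alone as the limit will show the error.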

DoubleAitch commented 9 years ago

I understand the plan is to create a Master Datasucker that will feed down to others in a pyramid structure. The Master will either implement corrections for the CGDB errors or use a separate data source that is corrected and maintained in isolation from the CGDB source.

To keep track of any errors we find, I suggest listing them here so we can easily correct them all once the Master Datasucker structure is in place.

06053 - "quantity":53 => "quantity":3
05006 - "maxperdeck":3 => "maxperdeck":1
03004 - "maxperdeck":3 => "maxperdeck":1
06020 - "maxperdeck":3 => "maxperdeck":1
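One possible way to hold these fixes until the Master Datasucker exists is a small override table keyed by card code. This is only a sketch: it assumes each raw record is a dict carrying its card code in a `code` field, which is an assumption about the JSON layout, and `apply_corrections` is a hypothetical helper.

```python
# Overrides keyed by card code; each value replaces fields in the raw record.
CORRECTIONS = {
    "06053": {"quantity": 3},
    "05006": {"maxperdeck": 1},
    "03004": {"maxperdeck": 1},
    "06020": {"maxperdeck": 1},
}


def apply_corrections(cards, corrections=CORRECTIONS):
    """Return new card dicts with any listed overrides applied.

    Assumes each card dict has a `code` key (hypothetical field name).
    The input records are left unmodified.
    """
    return [{**card, **corrections.get(card["code"], {})} for card in cards]
```

Keeping the overrides separate from the raw data means the list can grow as more errors are reported without touching the source records.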

gereons commented 9 years ago

Almost all of the card names that have special diacritical marks are wrong as well:

02046 - "title": "Chaos Theory: Wünderkind"
02020 - "title": "Dracō"
05011 - "title": "Shi.Kyū"
01002 - "title": "Déjà Vu"

MarbleMunkey commented 9 years ago

Two thoughts:

1) Would it be helpful to leave it that way and include a second 'display name' attribute?

2) Do we want to consider supporting multiple languages (realizing that we have no current source of i18n data), or do we want to assume that non-English languages would run separate datasuckers?

gereons commented 9 years ago

I would only ever use the "display name" attribute. Objective-C has "case insensitive, ignore diacritics" string comparison/matching built in (and I hope every other modern language makes this easy too).

MarbleMunkey commented 9 years ago

JavaScript, in particular, lacks any such niceties.
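Where no built-in comparison is available, a common workaround is to decompose each string and drop the combining marks before comparing. A minimal sketch in Python using the standard `unicodedata` module (the `fold` helper name is hypothetical); the technique itself is not language-specific:

```python
import unicodedata


def fold(text: str) -> str:
    """Case-fold a string and strip combining marks (accents, macrons, ...)."""
    # NFD splits e.g. "é" into "e" plus a separate combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def same_title(a: str, b: str) -> bool:
    """Compare two titles ignoring case and diacritics."""
    return fold(a) == fold(b)
```

With this, a search for "deja vu" would match the display name "Déjà Vu" without needing a second, accent-free attribute in the data.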

datasucker commented 9 years ago

i18n is a good thought here. Did NRDB's API support different languages?

gereons commented 9 years ago

Yes, it did: http://netrunnerdb.ca/api/cards?_locale=de for German cards, although the data isn't complete enough to be usable, IMO.

Note the distinct fields like "faction" and "faction_code": you need to be able to parse this without knowing that "Shaper" is "Gestalter" in German or whatever it is in any other language, so faction_code always has lower-cased English words like "shaper" in every locale. This applies to side, faction, type and subtype.
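In practice this means client code should filter on the `_code` field and use the translated field only for display. A small sketch, with made-up sample records (the `code` value and the `cards_in_faction` helper are hypothetical; only `faction` and `faction_code` come from the API description above):

```python
def cards_in_faction(cards, faction_code):
    """Select cards by the locale-independent code, never the translated name."""
    return [card for card in cards if card.get("faction_code") == faction_code]


# Hypothetical records: the same card as it might appear in two locales.
english = {"code": "00000", "faction": "Shaper", "faction_code": "shaper"}
german = {"code": "00000", "faction": "Gestalter", "faction_code": "shaper"}

# Both records match, regardless of the locale they were fetched in.
matches = cards_in_faction([english, german], "shaper")
```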

ryson282 commented 9 years ago

@MarbleMunkey

1/ I think the Master Datasucker should only share corrected data. I do not see the point of sharing erroneous data. However, on the Master Datasucker, 3 classes could be defined: RawNRCard, Correction and NRCard (corrected), with a daily batch computing the latter.

2/ I would advise setting up only an English Datasucker network at first and getting it to work, while designing the architecture to support multiple languages. Once the network is in place and working, it will be easier to find contributors who know other languages and have access to content in those languages.
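A rough sketch of what those three classes and the batch step could look like. Only the class names come from the proposal above; the fields, the `field_name`/`value` shape of a correction, and the `build_corrected` function are assumptions for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RawNRCard:
    """A card exactly as imported from the upstream source."""
    code: str
    title: str
    quantity: int
    maxperdeck: int


@dataclass(frozen=True)
class Correction:
    """One field override for one card (hypothetical shape)."""
    code: str
    field_name: str
    value: object


@dataclass(frozen=True)
class NRCard(RawNRCard):
    """A card after corrections; same fields as RawNRCard."""


def build_corrected(raw_cards, corrections):
    """Batch step: produce NRCard records from raw cards plus corrections."""
    overrides = {}
    for c in corrections:
        overrides.setdefault(c.code, {})[c.field_name] = c.value
    return [
        NRCard(**{**asdict(raw), **overrides.get(raw.code, {})})
        for raw in raw_cards
    ]
```

Because the raw records are never modified, the batch can be re-run after every upstream import and only NRCard needs to be exposed through the API.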

DoubleAitch commented 9 years ago

I have created a separate issue (#10) for discussing Multiple Language Support

datasucker commented 9 years ago

There is now a "master" data source with the above corrections to the data. This is only active on 1 Datasucker in the network at the moment. More Datasuckers will be coming online shortly that will clone that "master" DS and provide the initial top-level Datasuckers to feed the rest of the network.

If you are interested in helping maintain that Datasucker (please help!), visit shapers.cyberdeck.io, register, and request access.
