atari-legend / legacy

Source code for the legacy AtariLegend site (Still used for the CPANEL)
https://legacy.atarilegend.com/
GNU General Public License v3.0
3 stars 0 forks source link

MAIN SITE/CPANEL - ENHANCEMENT - Re-linking publisher data to the release #540

Closed nguillaumin closed 6 years ago

nguillaumin commented 6 years ago

Hi,

I was looking at re-linking the publisher data to the game release table (since a publisher should be associated with a release, not a game). It's a bit tricky unfortunately...

Currently we have the following structure:

Let's take a few examples:

Sqweek:

Captain Blood:

LED Storm

The question is, how do we do this?

Game Release year Publisher
Squeek 1989 ...
Captain Blood 1989 ...
Captain Blood 1990 ...
Captain Blood 1991 ...
... ... ...

I really don't like (1) as it makes us keep the old, "wrong" DB structure.

On (2) and (3), whatever "automatic" way we choose will have errors because we don't have the information in the database to re-link the data properly. So we need to choose the method that will result in the less errors:

Which would be the easiest one to manually fix? Probably (2)?

(4) sounds interesting, but would require users to focus on a large XLS file with thousands of rows on a short period of time (We can't keep it open for too long as changes will still be made in the DB on production and the XLS file will become out of sync...)

Thoughts @stgraveyard @stsilversurfer @stefanjl @Brume-AL @MugUK ?

nguillaumin commented 6 years ago

FYI here is the type of spreadsheet I had in mind: https://docs.google.com/spreadsheets/d/1K0Rav-iqkjRXdFCJPEsBTf0yAnxCABw1G86VX3XjkfY/edit?usp=sharing

We would have to manually "merge" rows that are for the same game, fixing up the date and the publisher. For example these 6 rows for Captain Blood:

blood1

...would have to be manually merged into 3 rows:

blood2

But looking at the spreadsheet it has ~12,000 rows, so that doesn't seem very practical...

nguillaumin commented 6 years ago

Perhaps another approach is to do (2), but then provide a special screen in the CPANEL to allow users to quickly merge releases together (rather than manually having to edit releases and copy/paste publishers or dates).

We would also provider a screen to list all the releases that look wrong (e.g. a game like Captain Blood which would have 3 releases without dates but with publishers, 3 releases with a date but no publishers) in order to easily find which ones need "fixing".

nguillaumin commented 6 years ago

The merge screen could look something like this:

m

When the 2 releases are merged, the result would be:

m2

One could then merge the remaining 5, relatively quickly with just a few clicks...

stefanjl commented 6 years ago

Just a quick reply... about Captain Blood so does it have more releases, check those AKA names they are all seperate releases.

So far so does "2" sound like the best option, but i might have to think a bit more about it :)

Brume-AL commented 6 years ago

The merge screen could be a good solution, but I still have a question: how will you specify a 'name' of a game (or aka name)?

Let's take Galdregon's Domain as an example: This game was published as Galdregon's Domain by Pandora in Europe in 1988, but it was renamed as DeathBringer in USA in 1989. https://atarilegend.com/games/games_detail.php?game_id=1091

Could we take into consideration this case (which isn't alone, there're many games like that)?

Thanks for your help.

stgraveyard commented 6 years ago

Database team, I invite you to look at this : https://github.com/stgraveyard/AtariLegend/pull/538 It will be uploaded to dev soon so you can start testing and give remarks, I hope (@nguillaumin leads! ;-) )

Regarding the AKA. This is what we had analysed regarding the AKA table : 'Since each release can have its own name, this table will not be used anymore'. However, a release can have an optional name. So example, in the case of deathbringer, I think the game name would be death bringer, but the release of 1988 would have the name 'Galregons Domain' (the release of 1989, the title of the game is taken as the release name). So in the end, a release always has its own name. I think this would solve the issue, no? (this file contains all our schema's and analysis info : https://github.com/stgraveyard/AtariLegend/blob/master/Technical%20documentation/AL%20table%20analysis.xlsx)

Regarding the actual question in the thread here, my first idea was 'excell'. This is what I would do : 1) Create an excel which will only list the games with more than 1 publisher or/and 1 release year. This shouldn't be to hard with some quick excel functions and the list of 12000+ rows will be shortened drastically. This way, we have a clear overview of what is to do, the list can be placed online and the highly motivated DB guys (which they are) can divide the work. I personally don't think it will be THAT many games with more publishers or release years 2) We create a temporary 'tool' as you mention above so the DB guys can easily merge and/or delete releases and publishers. This tool should also only show the games which we have in the excel (and it will be a double check to see if the tool is correct and the excel is correct)

While the team is doing this cleanup, the dev guys can work on more features for the game and its releases.

stgraveyard commented 6 years ago

also @nguillaumin , do you have some kind of HTML design page tool? Or how to you make these example graphics so fast? Seems interesting...

nguillaumin commented 6 years ago

Yeah the AKA is already covered under the release "alternative title". This is already in prod actually. But that's another thing we'll need to re-link to the releases, separately. We'll run into the same problem (we can't automate it as we can't tie an AKA to a specific release) so we'll need something similar to what we do here.

I'm curious to see how the Excel pans out if it's only games with multiple releases or publishers! Perhaps it's more manageable.

For mockups I use Pencil as a desktop app, but it's not great. There are a ton of online tools but generally they're not free (but they offer a trial, or a free version where you cannot save).

nguillaumin commented 6 years ago

So I did generate the spreadsheet for games that have more than 1 release or more than 1 publisher. Here it is: https://docs.google.com/spreadsheets/d/1MNuwwh1_Q4fEhr-YYGu7iXf2H5Fw94HGB9tH0hFOZWg/edit?usp=sharing . It has 606 rows which is perhaps more doable?

It took a massive query that ran for 1mn on my computer, I'm putting it there for future reference:

SELECT * FROM
(
    SELECT
        game.game_id,
        game.game_name,
        pub_dev.pub_dev_id,
        REPLACE(REPLACE(pub_dev.pub_dev_name, CHAR(13), ''), CHAR(10), ''),
        ''
    FROM
        game
    LEFT JOIN game_publisher ON game_publisher.game_id = game.game_id
    LEFT JOIN pub_dev ON game_publisher.pub_dev_id = pub_dev.pub_dev_id

    UNION

    SELECT
        game.game_id,
        game.game_name,
        '',
        '',
        YEAR(game_release.date)
    FROM
        game
    LEFT JOIN game_release ON game_release.game_id = game.game_id
) AS T

WHERE game_id IN
(
    SELECT
        game_id
    FROM
        game_release
    GROUP BY
        game_id
    HAVING
        COUNT(game_id) > 1

    UNION

    SELECT
        game_id
    FROM
        game_publisher
    GROUP BY
        game_id
    HAVING
        COUNT(game_id) > 1
)
ORDER BY
    game_name ASC
nguillaumin commented 6 years ago

If we were to go the spreadsheet route it would mean we would make a few assumptions:

  1. If a game has zero releases, then we just create one release for each publisher, with no date
  2. If a game has only 1 release and only 1 publisher, then we merge them together :warning: the risk here is that we may merge stuff that shouldn't be merged, perhaps a new release should have been created. For example a game may have a release in 1987, and the publisher "Infogrames". Does that necessarily mean that it's Infogrames who published it in 1987?
  3. If a game has more than 1 release or more than 1 publisher we use the data manually fixed from the spreadsheet

Case 2. seems to cover 4,074 games in the database.

Sounds like a good compromise, as long as we're happy with the errors introduced by 2., and with manually having to fix the spreadsheet.

Thoughts?

stgraveyard commented 6 years ago

I dont understand how point 2 would create errors? If in the example infograms did not do the 87 release, that means it already is incorrect in our db as it is now. So what is the problem here? Correction woulf be needed either way, no? What do the db guys think of the manual work?

stgraveyard commented 6 years ago

Nico, you know your sql!

nguillaumin commented 6 years ago

Hehe it took me a while to come up with this SQL query!

For point 2, we currently have no relationship between a release date and a publisher, yes? We have game -> publisher on one side, and game -> release date on the other. It means the DB schema allows the possibility of bad data (that's why it's so important to get the schema right). If it's allowed, it will happen, as we see for example with some games that have 3 publisher but only 1 release for example.

For a 1 publisher - 1 release example: someone creates a new game, and say it's been released in 1987. Then someone else comes along later and add the publisher Infogrames. But too bad, the 1987 release was from Ocean, and the Infogrames one was in 1988. If we merge both it means we'll have a single release by Infogrames un 1987, which is incorrect.

You could say that it was already incorrect, but I'm seeing that more as "missing data". We knew there was a publisher on one side, and we knew there was a date, we just don't know how to tie the 2 together.

Is merging in that case making things worse? Should we instead create 2 releases (one in 1987 with no publisher, one with no date by Infogrames)? That's what I'm asking.

Perhaps this case is so rare that it doesn't matter, which is what I was thinking in my previous comment.

I hope that makes more sense? :wink:

stgraveyard commented 6 years ago

To answer your question, no, it doesn't make things worse. I understand what you are saying, but the visitor has no idea about the relationship behind the webpage, so for the well educated Atari freak the data is already wrong. I would say merge and correct afterwards...

nguillaumin commented 6 years ago

Ok!

I have the re-linking of the "simple" case (games with 1 release and 1 publisher) working, and I also updated the release editing screen to add a "release type", "continent" and "publisher" (with an autocomplete) fields. It looks ugly as hell and we definitely need tabs on this, but at least it works:

release

I'm in the process of fixing the main site now.

stgraveyard commented 6 years ago

So motivating! Awesome.

nguillaumin commented 6 years ago

Thanks. I realized the query above is slightly wrong, it will list games that have multiple publishers OR multiple releases, but we want AND...

In any case, I think I will actually make a screen with this list in the CPANEL, so that users can work with that to find releases to merge rather than Excel. It means they can take the time they want to do it.

We would also remove the ability to associate publishers to a game directly on the game editing screen (it has to be on a release), while keeping the data in the DB until all releases have been merged (at which point we can delete game_publisher)

stgraveyard commented 6 years ago

Nico, when do you push this? And when do you think we can add a version to dev? Will you merge with June release? Do you have a plan?

nguillaumin commented 6 years ago

I'm in favour of releasing the June branch now as it is, and the release-attributes one next month

stgraveyard commented 6 years ago

Ok I will do this. Maybe we can divide some tasks so I can jump in? Layout or something like that perhaps?

nguillaumin commented 6 years ago

Sure, I'll try to put it on a branch a bit later.

I just made the screen that lists the game with multiple publishers to merge, and updated the game editing screen to allow easy merging. I recorded a quick video that shows how it works: https://streamable.com/c3gb2 Feedback welcome...

nguillaumin commented 6 years ago

I pushed it on the release-attributes branch. I think the 2 areas that need work are:

blood

We probably need to display that better, display the publisher logo, etc.

nguillaumin commented 6 years ago

Some progress on the releases:

releases

stgraveyard commented 6 years ago

@stefanjl @Brume-AL @MugUK @Marcer75 can you please read this thread, check it and if things are not clear, please give feedback. This will be added somewhere next month to PROD and you guys will have some work to do ;-)

@nguillaumin something that is not really clear to me at first sight from the movie. What is that alternative title 'as published by'? Is that something you added using one of the queries?

stefanjl commented 6 years ago

If a release has an unknown year how will it look in your latest screenshot?

nguillaumin commented 6 years ago

What is that alternative title 'as published by'?

This is game_release.name, the alternative release title. I don't remember where it comes from in the ancient DB schema, but some releases have it. See for example on prod:

al

If a release has an unknown year how will it look in your latest screenshot?

For now it just displays [no date] instead of the year.

nguillaumin commented 6 years ago

nd

We can adjust if necessary...

stgraveyard commented 6 years ago

I will be changing layout and text a bit but ofcourse will ask anybodys approval/permission...

stgraveyard commented 6 years ago

Been adding bits to the layout :

2018-08-09 00_06_58-gods - atari st game _ atari legend

I did not like the title of the release box, so I changed it, keeping. I also added a published pic if available, like with the game info box. And I removed the 'for' when using Enhanced, because it did not look good on screens of 320px wide. I hope I don't get banned for this :-)

nguillaumin commented 6 years ago

It's fine, but keep in mind this box will show multiple releases:

So my advice is to test it with a game that has multiple releases, ideally 3 or 4 to see how it looks.

stgraveyard commented 6 years ago

@nguillaumin I hear what you are saying. I do have something else I like to discuss. Maybe we should start a new issue for this, IF we want to add this to project V1.0.

I do think it is important to talk about the layout of the game detail page on the main site. We have a good idea of how to do it at cpanel (we are using tabs, the only doubt I have is if it will work good enough on smartphone layout). But what about the main site? Do we want tabs here as well? I don't really think it would fit, would it? I mean, how are we gonna do the layout when we have boxes and seperate screenshot for a release? I was thinking of adding the release box at the top of the screen and highlight the currect release (as with CPANEL) and when someone select another release, all the other boxes would change as well if necessary. Would that be a clean way of solving it? Is this something we already wanna prepare for? Also, currently, I do think the cellphone layout (resolution with low pixel count) needs some optimization, as the order the boxes are presented now is not OK in my book. But maybe this is something for a later release? (although I think this is very important). I like to hear other opinions...

nguillaumin commented 6 years ago

I agree we need to discuss this, but probably in a new ticket? Let's keep what we currently have for V1.0? I think it's good enough.

nguillaumin commented 6 years ago

I'm a bit on the fence about putting the publisher logos for each release. It makes the release box very "busy", and makes it harder to see which release is for which year:

blood-simple

vs:

blood

For example it's hard to tell if the "1989" release info is the one that's "above" 1989 (ERE), or below (Infogrames)?

I think it'd be good to have more opinion on this? @stsilversurfer @stefanjl @MugUK ?

Perhaps we can keep them, but find a way to make the dates more "connected" to each release info rows...

stgraveyard commented 6 years ago

I agree with you here Nico. If the others think the same I will gladly remove it ...

stefanjl commented 6 years ago

so now we are talking about how it will look in the game page and not in the cpanel?

nguillaumin commented 6 years ago

For now we'll keep it like the previous screenshot:

Once we start adding more stuff to the releases (release-specific screenshots, boxarts, etc.) we'll need to discuss how we can make it look, because there will likely be too much information to display on a single page, for all the releases. But we'll have this discussion elsewhere, we need to focus on releasing this first.

stgraveyard commented 6 years ago

I completely agree with you Nico. Lets keep it like it was and start a new threat later for the while team to discuss...

stgraveyard commented 6 years ago

@nguillaumin I quickly changed it like you wanted it, I hope ... Release info, Releases ...

nguillaumin commented 6 years ago

Looks good, I think we can close this now.

stgraveyard commented 6 years ago

Close already? Shouldnt we wait till all relinking is done and game_publisher and other tables are removed.

nguillaumin commented 6 years ago

Well everything we can do at this point is done. We can't do more until all data has been merged by the users, which I suspect will take ...years?

But I don't mind if we want to keep it open, it's fine.

stgraveyard commented 6 years ago

Years?? No @nguillaumin , what I mean. You created an extra page containing all games linked to multiple publishers. I thought once everything is merged and the team cleaned this page (linking the release year and publisher to release), than we could remove the game_publisher (and country?) tables and than this whole chapter of publishers and releases (which is the biggest part of project 1) can be totally closed. No? But this will not take years, right?

nguillaumin commented 6 years ago

Well there's a lot of manual work to merge all of this (there are still hundreds of games that need merging), so I'm not sure how fast the users can process all of them? It require a good knowledge of each release / date / publisher to know what to merge with which release...

If I had to do it it would take me years to research which release goes with which date and which publisher, but perhaps contributors know that off hand?

stgraveyard commented 6 years ago

Hmmm...Nico, maybe I'm going cookoo here, but, since the data is already wrong, could we perhaps suggest to give the team a month or 2, and for the things they don't yet know, we merge as is? So we can continue the DB cleanup? Or would this be a bad idea?

nguillaumin commented 6 years ago

We could indeed.

A concern I have with this is that it will result in invalid data being merged (for example Publisher X associated with 1988 and Publisher Y with 1989, where it should have been the other way around). That's not a problem in itself because it's easy to fix, but the problem is that it's impossible to detect the error in the first place because everything has been merged. Currently with the new "publishers to merge" screen we have, there's at least a page that list stuff that needs to get fixed. Once it's merged, we loose that, and contributors won't know which releases were "wrongly" merged, making it hard for them to find error to fix.

Does that make sense?

Since there's no real cost of keeping the game_publisher table (since we removed all references to it in the code, at least in main site), I would be inclined to keep it as long as possible to give an opportunity to users to fix the data, and only remove when it causes us a real issue?

stgraveyard commented 6 years ago

Totally agree...