AB1908 / GOG-Galaxy-Export-Script

Export your list of games from GOG Galaxy
MIT License
149 stars 27 forks source link

Most fields have square brackets and apostrophes around the data #26

Closed toastmonster closed 3 years ago

toastmonster commented 4 years ago

The csv file produced has ['characters'] surrounding most of the fields.

Here is an example that I have transposed for improved formatting.

Header Value
title Alice: Madness Returns
platformList ['Origin', 'Steam']
criticsScore 72
developers ['Spicy Horse Games']
genres ['Adventure', "Hack and slash/Beat 'em up", 'Platform']
publishers ['Electronic Arts']
releaseDate 14/06/2011
themes ['Action', 'Horror']
gameMins 82
AB1908 commented 4 years ago

I see. I'll look into it.

AB1908 commented 4 years ago

It looks like @Varstahl rewrote this to write a dict into the CSV instead of just the relevant data. I suppose he needs that for his tool. I'm thinking of asking him to maintain his own fork and I'll occasionally pull changes from it for improvements. I'd like to use this just as a CSV export tool and not try to deviate too far from that goal.

@Varstahl and @toastmonster, what do you people think?

Varstahl commented 4 years ago

Yes, I rewrote it as it is to avoid problems during CSV creation and parsing. There are a number of problems that made using regular commas unusable, this was the fastest second best thing I could think of in a short time.

If the CSV is parsed via python, you can use literal_eval() function to transform it back into a list. Another option is to convert the list into a different format, which brings its own set of issues, because you need to use another arbitrary delimiter, since commas or other punctuation is not usable.

This is the reasoning that led me to just let the CSV module dump the list and manage all the corrections needed to have correct CSVs.

Edit: it's also to be noted that this system allows for fairly straightforward string parsing even on other languages, which was a plus when I opted for this route.

AB1908 commented 4 years ago

You have a fair point and I see how it can have its benefits but this is trading off usability for usefulness. I'm not sure the target audience wants to fiddle around on the command line that much. I think it would be best if you maintained a fork for advanced use and I maintained one for people who are trying to do a simple export. I had been thinking about adding in a TUI to that end. Would you agree with this idea?

toastmonster commented 4 years ago

CSV files already have an agreed way to deal with commas and other punctuation, and that is to enclose any data that has a comma in quotation marks. This convention is understood by most applications and the formatting is invisibly corrected for the end-user, unlike the method currently being used.

Varstahl commented 4 years ago

CSV files already have an agreed way to deal with commas and other punctuation

Wrong, but by all means, if you find a way to encapsulate Whatever company, ltd. in a list of other companies, do tell me so. The only workaround here is to change delimiter, but that can't be done if a user wants a specified type of delimiter, because if that delimiter is in that list parsing just fails. I changed the system because it was broken to begin with. It works as long as it's a machine creating the CSV to be parsed by another machine, but if the user expects something specific it's going to break.

Also, despite its name, we can't automatically have commas to separate, because that's a regional setting.

I think it would be best if you maintained a fork for advanced use and I maintained one for people who are trying to do a simple export. I had been thinking about adding in a TUI to that end. Would you agree with this idea?

I'm not strongly opinionated about anything, if you find a better way to do it I'm all for it, just be aware and test what happens during parsing where a string like the one above is in the results. The "fix" is literally 2 lines worth to .join() the list, but then other issues crop up.

The only way to fix this in pure CSV is to change the delimiter on the run if it collides with data, but then people would complain that the CSV is not actually comma separated, so… yeah… pick your poison. I chose correct data parsing.

AB1908 commented 4 years ago

I would pick better parsing as well as it's simply not scalable to manually have to go through and edit errors in a list of hundreds of games. I have no qualms about the delimiter that's used and I'm sure modern spreadsheet software can automatically detect the correct delimiter, failing which it can fallback on manual input.

I'm still somewhat concerned about usability so I'll try and create a couple of branches with separate workarounds and see which one works best based on feedback.

CSV files already have an agreed way to deal with commas and other punctuation

Wrong, but by all means, if you find a way to encapsulate Whatever company, ltd. in a list of other companies, do tell me so. The only workaround here is to change delimiter, but that can't be done if a user wants a specified type of delimiter, because if that delimiter is in that list parsing just fails. I changed the system because it was broken to begin with. It works as long as it's a machine creating the CSV to be parsed by another machine, but if the user expects something specific it's going to break.

Could you elaborate on this? From commit 900dc18, the script exports the entry for the Jackbox Party Pack as the following:

The Jackbox Party Pack,Epic Games Store,"Jackbox Games, Inc.","Jackbox Games, Inc.",2014-11-18,"Strategy,Indie,Quiz/Trivia","Comedy,Party",74,0

with Jackbox Games, Inc. being similar to the string you used earlier as an example. This comes out to the following after importing into a Google Sheet with automatic delimiter detection: image

Is there potential for an error? I suppose there may be breakage if an entry comes along with a " in it prior to grouping, such as Company "A", Company B. Is this what you're talking about?

Varstahl commented 4 years ago

I'm sure modern spreadsheet software can automatically detect the correct delimiter

You'd be surprised by the amount of spreadsheets that require manual data import and configuration to open a non localised CSV.

Could you elaborate on this?

Beneath a Steel Sky,GOG,"Revolution Software,Virgin Interactive Entertainment (Europe) Ltd.,Virgin Interactive Entertainment,Inc."

This is one of the instances that made me change it. As far as I know, nested quoting is not supported for comma escaping, so even something like the following

Beneath a Steel Sky,GOG,"Revolution Software,Virgin Interactive Entertainment (Europe) Ltd.,""Virgin Interactive Entertainment, Inc."""

could be misinterpreted.

We could add whatever you chose as a default, and add an option to save it as python lists instead for whoever wants it for mechanical parsing, that's not an issue, I just don't know how to fix this problem for everyone. It's bound to break somewhere for someone. It's not a huge issue, just decide where to draw the line.

AB1908 commented 4 years ago

I think I'm a little stupid. What's wrong with that entry?

Varstahl commented 4 years ago

@AB1908 sorry, the comment got broken while I was typing, I just finished editing :D

AB1908 commented 4 years ago

Okay so that totally makes sense. I guess it would be super useful if we had specs on the character space for the db entries but I guess we'll have to use a generic solution until someone else comes along and reports breakage. Like I said earlier, I'll try out a few different ideas and see where they lead. Thanks for all the clarifications.

Varstahl commented 4 years ago

My stupid idea if you want to follow that route, is to append a list of pre-defined delimiters, like ,;|, to the delimiter specified by the user. Then in here before the comments https://github.com/AB1908/GOG-Galaxy-Export-Script/blob/51e4d1ddea61e07163f88432a196662bd009ca6d/galaxy_library_export.py#L263-L266

we can check if there's a collision between all the list items and the specified delimiter. If there isn't we can use that delimiter for everything, if not we move on to the next delimiter.

Then, inside this condition, after the sort https://github.com/AB1908/GOG-Galaxy-Export-Script/blob/51e4d1ddea61e07163f88432a196662bd009ca6d/galaxy_library_export.py#L342-L343

you can place an if True: and the necessary code to join the lists into strings, so that I might add a condition for the ones who are more interested in the python parsing like myself, without changing much code.

I could do this but at the moment I'm busy at work, yet that's the most optimal way to go around this imho.

AB1908 commented 4 years ago

Ha! That idea is better than what I had. I guess you can take your time and implement it since there's no rush. I'll pick it up if you want me to, however.

Varstahl commented 4 years ago

It's going to take a while, but sure. I can't give you an ETA, though.

AB1908 commented 4 years ago

No problem. I'll try improving usability meanwhile. BTW, would you happen to know how to package python files into executables? It'd be nice if there were a way to remove dependencies and just give the user an executable to work with.

Varstahl commented 4 years ago

I know of py2exe and pyInstaller, have used py2exe once for a test, but I'm not extremely familiar with any of them. If I need to distribute an executable I usually code it in C++ and native Win32 APIs.

If you're going for that you might also take a look at GUIs, it could make the plethora of options less overwhelming.

AB1908 commented 4 years ago

Thanks again.

toastmonster commented 4 years ago

Could always switch to being tab separated instead of comma separated.

AB1908 commented 4 years ago

Somehow, I totally forgot about that. Seems like a good delimiter to use.

Varstahl commented 3 years ago

Finally got around to it, took me quite a while. In the end I did just that, default merging and default TAB delimiter. It's still a pain to import for most spreadsheets but at this point it's the best solution.

Also retained previous functionality through options, so no losses there either.

I just didn't bother to check against multiple delimiters, in the end it's up to the user to decide what to do, there's both the safe "python" option and the bog standard CSV, so it shouldn't be of our concern.

AB1908 commented 3 years ago

Should be good enough. We can always redirect users to report breakages!