Closed toastmonster closed 3 years ago
I see. I'll look into it.
It looks like @Varstahl rewrote this to write a dict into the CSV instead of just the relevant data. I suppose he needs that for his tool. I'm thinking of asking him to maintain his own fork and I'll occasionally pull changes from it for improvements. I'd like to use this just as a CSV export tool and not try to deviate too far from that goal.
@Varstahl and @toastmonster, what do you people think?
Yes, I rewrote it as it is to avoid problems during CSV creation and parsing. There are a number of problems that made using regular commas unusable, this was the fastest second best thing I could think of in a short time.
If the CSV is parsed via python, you can use literal_eval()
function to transform it back into a list. Another option is to convert the list into a different format, which brings its own set of issues, because you need to use another arbitrary delimiter, since commas or other punctuation is not usable.
This is the reasoning that led me to just let the CSV module dump the list and manage all the corrections needed to have correct CSVs.
Edit: it's also to be noted that this system allows for fairly straightforward string parsing even on other languages, which was a plus when I opted for this route.
You have a fair point and I see how it can have its benefits but this is trading off usability for usefulness. I'm not sure the target audience wants to fiddle around on the command line that much. I think it would be best if you maintained a fork for advanced use and I maintained one for people who are trying to do a simple export. I had been thinking about adding in a TUI to that end. Would you agree with this idea?
CSV files already have an agreed way to deal with commas and other punctuation, and that is to enclose any data that has a comma in quotation marks. This convention is understood by most applications and the formatting is invisibly corrected for the end-user, unlike the method currently being used.
CSV files already have an agreed way to deal with commas and other punctuation
Wrong, but by all means, if you find a way to encapsulate Whatever company, ltd.
in a list of other companies, do tell me so. The only workaround here is to change delimiter, but that can't be done if a user wants a specified type of delimiter, because if that delimiter is in that list parsing just fails. I changed the system because it was broken to begin with. It works as long as it's a machine creating the CSV to be parsed by another machine, but if the user expects something specific it's going to break.
Also, despite its name, we can't automatically have commas to separate, because that's a regional setting.
I think it would be best if you maintained a fork for advanced use and I maintained one for people who are trying to do a simple export. I had been thinking about adding in a TUI to that end. Would you agree with this idea?
I'm not strongly opinionated about anything, if you find a better way to do it I'm all for it, just be aware and test what happens during parsing where a string like the one above is in the results. The "fix" is literally 2 lines worth to .join()
the list, but then other issues crop up.
The only way to fix this in pure CSV is to change the delimiter on the run if it collides with data, but then people would complain that the CSV is not actually comma separated, so… yeah… pick your poison. I chose correct data parsing.
I would pick better parsing as well as it's simply not scalable to manually have to go through and edit errors in a list of hundreds of games. I have no qualms about the delimiter that's used and I'm sure modern spreadsheet software can automatically detect the correct delimiter, failing which it can fallback on manual input.
I'm still somewhat concerned about usability so I'll try and create a couple of branches with separate workarounds and see which one works best based on feedback.
CSV files already have an agreed way to deal with commas and other punctuation
Wrong, but by all means, if you find a way to encapsulate
Whatever company, ltd.
in a list of other companies, do tell me so. The only workaround here is to change delimiter, but that can't be done if a user wants a specified type of delimiter, because if that delimiter is in that list parsing just fails. I changed the system because it was broken to begin with. It works as long as it's a machine creating the CSV to be parsed by another machine, but if the user expects something specific it's going to break.
Could you elaborate on this? From commit 900dc18, the script exports the entry for the Jackbox Party Pack as the following:
The Jackbox Party Pack,Epic Games Store,"Jackbox Games, Inc.","Jackbox Games, Inc.",2014-11-18,"Strategy,Indie,Quiz/Trivia","Comedy,Party",74,0
with Jackbox Games, Inc.
being similar to the string you used earlier as an example. This comes out to the following after importing into a Google Sheet with automatic delimiter detection:
Is there potential for an error? I suppose there may be breakage if an entry comes along with a "
in it prior to grouping, such as Company "A", Company B
. Is this what you're talking about?
I'm sure modern spreadsheet software can automatically detect the correct delimiter
You'd be surprised by the amount of spreadsheets that require manual data import and configuration to open a non localised CSV.
Could you elaborate on this?
Beneath a Steel Sky,GOG,"Revolution Software,Virgin Interactive Entertainment (Europe) Ltd.,Virgin Interactive Entertainment,Inc."
This is one of the instances that made me change it. As far as I know, nested quoting is not supported for comma escaping, so even something like the following
Beneath a Steel Sky,GOG,"Revolution Software,Virgin Interactive Entertainment (Europe) Ltd.,""Virgin Interactive Entertainment, Inc."""
could be misinterpreted.
We could add whatever you chose as a default, and add an option to save it as python lists instead for whoever wants it for mechanical parsing, that's not an issue, I just don't know how to fix this problem for everyone. It's bound to break somewhere for someone. It's not a huge issue, just decide where to draw the line.
I think I'm a little stupid. What's wrong with that entry?
@AB1908 sorry, the comment got broken while I was typing, I just finished editing :D
Okay so that totally makes sense. I guess it would be super useful if we had specs on the character space for the db entries but I guess we'll have to use a generic solution until someone else comes along and reports breakage. Like I said earlier, I'll try out a few different ideas and see where they lead. Thanks for all the clarifications.
My stupid idea if you want to follow that route, is to append a list of pre-defined delimiters, like ,;|
, to the delimiter specified by the user. Then in here before the comments https://github.com/AB1908/GOG-Galaxy-Export-Script/blob/51e4d1ddea61e07163f88432a196662bd009ca6d/galaxy_library_export.py#L263-L266
we can check if there's a collision between all the list items and the specified delimiter. If there isn't we can use that delimiter for everything, if not we move on to the next delimiter.
Then, inside this condition, after the sort https://github.com/AB1908/GOG-Galaxy-Export-Script/blob/51e4d1ddea61e07163f88432a196662bd009ca6d/galaxy_library_export.py#L342-L343
you can place an if True:
and the necessary code to join the lists into strings, so that I might add a condition for the ones who are more interested in the python parsing like myself, without changing much code.
I could do this but at the moment I'm busy at work, yet that's the most optimal way to go around this imho.
Ha! That idea is better than what I had. I guess you can take your time and implement it since there's no rush. I'll pick it up if you want me to, however.
It's going to take a while, but sure. I can't give you an ETA, though.
No problem. I'll try improving usability meanwhile. BTW, would you happen to know how to package python files into executables? It'd be nice if there were a way to remove dependencies and just give the user an executable to work with.
I know of py2exe and pyInstaller, have used py2exe once for a test, but I'm not extremely familiar with any of them. If I need to distribute an executable I usually code it in C++ and native Win32 APIs.
If you're going for that you might also take a look at GUIs, it could make the plethora of options less overwhelming.
Thanks again.
Could always switch to being tab separated instead of comma separated.
Somehow, I totally forgot about that. Seems like a good delimiter to use.
Finally got around to it, took me quite a while. In the end I did just that, default merging and default TAB delimiter. It's still a pain to import for most spreadsheets but at this point it's the best solution.
Also retained previous functionality through options, so no losses there either.
I just didn't bother to check against multiple delimiters, in the end it's up to the user to decide what to do, there's both the safe "python" option and the bog standard CSV, so it shouldn't be of our concern.
Should be good enough. We can always redirect users to report breakages!
The csv file produced has ['characters'] surrounding most of the fields.
Here is an example that I have transposed for improved formatting.