Open romatthe opened 9 months ago
After thinking about this for a few more minutes, a relatively easy solution that would weed out some of the most glaring issues (but still not entirely circumvent some of the problems) is the following:
When fetching achievement details for a specific appid, change the query from
https://store.steampowered.com/api/appdetails/?filters=basic,achievements&appids=<APPID>
to
https://store.steampowered.com/api/appdetails/?filters=achievements,release_date&appids=<APPID>
This gives you an additional piece of data, release_date, that has a field coming_soon that's set to true in case the game hasn't been released yet. (It also filters out the "basic" info to decrease the size of the payload, since I don't think any of that info is used anyway, right?)
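A minimal sketch of that change, in case it helps. The function names buildAppDetailsUrl and isComingSoon are made up for illustration and are not taken from the SteamAppList script; only the URL shape and the release_date.coming_soon field come from the description above.

```javascript
// Hypothetical helper: builds the appdetails URL with the proposed filters.
function buildAppDetailsUrl(appid) {
  if (!Number.isInteger(appid) || appid <= 0) {
    throw new TypeError(`invalid appid: ${appid}`);
  }
  return `https://store.steampowered.com/api/appdetails/?filters=achievements,release_date&appids=${appid}`;
}

// Hypothetical helper: true only when the parsed response explicitly
// marks the app as not yet released. Missing fields count as "released".
function isComingSoon(body, appid) {
  return body?.[String(appid)]?.data?.release_date?.coming_soon === true;
}
```

Treating a missing release_date as "not coming soon" is a design choice in this sketch, so that a malformed response does not silently drop a game.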
Example, Paradise Killer (1160220):
{
  "1160220": {
    "success": true,
    "data": {
      "achievements": {
        "total": 39,
        "highlighted": [
          ...
        ]
      },
      "release_date": {
        "coming_soon": false,
        "date": "4 Sep, 2020"
      }
    }
  }
}
Example, Black Myth: Wukong (2358720):
{
  "2358720": {
    "success": true,
    "data": {
      "release_date": {
        "coming_soon": true,
        "date": "20 Aug, 2024"
      }
    }
  }
}
So, what you could do is this: while going through the entire list of appids, whenever you query the Storefront API for a specific appid and get back a game that has its coming_soon flag set to true, simply discard it and do NOT add it to app_list.json. That means that the games that DO get added to app_list.json with "achievements: null" are games that, at the very least, are released and do not have achievements.
As soon as the games then get released, they get included in app_list.json in the next dump, and you get a more definitive statement on whether or not they actually have achievements.
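Roughly, the discard rule could look like the sketch below. The name toListEntry and the entry shape { appid, achievements } are assumptions for illustration, since I haven't matched them against the real app_list.json layout. Skipping failed lookups is also my own assumption and not something the script is known to do.

```javascript
// Hypothetical helper: returns null for apps that should NOT be written
// to app_list.json, otherwise the entry to store.
function toListEntry(appid, body) {
  const entry = body[String(appid)];
  // Assumption: a failed lookup is skipped so a later dump can retry it.
  if (!entry || !entry.success || !entry.data) return null;

  const data = entry.data;
  // Unreleased game: discard, so it is queried again in the next dump.
  if (data.release_date && data.release_date.coming_soon === true) return null;

  // Released game: keep it, with null meaning "released, no achievements".
  const total = data.achievements ? data.achievements.total : null;
  return { appid, achievements: total ?? null };
}
```

With the two example responses above, Paradise Killer would be stored with 39 achievements and Black Myth: Wukong would be left out until it is released.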
This still has three issues:
Despite those issues, it would at least stop putting games into the database that haven't been released yet, which is probably 98% of the games affected by this entire issue, I'd wager.
(Also, maybe just filter out appids that have their type field not set to game? Doesn't that just increase the size of the JSON file stored in the git repo for no reason?)
Hopefully I'm still making sense here. Sorry for bothering you with such long posts, I'm not good at being concise.
Hello,
What an impressive post! I will try to answer all of your questions and lay down my thoughts here, if I'm not clear enough or forgot something, please ask me again. You're not bothering, quite the opposite!
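The process you described is wrong. Here is how SamRewritten uses the list:
- SAM downloads it from github
- On every appId encountered in the list, SamRewritten checks if the app is owned by the current user
- If the app is owned, the app is added to the main menu view, using only data from the list.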
SamRewritten only uses the Steamworks SDK, and not the web api. Therefore, it is not able to fetch additional info about games itself, without impersonating an app.
You pointed out a great flaw in the refreshing process of the list. Apps already processed by SteamAppList are not processed again. That means changes to those apps are never registered.
I implemented the list this way since it takes a lot of time to refresh it, and I did not have in mind to automate it one day. Thinking of it now, a better solution would be to refresh all the apps on a schedule, probably using github scheduled actions, like every week. That way the problem is solved and human intervention is not needed. If you're looking to contribute, maybe that would be a cool little project, to fiddle with SteamAppList. That way the list would be updated fully every week.
What you say about the app sorting and required information to collect about them makes a lot of sense. But again, unfortunately I do not have more time to spend on this project anymore. However, if you want to contribute and need guidance, I will be very happy to help!
I already merged your PR in SteamApplistDump. I'm not sure what more I can do with this issue, but I will leave it up for discussion.
Let me know what you think, have a good day :)
Hey, thanks for the response. Sorry again for the lack of brevity.
The process you described is wrong. Here is how SamRewritten uses the list:
- SAM downloads it from github
- On every appId encountered in the list, SamRewritten checks if the app is owned by the current user
- If the app is owned, the app is added to the main menu view, using only data from the list.
Totally my mistake for not being clear, what I meant was the process of GENERATING the initial list via your SteamAppList projects, aka the JS script. Not the process of how SamRewritten uses this list.
I totally understand you don't have a lot of time anymore, like you announced previously. If I find the time, I could have a try over the weekend to look into making some changes to the script that takes the dump as well as looking into the automation of executing said script.
I think your suggestion of looking into Github actions is certainly a really good one. I have some experience with it, but not a lot. From what I recall, you could schedule a workflow with a cron-style expression. I'd need to have a look at what the potential limitations are for very long-running tasks, of course.
Feel free to leave this issue open, and I'll report if I was able to achieve something. It might take a while though.
P.S.: I also indicated in your post mentioning your lack of time that I'd personally be interested in attempting a rewrite of the core functionality of SamRewritten. I can't guarantee anything, since I don't have a lot of time and suffer from a chronic lack of energy, but I do hope it might be something I can work towards in the future.
You don't need to apologize!
Indeed I misunderstood what you meant, so yes you're totally right, this is totally how the SteamAppList project works.
Even though I'm very busy with other things, I'm very open to contributions, so please feel free to experiment with new things. You can even attempt to rewrite SteamAppList from scratch altogether if you wish. As far as I remember, this wasn't the most elegant code.
That's exactly what I was thinking about for the Cron-style github action, glad we have the same vision! I will leave the thread up for discussion and more.
To answer your PS: If you attempt to rewrite the core functionality of SamRewritten, don't make the same mistake I did: clearly separate the server from the client, and define the process architecture up front, that is key. How will you implement your process pool? Are you going to be using threading? That kind of stuff.
Small update (and note to myself for future reference):
I did a very quick session of fiddling with the script over the weekend. The result is as follows:
I was able to very quickly make some adjustments that allow you to take a dump from scratch if you set the correct environment variable, while also discarding titles that haven't been released yet (so we can grab the achievement details during a later dump). The big downside is of course that it takes several lightyears to perform a complete dump. It's still running right now, but I think it will probably end up taking somewhere around 100 hours.
The second thing I did was to explore the use of the IStoreService web API for collecting all known appids, more specifically the https://api.steampowered.com/IStoreService/GetAppList/v1/ endpoint. This one is actually (semi) documented, and has the following benefits:
There are some downsides as well:
Using the new endpoint significantly cuts down on the time required to perform a complete dump, because you're no longer querying the details of appids not associated with games. I think this full dump takes about 40 hours. I have already performed one with a quick POC, but I have yet to validate the results.
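For future reference, a rough sketch of paging through that endpoint. The query parameters (key, include_games, include_dlc, max_results, last_appid) and the response fields (response.apps, have_more_results, last_appid) are what I understand the endpoint to use, but treat them as assumptions and check them against the real responses. collectGameAppIds and the injected fetchJson are hypothetical names.

```javascript
// Hypothetical helper: collects appids of games only, one page at a time.
// fetchJson(url) must return the parsed JSON body; it is injected so the
// paging logic can be exercised without network access.
async function collectGameAppIds(apiKey, fetchJson, pageSize = 10000) {
  const appids = [];
  let lastAppid = 0;
  for (;;) {
    const url = new URL("https://api.steampowered.com/IStoreService/GetAppList/v1/");
    url.searchParams.set("key", apiKey);
    url.searchParams.set("include_games", "true");
    url.searchParams.set("include_dlc", "false");
    url.searchParams.set("max_results", String(pageSize));
    if (lastAppid > 0) url.searchParams.set("last_appid", String(lastAppid));

    const body = await fetchJson(url.toString());
    const page = body.response ?? {};
    for (const app of page.apps ?? []) appids.push(app.appid);

    // Stop when the server reports no more pages, or gives no cursor.
    if (!page.have_more_results || !page.last_appid) break;
    lastAppid = page.last_appid;
  }
  return appids;
}
```

In a real run, fetchJson could be something like `(url) => fetch(url).then((r) => r.json())` on a Node version that ships a global fetch.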
Interesting findings! I had no idea about IStoreService! Maybe it's a new thing? Either way, don't worry about the web API key: if we turn it into an automated github action, I don't think it's going to be necessary to change it often.
Although I don't contribute to SamRewritten I can review or do some parts on this if you need help. Looks like you're already on your way though!
I've also been working on my own fork of SteamAppsList.
One note I'd make is that we do not want to take a dump from scratch. Per PaulCombal/SteamAppsList#1 the script already has trouble with games that have been removed, so we don't want to lose any games by deleting the whole dump. A rough way of doing this is to update only games without previous achievements. This still misses games that had their achievements list updated later in life (e.g. Hardspace Shipbreaker), but it doesn't cause any issues. This also has the problem that it catches a lot of games that weren't updated, so it's still a band-aid until the POC API fork works.
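To make the idea concrete, here is a minimal sketch of "re-query only what had no achievements, never delete anything". It is not the code from my fork. The entry shape { appid, achievements } and both function names are assumptions for illustration.

```javascript
// Hypothetical helper: picks the appids worth querying again. Anything
// new, or previously stored without achievements, gets re-queried.
function selectAppsToRefresh(existingEntries, knownAppids) {
  const byId = new Map(existingEntries.map((e) => [e.appid, e]));
  return knownAppids.filter((appid) => {
    const entry = byId.get(appid);
    return !entry || entry.achievements == null;
  });
}

// Hypothetical helper: merges refreshed entries into the old dump.
// Old entries that were not refreshed (e.g. removed games) are kept.
function mergeEntries(existingEntries, refreshedEntries) {
  const byId = new Map(existingEntries.map((e) => [e.appid, e]));
  for (const e of refreshedEntries) byId.set(e.appid, e);
  return [...byId.values()].sort((a, b) => a.appid - b.appid);
}
```

As noted, this does not catch a game whose existing achievements list changed later, since entries that already have achievements are never selected.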
PaulCombal/SteamAppsList#5 also notes another issue with the long retrieval times. The end goal would be to check if a game has updated, like how SteamDB does it (or presumably the POC API fork). However, a band-aid is to just save the dump more often. While this doesn't help the time it takes, it at least makes it more bearable for a human to run this script (or to recover a failed github action).
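A rough sketch of the "save the dump more often" band-aid, assuming a Node script and a dump stored as a JSON array of { appid, ... } entries. processInBatches and fetchEntry are hypothetical names, and this is not the actual implementation in my fork.

```javascript
const fs = require("node:fs");

// Hypothetical helper: processes appids in batches and saves the dump
// after every batch, so a failed run can resume instead of starting over.
// fetchEntry(appid) returns an entry object, or null to skip the app.
async function processInBatches(appids, fetchEntry, outPath, batchSize = 100) {
  const entries = fs.existsSync(outPath)
    ? JSON.parse(fs.readFileSync(outPath, "utf8"))
    : [];
  // Skip apps that are already in the dump from a previous (partial) run.
  const done = new Set(entries.map((e) => e.appid));
  const pending = appids.filter((id) => !done.has(id));

  for (let i = 0; i < pending.length; i += batchSize) {
    for (const appid of pending.slice(i, i + batchSize)) {
      const entry = await fetchEntry(appid);
      if (entry) entries.push(entry);
    }
    // Write to a temp file and rename it, so an interrupted write does
    // not leave a half-written dump behind.
    const tmp = `${outPath}.tmp`;
    fs.writeFileSync(tmp, JSON.stringify(entries));
    fs.renameSync(tmp, outPath);
  }
  return entries;
}
```

Apps that were skipped (null) are not recorded, so a resumed run queries them again, which fits the idea of retrying unreleased games later.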
I've implemented both of these (and the required interaction, since we can't do a hard update after a failed run) in my fork and they seem to be working well. I'm running a few thousand batches to verify, and then plan to merge some of @romatthe's improvements before I submit a PR.
This concerns the SteamAppList projects, but I thought it was more appropriate to file the issue here
For the longest time, I thought my games not appearing in the list when opening the application was because the database had to be updated (via your SteamAppList and SteamAppListDumps projects). Because updating the database of games often takes a couple of hours, I usually just used SamReloaded by typing in a game's appid whenever a game was missing on the list (great feature btw).
However, I think the problem is that the script to build the database actually just doesn't work properly.
Feel free to correct me (as my JavaScript-fu is pretty terrible), but from what I understand, the process for updating the game database is as follows:
If this is correct, it doesn't work very well. There are two problematic cases:
What happens in these cases is that the game gets added to the master list with the achievements field filled out as null.
Here's what the example games look like in the database dump I generated a few hours ago:
Obviously, if you query Steam right now, you get a much different answer:
Paradise Killer: https://store.steampowered.com/api/appdetails/?filters=basic,achievements&appids=1160220
Atomic Heart: https://store.steampowered.com/api/appdetails/?filters=basic,achievements&appids=668580
However, because these games were originally written into app_list.json with "achievements: null", and because games are never queried again once they are written down into app_list.json, they will not appear in the list of games in SamRewritten, because they are never updated and shown as having achievements.
Black Myth: Wukong is a little different of course, since it currently doesn't list any achievements: https://store.steampowered.com/api/appdetails/?filters=basic,achievements&appids=2358720
In this case, the absence of achievements is indeed correct. But that's just because the game hasn't been released yet. Odds are extremely high it will have achievements upon release. But again, this game will never be queried again, because it's already on the list.
If what I'm saying is correct, then the real question becomes, of course, what can be done about that? The easy solution would be NOT to write games with "achievements: null" into app_list.json, so that they get queried anew every time a Steam app dump is taken. Or you could change the logic to query games even if they have "achievements: null".
However, I think this is also pretty problematic. Let's do a quick search to see how many games Steam lists:
Now how many of those do NOT have achievements?
This means that you'd need to query over half of the entire list of Steam apps if you were to take a dump for the first time with the modification proposed above.
Now, of course, I'd imagine there would at least be several thousand games that would get filled in with the correct achievement count (examples Paradise Killer and Atomic Heart being among them), so next time you take the dump, the number of games that would need to be queried would be smaller. But you'd still be looking at tens of thousands of games that don't have achievements and never will have any (and thus will always be marked with "achievements: null") that you'd be querying each and every time you take an app list dump. In other words, this would dramatically increase the time it takes to perform a new dump.
So, the simple solution proposed above wouldn't be very practical, I think. It may end up doubling or tripling the amount of time it takes to dump the entire database.
Any ideas? Did I get something wrong here?