osmanyucel opened this issue 3 years ago
I had a similar idea: using machine learning, predict how likely a proposed trade is to be accepted. The problem with that, though, is that there is no historical game data for past trades, meaning the value/properties of a game in a past trade may have changed by now.
Anyway, a lot of data is currently available through barter's API. Coincidentally, I'm developing some visual analytics tools for barter.vg data for a uni project (whose deadline is this Sunday, btw). If you want, we can collab and develop something nice over the weekend. https://game-data-explorer.glitch.me/
We've had this discussion multiple times before, and it comes up again every once in a while.
"Value" is not a hard and fast number that can be calculated; people take into account so many factors and may even determine value subjectively. Besides, your proposal doesn't address potential arguments such as "why is this counted/why is this not counted?" and the issue of calculations being potentially "wrong", causing annoyance for "veteran traders" and new users whose trades constantly get declined.
If you want to have a "value tracker", it should be at worst opt-in, or preferably via a userscript.
I had a similar idea: using machine learning, predict how likely a proposed trade is to be accepted. The problem with that, though, is that there is no historical game data for past trades, meaning the value/properties of a game in a past trade may have changed by now.
That's a good point, I haven't considered that. I think for the proof of concept, we can assume the properties of the games are considered constant at the time of training. I know this is a big assumption, but for now I don't have a better solution.
Anyway, a lot of data is currently available through barter's API. Coincidentally, I'm developing some visual analytics tools for barter.vg data for a uni project (whose deadline is this Sunday, btw). If you want, we can collab and develop something nice over the weekend. https://game-data-explorer.glitch.me/
I will check the API and what can be done for the training.
We've had this discussion multiple times before, and it comes up again every once in a while.
"Value" is not a hard and fast number that can be calculated; people take into account so many factors and may even determine value subjectively. Besides, your proposal doesn't address potential arguments such as "why is this counted/why is this not counted?" and the issue of calculations being potentially "wrong", causing annoyance for "veteran traders" and new users whose trades constantly get declined.
I am not sure what you mean by "why is this counted/not counted". About the veteran/novice users: a side product of our model will be the user properties and their behaviors, and those values can be used in a future iteration. About the subjectivity concern: since the model is trained on the accumulated trade data, it will be as objective as possible. Subjectivity will always be a factor, but it is a factor in the real world too; we should still be able to get an objective value as a common ground.
EDIT: this will also filter out in advance a lot of trades that are destined to be declined, so it will save a lot of annoyance for both new and veteran traders.
If you want to have a "value tracker", it should be at worst opt-in, or preferably via a userscript.
I work as a backend developer, so having this as an opt-in option or userscript is not an area I am an expert in. But in my personal opinion, the more common this feature is, the more useful it will be, because the main reason I made this proposal is to get a common ground for all users to evaluate games.
Re: @antigravities I think you should see it as a tool. It's equivalent to asking a third party trader what (s)he thinks of the trade. In this case, it's the ML model's opinion.
I am not sure what you mean by "why this is counted/not counted".
Someone could say "why is [metric I consider to be important in a trade] not counted in the 'BarterValue' calculator?", i.e. "why is the price of the game on my obscure store not counted?"
About the veteran/novice users: a side product of our model will be the user properties and their behaviors, and those values can be used in a future iteration. About the subjectivity concern: since the model is trained on the accumulated trade data, it will be as objective as possible. Subjectivity will always be a factor, but it is a factor in the real world too; we should still be able to get an objective value as a common ground.
EDIT: this will also filter out in advance a lot of trades that are destined to be declined, so it will save a lot of annoyance for both new and veteran traders.
If you want to have a "value tracker", it should be at worst opt-in, or preferably via a userscript.
I work as a backend developer, so having this as an opt-in option or userscript is not an area I am an expert in. But in my personal opinion, the more common this feature is, the more useful it will be, because the main reason I made this proposal is to get a common ground for all users to evaluate games.
It doesn't matter how "objective" your trade metric is. How each individual person values each individual item is sentimental, subjective and mutable. I can guarantee you that how you value your copy of Pesterquest is entirely different to how I or anyone else values theirs.
Re: @antigravities I think you should see it as a tool. It's equivalent to asking a third party trader what (s)he thinks of the trade. In this case, it's the ML model's opinion.
If it's built in to the site, it will be seen by new users as the "objective" and "only" way to determine game value, regardless of what the intention is.
I am not sure what you mean by "why this is counted/not counted".
Someone could say "why is [metric I consider to be important in a trade] not counted in the 'BarterValue' calculator?", i.e. "why is the price of the game on my obscure store not counted?"
For the internal data, my approach to machine learning is: just feed all the data you have and let the algorithm decide what is important and what is not.
For the data from external stores, I believe the same rule applies, with some extra steps. If we can set up the system so that introducing new data is easy, we can try their obscure store prices; if the machine-learning algorithm says they are helpful, great, and if it says they are not helpful, there is your answer to the people for not using their store.
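To illustrate the "let the algorithm decide" idea, here is a minimal sketch that ranks candidate features by how strongly each one alone correlates with the accept/decline label. All the names and toy rows here are made up for illustration; a real model would use learned weights or permutation importance instead.

```python
# Hypothetical sketch: let the data decide which features are "counted".
# Everything here (names, toy rows) is illustrative, not the real schema.

def point_biserial(xs, ys):
    """Correlation between a numeric feature xs and a 0/1 label ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx ** 0.5 * vy ** 0.5)

def rank_features(rows, feature_names, labels):
    """Order features by |correlation| with the accepted/declined label."""
    scores = {name: abs(point_biserial([r[i] for r in rows], labels))
              for i, name in enumerate(feature_names)}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: price_high tracks the label, obscure_store_price mostly doesn't.
rows = [(10, 3), (20, 7), (5, 9), (25, 1), (8, 6), (22, 2)]
labels = [0, 1, 0, 1, 0, 1]
ranking = rank_features(rows, ["price_high", "obscure_store_price"], labels)
print(ranking)  # price_high ranks first
```

If the obscure store's price ends up near the bottom of the ranking, that is the data's own answer to "why is my store not counted?".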
About the veteran/novice users: a side product of our model will be the user properties and their behaviors, and those values can be used in a future iteration. About the subjectivity concern: since the model is trained on the accumulated trade data, it will be as objective as possible. Subjectivity will always be a factor, but it is a factor in the real world too; we should still be able to get an objective value as a common ground. EDIT: this will also filter out in advance a lot of trades that are destined to be declined, so it will save a lot of annoyance for both new and veteran traders.
If you want to have a "value tracker", it should be at worst opt-in, or preferably via a userscript.
I work as a backend developer, so having this as an opt-in option or userscript is not an area I am an expert in. But in my personal opinion, the more common this feature is, the more useful it will be, because the main reason I made this proposal is to get a common ground for all users to evaluate games.
It doesn't matter how "objective" your trade metric is. How each individual person values each individual item is sentimental, subjective and mutable. I can guarantee you that how you value your copy of Pesterquest is entirely different to how I or anyone else values theirs.
Re: @antigravities I think you should see it as a tool. It's equivalent to asking a third party trader what (s)he thinks of the trade. In this case, it's the ML model's opinion.
If it's built in to the site, it will be seen by new users as the "objective" and "only" way to determine game value, regardless of what the intention is.
About the objectivity/subjectivity concern, I definitely agree with you that the value of a game will differ for every person. But, in my opinion, the same thing applies in the real world as well. For example, the value of an apple changes in a person's perception based on whether they are vegan, how hungry they are, whether they are allergic to apples, and so on... But that doesn't make having a market price for apples any less useful.
If it's built in to the site, it will be seen by new users as the "objective" and "only" way to determine game value, regardless of what the intention is.
This just depends on how it's implemented. It sure can be implemented that way, but it can also be implemented in such a way it's only suggestive.
Relevant steam discussion: https://steamcommunity.com/groups/bartervg/discussions/0/405692758726921809
I started working on a simple model (even if it ends up being thrown away, I am fine with it), but the data collection is quite slow since I have to hit the API for every offer. Is there a bulk API that I can get the offer details from?
We ran into this problem too. There are only these (using the CDN domain):
https://bartervg.com/o/json/
https://bartervg.com/u/<USER_HEX_ID>/o/json/
https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/
Which offers would you like, completed (312760) and declined (1367632) offers? Or declined due to a specific reason, such as `not worth it to me` (299169)?
Do declined include countered?
Yes, remember #251? There are 130333 declined offers with the reason `countered`.
Good. A data dump of accepted and declined trades would be nice, with the included games' data included.
accepted and declined trades
`Accepted` is a temporary status. It should either be completed or failed (or expired). I'm not sure how much of a difference it would make, but not all accepted offers lead to completed offers.
If I understand correctly, most declined offers should not be used as inputs. There isn't enough information if someone declines without a reason. There is wrong information if someone declines due to `already own` or `no longer have`. These declines do not reflect the offer recipient's sense of value.
Silly example: You offer Cyberpunk 2077 in exchange for my The Haunted Island, a Frog Detective Game. I decline because, whoops, I forgot to update my tradable collection and no longer have The Haunted Island, a Frog Detective to trade. If the model uses this offer, it will naively compute that the value of The Haunted Island, a Frog Detective Game is greater than Cyberpunk 2077 (either to me, or more dangerously, in general).
I had a similar idea: using machine learning, predict how likely a proposed trade is to be accepted. The problem with that, though, is that there is no historical game data for past trades, meaning the value/properties of a game in a past trade may have changed by now.
Do you have anything to address this? I asked you, I think about a year ago, to start logging this historical data. I assume that still has not happened yet?
We ran into this problem too. There are only these (using the CDN domain):
- Most recent global trades:
https://bartervg.com/o/json/
- User trade list:
https://bartervg.com/u/<USER_HEX_ID>/o/json/
- User trade:
https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/
I am currently going the path of
-> Collect all users from https://barter.vg/u/json
-> Collect all offer IDs from https://bartervg.com/u/<USER_HEX_ID>/o/json/
-> Collect all offer details from https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/
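The three-step crawl above could look roughly like this sketch. The URL patterns are the ones from this thread; the JSON shape of the offer listing (a dict keyed by offer ID) is an assumption, not a documented schema, and `fetch` is injectable so the sketch runs without the network:

```python
# Sketch of the users -> offer IDs -> offer details crawl. The listing
# shape (dict keyed by offer ID) is an assumption, not a documented schema.
import json
from urllib.request import urlopen

BASE = "https://bartervg.com"

def user_offers_url(user_hex_id):
    return f"{BASE}/u/{user_hex_id}/o/json/"

def offer_url(user_hex_id, trade_offer_id):
    return f"{BASE}/u/{user_hex_id}/o/{trade_offer_id}/json/"

def fetch_json(url):
    # real HTTP call; no retries or rate limiting in this sketch
    with urlopen(url) as resp:
        return json.load(resp)

def crawl(user_ids, fetch=fetch_json):
    """Collect offer details for every offer of every user, deduplicated."""
    offers = {}
    for uid in user_ids:
        listing = fetch(user_offers_url(uid))
        for offer_id in listing:
            if offer_id not in offers:  # an offer appears for both sides
                offers[offer_id] = fetch(offer_url(uid, offer_id))
    return offers

# Offline demo with a fake fetch standing in for the HTTP calls:
fake = lambda u: {"1": {}} if u.endswith("/o/json/") else {"fetched": u}
offers = crawl(["a0"], fetch=fake)
```

The `offers` dict doubles as the dedup set, so an offer fetched via one side's user ID is skipped when it shows up under the other side's.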
Which offers would you like, completed (312760) and declined (1367632) offers? Or declined due to a specific reason, such as `not worth it to me` (299169)?
I am marking `Completed` and `Accepted` as positive samples, and `Declined` as negative samples. I guess I am at the `the more data, the better` point.
accepted and declined trades
`Accepted` is a temporary status. It should either be completed or failed (or expired). I'm not sure how much of a difference it would make, but not all accepted offers lead to completed offers. If I understand correctly, most declined offers should not be used as inputs. There isn't enough information if someone declines without a reason. There is wrong information if someone declines due to `already own` or `no longer have`. These declines do not reflect the offer recipient's sense of value.
Silly example: You offer Cyberpunk 2077 in exchange for my The Haunted Island, a Frog Detective Game. I decline because, whoops, I forgot to update my tradable collection and no longer have The Haunted Island, a Frog Detective to trade. If the model uses this offer, it will naively compute that the value of The Haunted Island, a Frog Detective Game is greater than Cyberpunk 2077 (either to me, or more dangerously, in general).
I believe most people just don't set a reason because they are lazy. Even the declined offers without reasons give us a lot of information, and they should be used. Of course, what I believe is not very important; what we should do is try a few different methods (including and excluding the no-reason declines) to see how accurate our model gets, and make this call based on what the output of the model says. I agree that there can be some cases where the decline doesn't make any sense (such as your example), but the ML algorithms should be strong enough not to be fooled by those, as long as we have enough data to realize it was an outlier.
I had a similar idea: using machine learning, predict how likely a proposed trade is to be accepted. The problem with that, though, is that there is no historical game data for past trades, meaning the value/properties of a game in a past trade may have changed by now.
Do you have anything to address this? I asked you, I think about a year ago, to start logging this historical data. I assume that still has not happened yet?
I am seeing some of the game details in the offer detail api, but I didn't get to check if that data comes from the historical state of the game or the offer data is joined with the current state of the game data.
-> Collect all users from https://barter.vg/u/json
-> Collect all offer IDs from https://bartervg.com/u/<USER_HEX_ID>/o/json/
-> Collect all offer details from https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/
I think you can minimize the data collection by log base 2 by skipping the same trades seen from the other side's perspective.
I agree that there can be some cases where the decline doesn't make any sense (such as your example) but the ML algorithms should be strong enough to be not fooled by those, as long as we have enough data to realize that it was an outlier.
Yes, barter had a point. There are some 'invalid' decline reasons you should filter out, like the `no longer have` reason. Sure, ML algorithms may be resilient to outliers, but it is best to clean the data to avoid any (slight) biases.
I am seeing some of the game details in the offer detail api, but I didn't get to check if that data comes from the historical state of the game or the offer data is joined with the current state of the game data.
A trade-off has to be made between using less but more recent/accurate data, or more but more outdated data.
-> Collect all users from https://barter.vg/u/json
-> Collect all offer IDs from https://bartervg.com/u/<USER_HEX_ID>/o/json/
-> Collect all offer details from https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/
I think you can minimize the data collection by log base 2 by skipping the same trades seen from the other side's perspective.
Yes, I am doing that, but I think it is not log2; it just halves the data collection.
I agree that there can be some cases where the decline doesn't make any sense (such as your example) but the ML algorithms should be strong enough to be not fooled by those, as long as we have enough data to realize that it was an outlier.
Yes, barter had a point. There are some 'invalid' decline reasons you should filter out, like the `no longer have` reason. Sure, ML algorithms may be resilient to outliers, but it is best to clean the data to avoid any (slight) biases.
That's right, but I still don't think that should be part of the data dump. It should be a data preprocessing step.
In my opinion, even the offers that were rejected with reason "no longer have" carry some information. If I offer you Game A for Game B and you reject because of "no longer have", by making the offer I am still creating a row in the dataset saying B is more valuable than A. Ignoring your rejection is probably a good idea, but we can still use the fact that I made the offer.
I don't get how that makes B more valuable than A. With "no longer have" you don't know if the trade offer would have been declined or accepted.
By making an offer to you saying "give me B and I will give you A", I am intrinsically claiming B is more valuable than A (for me). When you reject because of "no longer have", we don't get any information from you that helps compare the values of A and B.
EDIT: If you check the equations from my initial proposal, for every offer we create 2 rows: one for the offerer's evaluation and one for the receiver's evaluation. In this case we still have the offerer's evaluation, but we won't be able to create the second equation, the one for the receiver's evaluation.
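That two-rows-per-offer idea can be sketched as a tiny augmentation step. The names here (`offered`, `requested` as stand-ins for whatever feature vectors the model uses) are illustrative, not the proposal's actual schema:

```python
# Sketch: each offer yields two training rows. The receiver's row carries
# their actual decision; the offerer's "inverted" row is always labeled
# accepted, because making the offer implies they value what they ask for
# at least as much as what they give. Names here are illustrative.

def offer_to_rows(offered, requested, accepted):
    rows = []
    # receiver's evaluation: would they give `requested` to get `offered`?
    rows.append({"give": requested, "get": offered, "label": int(accepted)})
    # offerer's evaluation: they already chose to give `offered` for `requested`
    rows.append({"give": offered, "get": requested, "label": 1})
    return rows

# A "no longer have" decline still contributes the offerer's row:
rows = offer_to_rows(offered=[1.0], requested=[5.0], accepted=False)
```

For a "no longer have" decline, you would keep only the second row and drop the first, since the receiver's decision carries no value signal there.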
By making an offer
Excellent point. This would mean that even expired offers could provide some information.
By making an offer
Excellent point. This would mean that even expired offers could provide some information.
Yes, though I have to admit that I didn't consider expired offers; that is a good catch. That is why I still think the data dump has to be as comprehensive as possible.
Also, I have been running the data collection for over 16 hours now and I still haven't reached 100k offers. So it would be great to know if we will have a data dump soon; otherwise, getting all the offers will probably take weeks/months for me.
Also I have been running the data collection for over 16 hours
There was a massive traffic spike around 10 hours ago. Right now though, no problems.
if we will have some data dump soon
What would that look like? There are ~3M offers. However, ~1M are cancelled, and I assume cancelled offers have no value and can be excluded. Therefore, ~2M offers in the https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/ format? Combined into one big file to download?
I guess we can ignore cancelled. It can be one file, or if it is too large to handle, some partitioning would also work. The offers I have downloaded so far are about ~5 KB/offer, so I assume the dump will be about 10 GB.
P.S.: I just introduced some parallelization to the code, and now it is going way, way faster. I will let it run for an hour and recalculate how long it would take to download everything. So if it will take much time to create the data dump, don't bother at the moment.
The parallelized version of the data collector downloads ~100K offers an hour, so I should have the data ready by tomorrow (hopefully). I hope I am not creating spikes on your servers.
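For reference, the parallelization can be as small as a thread pool over the offer URLs. This is a sketch with an injectable `fetch` (a stand-in lambda below) rather than a real HTTP client with rate limiting, which a polite crawler would add:

```python
# Minimal sketch of the parallelized collector: I/O-bound fetches go
# through a thread pool. `fetch` would be a real HTTP GET in practice.
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch, workers=16):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each URL correctly
        return dict(zip(urls, pool.map(fetch, urls)))

urls = [f"https://bartervg.com/u/a0/o/{i}/json/" for i in range(100)]
results = fetch_all(urls, fetch=lambda u: len(u))  # stand-in for HTTP
```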
I am done downloading the offers ( ~1.8 million). Now I can work on them.
I am done downloading the offers ( ~1.8 million). Now I can work on them.
That should be all of the non-cancelled offers. Remember to set aside enough offers for the validation dataset since there isn't another batch of a million offers, at least not any time soon.
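barter's point about setting data aside can be done once, up front, with a seeded shuffle so the validation slice stays fixed across experiments. A sketch (the 20% fraction and seed are arbitrary choices):

```python
# Sketch: carve out a deterministic validation set once, before training,
# since there won't be another batch of a million offers to test on later.
import random

def train_valid_split(offers, valid_frac=0.2, seed=42):
    offers = list(offers)
    random.Random(seed).shuffle(offers)  # seeded, so the split is stable
    n_valid = int(len(offers) * valid_frac)
    return offers[n_valid:], offers[:n_valid]  # (train, validation)

train, valid = train_valid_split(range(1000))
```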
Alright, here are the results for my initial, very weak model. I can explain the model in detail, but for now let me briefly explain how I made the model weak:
`nan` values anywhere
Even with all these improvement opportunities, the results were not too bad.
I only used the numerical fields that come from the offer API, and I may have filtered out some that could be useful. The accuracy of the model was 74%, which is not very impressive considering this is a binary label. I believe that by solving the problems above we can significantly increase this.
EDIT: Note that the values are scaled with a min-max scaler, so all the values are mapped to the [0, 1] range. The weights found by the model are as follows, in order of positive contribution to the value:

| Field | Weight | Interpretation |
|---|---|---|
| price_high | 12.965037 | This makes sense, since more expensive games are more valuable |
| wishlist | 11.5525185 | This makes sense, since games wanted by a lot of people are more valuable |
| user_reviews_total | 8.726534 | This makes sense, since this is an indicator of worldwide popularity, and popular games are more valuable |
| library | 0.7305585 | Here the weights drop significantly. That makes sense, since library gives us the popularity among Barter.VG users, and popularity is already covered by the previous variable |
| price | 0.6202095 | price and highest price have a very high correlation. Since this value is covered by price_high, the weight of price is not very high |
| user_reviews_positive | 0.0379135 | This is surprising to me. Apparently people don't care about the Steam rating too much (maybe it has a correlation with the review count?) |
| is_free | -0.050759 | Apparently having a game marked as 'given away' at some point doesn't have too much negative effect on the value (maybe the data was not very representative on that). This is surprising |
| cards | -0.19254 | I don't even know what cards mean :) but apparently having cards reduces a game's value |
| tradeable | -0.6832775 | This makes sense, since more supply drops the value of the game |
| achievements | -1.381961 | This is an interesting one |
Finally, I calculated the values of 3 games (I didn't calculate the values for all yet, but it shouldn't take too long). Just for the fun of it, I assumed our currency is ਓ.

| Game | Value |
|---|---|
| Cyberpunk 2077 | 2.74531363 ਓ |
| Red Dead Redemption 2 | 2.28645338 ਓ |
| The Haunted Island, a Frog Detective Game | 0.02537145 ਓ |
What do you guys think?
My theory for cards and achievements:
Due to the new Steam Learning system, mostly older games will have trading cards, and these games were severely overbundled in the heyday of bundles, between 2015 and 2018 I guess. The same might be said for achievements, with several achievement printers being readily available and practically worthless because they are banned and/or very bad. Also, people, myself included, trade several cheap games with cards for unbundled games, as some sellers farm the cards on their alt accounts and sell them for Steam wallet.
Great work so far! I'm really impressed and excited about what this can do.
I think the `library` field is very conflicting. Yes, it can indicate popularity, but that also has a lot to do with accessibility, like low prices and bundles. Also, a low `library` value makes a game rarer, and so more valuable too (for game collectors who already have the most popular games), in that sense.
The weight for `is_free` is also surprising to me. In my experience, when I send a lot of offers with free games, I get a very low acceptance rate. Even when highly wanted games become given away, they're valued a lot less.
About correlations, you may find some results here: http://digital-game-trading.glitch.me/
| Game | Value |
|---|---|
| Cyberpunk 2077 | 2.74531363 ਓ |
| Red Dead Redemption 2 | 2.28645338 ਓ |
| The Haunted Island, a Frog Detective Game | 0.02537145 ਓ |
What do you guys think?
[H] 109x The Haunted Island, a Frog Detective Game [W] Cyberpunk 2077 \ scientifically proven fair offer 😁
The accuracy of the model was 74%, which is not very impressive considering this is a binary label.
The label represents accept or decline? Would it affect accuracy if the distribution is not 50/50, but instead 80/20?
🤦♂️ Flagged for spreading statistical fallacies
My naive model would be a random number generator that predicted decline 80% of the time.
If the distribution were 80% declines, then a model that predicted declined every time would be 80% accurate
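The two naive baselines being contrasted here work out like this (80/20 is the approximate decline/accept split mentioned above):

```python
# Baselines under an 80% decline / 20% accept distribution:
p_decline = 0.8

# random guesser that says "decline" 80% of the time: it is correct
# whenever its guess happens to match the true label
random_guess_acc = p_decline * p_decline + (1 - p_decline) * (1 - p_decline)

# constant model that always predicts "decline"
majority_acc = p_decline

print(random_guess_acc, majority_acc)  # ≈0.68 vs 0.8
```

So the constant "always decline" model is the stronger baseline, and any real model has to beat 80%, not 50%, to be doing anything useful on this split.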
having a game marked as 'given away'
Where is `is_free` from? On the Steam API and (if correctly implemented) on Barter.vg, this means that the game is free on demand or no cost, and this does not reflect key giveaways. It's TODO on the API wiki https://github.com/bartervg/barter.vg/wiki/Get-Item-(v1)
Accuracy in this case means 74% of unseen cases (new data) has its value correctly predicted, I believe.
Where is `is_free` from? On the Steam API and (if correctly implemented) on Barter.vg, this means that the game is free on demand or no cost, and this does not reflect key giveaways. It's TODO on the API wiki https://github.com/bartervg/barter.vg/wiki/Get-Item-(v1)
Wouldn't it make more sense to just set the `price` attribute to 0 then?
The given away metric is certainly an important metric to look at.
price
I don't know if it would make much of a difference, but I would have used `price_low` instead. `price` is the current Steam price on whatever day you're checking the API; `price_low` is less dependent on when you check it, although, as Revadike has reminded me often, this becomes less applicable the older the offer.
library ... popularity among Barter.VG users
As Revadike noted about "accessibility", given the userbase of the site, this is less a measure of "popularity", and more a measure of availability, how common the bundles and/or giveaways were that included the given game.
user_reviews_positive ... this is surprising for me
I think you could use something like SteamDB uses to modify the review score based on the number of reviews. A 100% positive with 10 reviews means a lot less than 92% with 1000 reviews.
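A sketch of that idea, pulling the raw positive ratio toward 50% as the review count shrinks. The exact formula below is my recollection of the adjustment SteamDB has described publicly, so treat it as an assumption rather than their canonical implementation:

```python
# Sketch of a confidence-adjusted rating: few reviews pull the score
# toward 50%. The formula is an assumption modeled on SteamDB's write-up.
import math

def adjusted_rating(positive_ratio, total_reviews):
    pull = 2 ** (-math.log10(total_reviews + 1))
    return positive_ratio - (positive_ratio - 0.5) * pull

print(adjusted_rating(1.00, 10))    # ~0.76
print(adjusted_rating(0.92, 1000))  # ~0.87, beats 100% with 10 reviews
```

Feeding this adjusted score to the model instead of the raw `user_reviews_positive` might give it a signal that isn't confounded by the review count.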
but apparently having cards reduce a game's value
As Tecfan noted, this is highly likely correlational, not causal. Adding cards to a game does not decrease the game's value.
achievements
Apologies if I missed the explanation of this, but is it a boolean, or do you use the number of achievements? If the latter, Tecfan's explanation makes even more sense. Decent-quality games don't have 12k achievements; the average number of achievements, for games that have them, is 74.
Where is `is_free` from? On the Steam API and (if correctly implemented) on Barter.vg, this means that the game is free on demand or no cost, and this does not reflect key giveaways. It's TODO on the API wiki https://github.com/bartervg/barter.vg/wiki/Get-Item-(v1)
Wouldn't it make more sense to just set the `price` attribute to 0 then? The given-away metric is certainly an important metric to look at.
https://github.com/bartervg/barter.vg/issues/238 f2p non-0 price issue
https://github.com/bartervg/barter.vg/wiki/Get-Item-(v1)
`giveaway_count`, or you could even use `steamgifts_cv` (or is it `steamgifts_cv_points`?)
Accuracy in this case means 74% of unseen cases (new data) has its value correctly predicted, I believe.
Right after I posted, I had a feeling there was a fallacy. I edited my post and I hope it's clearer.
74% accuracy (if I understand this correctly) seems good if the distribution were 50% accepted and 50% declined. Being able to predict a coin-flip scenario with better than 50% accuracy is better than chance. However, if the distribution were 20% accepted and 80% declined (which is a good approximation for non-cancelled offers), then a model that merely spit out `declined` 100% of the time would be 80% accurate. Yes?
if the distribution were 50% accepted and 50% declined
Yes, this is called unbalanced data (or balanced data, depending on the case).
The accuracy of the model was 74%, which is not very impressive considering this is a binary label.
The label represents accept or decline? Would it affect accuracy if the distribution is not 50/50, but instead 80/20? My naive model would be a random number generator that predicted decline 80% of the time.
The label is an augmented version of accept/decline. I used the method I described in the initial proposal: for every offer I create one row and check whether it was accepted/declined, and I create one inverted offer that is always labeled accepted.
I agree that the distribution also has an effect; my data set already has 69% accept, so getting it up to 74% is still not too impressive.
Just to be pedantic: if the distribution were 80/20, your naive model shouldn't be a random number generator that predicts decline 80% of the time (that model would have 68% accuracy); it should predict everything as decline, getting 80% accuracy.
Accuracy in this case means 74% of unseen cases (new data) has its value correctly predicted, I believe.
Yes, it is predicting 74% of unseen data correctly (using the augmented label I mentioned above).
having a game marked as 'given away'
Where is `is_free` from? On the Steam API and (if correctly implemented) on Barter.vg, this means that the game is free on demand or no cost, and this does not reflect key giveaways. It's TODO on the API wiki https://github.com/bartervg/barter.vg/wiki/Get-Item-(v1)
Wouldn't it make more sense to just set the `price` attribute to 0 then? The given-away metric is certainly an important metric to look at.
I got `is_free` from the offer API of barter.vg; I imagined it represents the `$0` tag we see on the website. I didn't focus too much on the "meaning of the data" part, but now that we have a promising start, we can start focusing on making the data richer/cleaner/better/etc.
price
I don't know if it would make much of a difference, but I would have used `price_low` instead. `price` is the current Steam price on whatever day you're checking the API; `price_low` is less dependent on when you check it, although, as Revadike has reminded me often, this becomes less applicable the older the offer.
For now I just used the fields that come with the offer records from the offer API; my next goal is creating some joins. One thing to consider: I assumed that the values come from the point in time when the offer was made. If I join with the items table, we will be assuming a trade from 5 years ago happened with today's values.
library ... popularity among Barter.VG users
As Revadike noted about "accessibility", given the userbase of the site, this is less a measure of "popularity", and more a measure of availability, how common the bundles and/or giveaways were that included the given game.
`library` was not completely useless in the output, but I would still say popularity may cover most of what `library` covers. Still, `library` was a little important even just by explaining the remaining variance. I think I will check some correlation metric between the variables.
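That correlation check can be a one-liner with any stats library; here is a dependency-free sketch with made-up numbers, where `user_reviews_total` deliberately tracks `library` closely:

```python
# Sketch: Pearson correlation between two feature columns, to see how
# much `library` overlaps with a popularity proxy. Numbers are made up.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

library = [5, 10, 20, 40, 80]
user_reviews_total = [50, 90, 210, 390, 810]  # roughly 10x library
print(pearson(library, user_reviews_total))   # close to 1.0
```

A correlation near 1 between the two would support dropping (or combining) one of them, since they explain mostly the same variance.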
user_reviews_positive ... this is surprising for me
I think you could use something like SteamDB uses to modify the review score based on the number of reviews. A 100% positive with 10 reviews means a lot less than 92% with 1000 reviews.
but apparently having cards reduce a game's value
As Tecfan noted, this is highly likely correlational, not causal. Adding cards to a game does not decrease the game's value.
Yes, I may have misstated my understanding. I meant the correlational outcome.
achievements
Apologies if I missed the explanation of this, but is it a boolean, or do you use the number of achievements? If the latter, Tecfan's explanation makes even more sense. Decent-quality games don't have 12k achievements; the average number of achievements, for games that have them, is 74.
This makes sense. Then maybe some transformation on this field would be useful.
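One such transformation could be a simple log, which keeps ordinary counts spread out while compressing the achievement-printer tail; the choice of base 10 here is an arbitrary illustration:

```python
# Sketch: log-compress the achievements count so a 12k-achievement
# printer doesn't dominate the feature scale (base 10 is an arbitrary pick).
import math

def log_achievements(count):
    return math.log10(count + 1)  # +1 keeps zero-achievement games defined

print(log_achievements(0))      # 0.0
print(log_achievements(74))     # ~1.88 (the average, per barter above)
print(log_achievements(12000))  # ~4.08, instead of 160x the average
```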
Accuracy in this case means 74% of unseen cases (new data) has its value correctly predicted, I believe.
Right after I posted, I had a feeling there was a fallacy. I edited my post and I hope it's clearer.
74% accuracy (if I understand this correctly) seems good if the distribution were 50% accepted and 50% declined. Being able to predict a coin-flip scenario with better than 50% accuracy is better than chance. However, if the distribution were 20% accepted and 80% declined (which is a good approximation for non-cancelled offers), then a model that merely spit out `declined` 100% of the time would be 80% accurate. Yes?
Yes, you are right (I had pedantically explained it above :) ). I had started my explanation before I saw your last post with the correction.
This is embarrassing in light of Revadike's effort to document the APIs: `is_free` is a timestamp or a boolean (where 1 = true).
I imagined it represents the $0 tag we see on the website.
This is true, but there are other scenarios that may be more relevant. If the game is free on demand (`is_free` is true), then `$0` appears on the offer page. In addition, `$0` appears if keys were given away, and this would be reflected in `giveaway_count` or the SteamGifts CV (where 0 means it was given away).
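My reading of that explanation, as a boolean sketch (this mirrors the description above, not the site's actual code, and the field names are illustrative):

```python
def shows_zero_price(is_free, giveaway_count, steamgifts_cv):
    """Rough reading of when Barter.vg shows a $0 tag on an offer page:
    free-on-demand games, given-away keys, or zero SteamGifts CV.
    This follows the thread's explanation, not the real implementation."""
    return bool(is_free) or giveaway_count > 0 or steamgifts_cv == 0

print(shows_zero_price(is_free=1, giveaway_count=0, steamgifts_cv=5))  # True
print(shows_zero_price(is_free=0, giveaway_count=3, steamgifts_cv=5))  # True
print(shows_zero_price(is_free=0, giveaway_count=0, steamgifts_cv=5))  # False
```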
It would be nice if you could make the model open source 😊
Right now, I am ashamed of the state of the code :D but I will clean it up and put it on GitHub.
I think the most important (and most limiting) assumption I made is that the item information here https://barter.vg/u/a0/o/1789670/json/ is from the time of the offer. Is that correct? Otherwise, I can just join the item ID from the offer with the current game data, which gives us more data, but less accurate data. (I will try this anyway.)
That's an incorrect assumption and one I tried to address here
I had a similar idea: using machine learning, predict how likely a proposed trade is going to be accepted. The problem for that though, is there is no historical game data with past trades. Meaning the value/properties of a game in a past trade can be changed by now.
Also made an issue about a long time ago https://github.com/bartervg/barter.vg/issues/128
Hmm, when we discussed this, I assumed it was accurate but limited. Now I understand it is equivalent to just getting the IDs and making the join myself. This is both good and bad news: bad because the data is not perfectly accurate, and good because now I can introduce much more data without feeling bad about destroying the time-based accuracy of the data.
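That join is a simple ID lookup; a minimal sketch with illustrative field names (not the actual Barter.vg schema):

```python
# Offers reference items only by ID; game attributes are whatever the
# API returns *today*. Field names here are made up for the sketch.
games_now = {
    101: {"wishlist": 340, "tradeable": 12},
    202: {"wishlist": 15, "tradeable": 400},
}
offer_items = [
    {"offer_id": 1, "item_id": 101},
    {"offer_id": 1, "item_id": 202},
    {"offer_id": 2, "item_id": 101},
]

# Join each offer row with the *current* game snapshot; games that no
# longer exist get None so the row can be dropped or imputed later.
missing = {"wishlist": None, "tradeable": None}
training_rows = [
    {**row, **games_now.get(row["item_id"], missing)}
    for row in offer_items
]
print(len(training_rows))  # 3
```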
I am embarrassed to say that I found a mistake in the final value calculation. The actual values are supposed to be as follows:
| Game | Value |
|---|---|
| Red Dead Redemption 2 | 13.603197 ਓ |
| Cyberpunk 2077 | 14.942009 ਓ |
| The Haunted Island, a Frog Detective Game | 0.012286 ਓ |
[H] 109x The Haunted Island, a Frog Detective Game [W] Cyberpunk 2077*
- scientifically proven fair offer
So now it will require at least 1216 copies of Haunted Island. Then again, these values are subject to change, so don't finalize your trade yet.
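The 1216-copy figure follows directly from the two corrected values:

```python
# Reproducing the "fair offer" arithmetic from the corrected table.
cyberpunk = 14.942009
frog_detective = 0.012286

copies = cyberpunk / frog_detective
print(round(copies))  # 1216 copies of Frog Detective per Cyberpunk 2077
```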
Again I want to emphasize that I am still cleaning up the code, and it is quite possible that I will find some more bugs or places that need change. So, don't get mad at me if I come up with more changes.
What problem does this feature address?
As a casual trader, it is difficult to assess an offer or compare game values. This causes problems in two ways:
The other way to get out of this problem is to learn how to value games as a trader, but that requires a lot of time investment.
Describe a solution
My proposal is to create a BarterValue for every game. While these values could be calculated by many machine learning approaches, my initial proposal is to apply a simple logistic regression algorithm. An initial design can be found below:
For the sake of simplicity, I will use only 2 values per game in my description, nW (number of wishlists) and nT (number of tradeable copies), and 2 values per trader, nS (number of sent offers) and nR (number of received offers).
The assumption we follow is: if user A offered game X for game Y to user B and the offer is accepted, we interpret that as:

- Value(X) >= Value(Y), since they offered the trade
- Value(Y) >= Value(X), since they accepted the trade

On the other hand, if user A offered game W for game Z to user B and the offer is rejected, we interpret that as:

- Value(W) >= Value(Z), since they offered the trade
- Value(Z) < Value(W), since they rejected the trade

With these assumptions, we create data rows as follows:
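One such data row might be built like this; the feature names and row layout are my own illustration of the scheme, not a fixed format:

```python
# Sketch of turning one offer into a training row: game features for
# both sides, trader features for both sides, and the accept label.
def make_row(game_offered, game_requested, sender, receiver, accepted):
    return {
        "nW_offered":   game_offered["nW"],
        "nT_offered":   game_offered["nT"],
        "nW_requested": game_requested["nW"],
        "nT_requested": game_requested["nT"],
        "nS_sender":    sender["nS"],
        "nR_sender":    sender["nR"],
        "nS_receiver":  receiver["nS"],
        "nR_receiver":  receiver["nR"],
        "accepted":     int(accepted),
    }

row = make_row(
    game_offered={"nW": 40, "nT": 300},   # common game, many tradeables
    game_requested={"nW": 900, "nT": 5},  # rare, heavily wishlisted game
    sender={"nS": 120, "nR": 10},
    receiver={"nS": 15, "nR": 200},
    accepted=False,
)
print(row["accepted"])  # 0
```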
We then run a simple logistic regression. Just to give the equations for the rejected trade above:
In the equation above, C1s is the coefficient assigned to the number of sent offers for the sender, and C1r is the coefficient for the number of sent offers for the receiver. Since we want those coefficients to be almost the same, we can add a regularization step to make sure they don't drift too far apart.
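A toy sketch of that regularization idea: a tie penalty lam * (C1s - C1r)^2 added to the logistic loss keeps the sender-side and receiver-side coefficients of the same feature close. The four data rows (nS in hundreds of offers) and all constants are made up:

```python
import math

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

def fit(lam, steps=5000, lr=0.1):
    # rows: (nS of sender, nS of receiver, 1 = accepted / 0 = rejected)
    data = [(1.2, 0.15, 0), (0.1, 0.9, 1), (0.5, 0.4, 1), (2.0, 0.05, 0)]
    c1s = c1r = 0.0
    for _ in range(steps):
        gs = gr = 0.0
        for ns_sender, ns_receiver, y in data:
            err = sigmoid(c1s * ns_sender + c1r * ns_receiver) - y
            gs += err * ns_sender
            gr += err * ns_receiver
        gs += 2 * lam * (c1s - c1r)  # tie-penalty gradient
        gr -= 2 * lam * (c1s - c1r)
        c1s -= lr * gs
        c1r -= lr * gr
    return c1s, c1r

a, b = fit(lam=0.0)  # no tie penalty
c, d = fit(lam=1.0)  # with tie penalty
print(abs(c - d) < abs(a - b))  # penalized coefficients stay closer
```

In practice this would likely be done with an off-the-shelf regularized logistic regression rather than hand-rolled gradient descent.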
As soon as we have our logistic regression trained and the coefficients are known, we can easily calculate the value of a game from its features. For example, the value of game X becomes:
`C2*nW(X) + C3*nT(X)`
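That value is just a dot product of the game's features with the learned game-side coefficients. C2 and C3 below are made-up stand-ins for learned values:

```python
# Hypothetical learned coefficients: wishlists add value, an abundance
# of tradeable copies subtracts from it.
C2, C3 = 0.015, -0.002

def barter_value(nW, nT):
    """BarterValue of a game from its wishlist and tradeable counts."""
    return C2 * nW + C3 * nT

rare = barter_value(nW=900, nT=5)     # heavily wishlisted, scarce
common = barter_value(nW=40, nT=300)  # widely available bundle game
print(rare > common)  # True
```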
Having this feature would not only help people make easier evaluations, but it could also be used to automatically generate the fairest offers on Barter.vg.
Examples of similar features
Similar approaches have been used for predicting real estate prices.