bartervg / barter.vg

Track and hold discussion on Barter.vg bugs, enhancements, and other issues
https://barter.vg
MIT License

Introducing BarterValue for games #262

Open osmanyucel opened 3 years ago

osmanyucel commented 3 years ago

What problem does this feature address?

As a casual trader, it is difficult to assess an offer or compare game values. This causes problems in 2 ways:

The other way to get out of this problem is to learn how to value games as a trader, but that requires a large time investment.

Describe a solution

My proposal is to create a BarterValue for every game. While these values could be calculated with many machine learning approaches, my initial proposal is to apply a simple logistic regression algorithm. An initial design can be found below:

For the sake of simplicity, I will use only 2 values per game in my description, nW (number of wishlists) and nT (number of tradeable copies), and 2 values per trader, nS (number of sent offers) and nR (number of received offers).

The assumption we follow is if user A offered game X for game Y to user B and the offer is accepted we interpret that as:

On the other hand, if user A offered game W for game Z to user B and the offer is rejected we interpret that as:

With these assumptions we create data rows as

nS(A), nR(A), nW(X), nT(X), nS(B), nR(B), nW(Y), nT(Y) -> TRUE
nS(B), nR(B), nW(Y), nT(Y), nS(A), nR(A), nW(X), nT(X) -> TRUE
nS(A), nR(A), nW(W), nT(W), nS(B), nR(B), nW(Z), nT(Z) -> TRUE
nS(B), nR(B), nW(Z), nT(Z), nS(A), nR(A), nW(W), nT(W) -> FALSE

We then run a simple logistic regression. Just to give the equations for the rejected trade above:

sig(C0 + C1s*nS(A) + C2s*nR(A) + C3s*nW(W) + C4s*nT(W) - C1r*nS(B) - C2r*nR(B) - C3r*nW(Z) - C4r*nT(Z)) = 1
sig(C0 + C1s*nS(B) + C2s*nR(B) + C3s*nW(Z) + C4s*nT(Z) - C1r*nS(A) - C2r*nR(A) - C3r*nW(W) - C4r*nT(W)) = 0

In the equations above, C1s is the coefficient assigned to the number of sent offers for the sender, and C1r is the coefficient assigned to the number of sent offers for the receiver. Since we want those coefficients to be almost the same, we can add a regularization step to make sure they don't drift too far apart.

As soon as we have our logistic regression trained and we have the coefficients, we can easily calculate the value of a game from its features. For example, the value of game X becomes: C3*nW(X) + C4*nT(X)
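To make this concrete, here is a minimal sketch in Python with scikit-learn. Everything below is illustrative: the toy `offers` list is made up, and plain L2 regularization stands in for the proposed extra step that ties the sender/receiver coefficient pairs (C1s..C4s and C1r..C4r) together.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy offers: each side has trader stats (nS, nR) and game stats (nW, nT).
# These numbers are invented purely for illustration.
offers = [
    {"sender": (5, 2), "given": (120, 300), "receiver": (1, 9), "wanted": (150, 80), "accepted": True},
    {"sender": (3, 7), "given": (10, 900), "receiver": (4, 4), "wanted": (400, 30), "accepted": False},
]

def offer_to_rows(offer):
    """Two rows per offer: the sender's view is always TRUE (making the offer
    implies approval); the receiver's view is TRUE only if the offer was accepted."""
    sender_side = [*offer["sender"], *offer["given"]]
    receiver_side = [*offer["receiver"], *offer["wanted"]]
    return [
        (sender_side + receiver_side, 1),
        (receiver_side + sender_side, 1 if offer["accepted"] else 0),
    ]

rows = [row for offer in offers for row in offer_to_rows(offer)]
X = np.array([features for features, _ in rows], dtype=float)
y = np.array([label for _, label in rows])

# Note: this learns independent coefficients for the sender and receiver
# blocks; the proposal's regularization to keep the C_s and C_r pairs close
# is not implemented here.
model = LogisticRegression().fit(X, y)

coef = model.coef_[0]
def barter_value(nW, nT):
    # Value of a game from the sender-block wishlist/tradeable coefficients (C3, C4).
    return coef[2] * nW + coef[3] * nT
```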

Having this feature would not only help people make easier evaluations, but it could also be used to automatically generate the fairest offers on BarterVG.

Examples of similar features

Similar approaches have been used for predicting real estate prices.

Revadike commented 3 years ago

I had a similar idea: using machine learning, predict how likely a proposed trade is going to be accepted. The problem with that, though, is that there is no historical game data for past trades, meaning the value/properties of a game in a past trade may have changed by now.

Anyway, a lot of data is currently available through Barter's API. Coincidentally, I'm developing some visual analytics tools for barter.vg data for a uni project (whose deadline is this Sunday, btw). If you want, we can collab and develop something nice over the weekend. https://game-data-explorer.glitch.me/

antigravities commented 3 years ago

We've had this discussion multiple times before, and it comes up again every once in a while.

"Value" is not a hard and fast number that can be calculated; people take into account so many factors and may even determine value subjectively. Besides, your proposal doesn't address potential arguments such as "why is this counted/why is this not counted?" and the issue of calculations being potentially "wrong", causing annoyance for "veteran traders" and new users whose trades constantly get declined.

If you want to have a "value tracker", it should be at worst opt-in, or preferably via a userscript.

osmanyucel commented 3 years ago

I had a similar idea: using machine learning, predict how likely a proposed trade is going to be accepted. The problem with that, though, is that there is no historical game data for past trades, meaning the value/properties of a game in a past trade may have changed by now.

That's a good point, I hadn't considered that. I think for the proof of concept we can treat the properties of the games as constant at the time of training. I know this is a big assumption, but for now I don't have a better solution.

Anyway, a lot of data is currently available through Barter's API. Coincidentally, I'm developing some visual analytics tools for barter.vg data for a uni project (whose deadline is this Sunday, btw). If you want, we can collab and develop something nice over the weekend. https://game-data-explorer.glitch.me/

I will check the API and what can be done for the training.

We've had this discussion multiple times before, and it comes up again every once in a while.

"Value" is not a hard and fast number that can be calculated; people take into account so many factors and may even determine value subjectively. Besides, your proposal doesn't address potential arguments such as "why is this counted/why is this not counted?" and the issue of calculations being potentially "wrong", causing annoyance for "veteran traders" and new users whose trades constantly get declined.

I am not sure what you mean by "why this is counted/not counted". About the veteran/novice users: a side product of our model will be the user properties and their behaviors. Those values can be used in a future iteration. About the subjectivity concern: since the model is trained on the accumulated trade data, it will be as objective as possible. Subjectivity will always be a factor, but it is a factor in the real world too, and we should still be able to get an objective value as a common ground.

EDIT: this will also pre-emptively eliminate a lot of trades which are destined to be declined, so it will save a lot of annoyance for both new and veteran traders.

If you want to have a "value tracker", it should be at worst opt-in, or preferably via a userscript.

I work as a backend developer, so opt-in options and userscripts are not areas I am an expert in. But in my personal opinion, the more common this feature is, the more useful it will be, because the main reason I made this proposal is to get a common ground for all users to evaluate games.

Revadike commented 3 years ago

Re: @antigravities I think you should see it as a tool. It's equivalent to asking a third party trader what (s)he thinks of the trade. In this case, it's the ML model's opinion.

antigravities commented 3 years ago

I am not sure what you mean by "why this is counted/not counted".

Someone could say "why is [metric I consider to be important in a trade] not counted in the 'BarterValue' calculator?", e.g. "why is the price of the game on my obscure store not counted?"

About the veteran/novice users: a side product of our model will be the user properties and their behaviors. Those values can be used in a future iteration. About the subjectivity concern: since the model is trained on the accumulated trade data, it will be as objective as possible. Subjectivity will always be a factor, but it is a factor in the real world too, and we should still be able to get an objective value as a common ground.

EDIT: this will also pre-emptively eliminate a lot of trades which are destined to be declined, so it will save a lot of annoyance for both new and veteran traders.

If you want to have a "value tracker", it should be at worst opt-in, or preferably via a userscript.

I work as a backend developer, so opt-in options and userscripts are not areas I am an expert in. But in my personal opinion, the more common this feature is, the more useful it will be, because the main reason I made this proposal is to get a common ground for all users to evaluate games.

It doesn't matter how "objective" your trade metric is. How each individual person values each individual item is sentimental, subjective and mutable. I can guarantee you that how you value your copy of Pesterquest is entirely different to how I or anyone else values theirs.

Re: @antigravities I think you should see it as a tool. It's equivalent to asking a third party trader what (s)he thinks of the trade. In this case, it's the ML model's opinion.

If it's built into the site, it will be seen by new users as the "objective" and "only" way to determine game value, regardless of what the intention is.

osmanyucel commented 3 years ago

I am not sure what you mean by "why this is counted/not counted".

Someone could say "why is [metric I consider to be important in a trade] not counted in the 'BarterValue' calculator?", e.g. "why is the price of the game on my obscure store not counted?"

For the internal data, my approach to machine learning is: just feed in all the data you have, and let the algorithm decide what is important and what is not.

For data from external stores, I believe the same rule applies, with some extra steps. If we can set up the system so that introducing new data is easy, we can try their obscure store prices. If the machine learning algorithm says they are helpful, great; if it says they are not, there is your answer for the people whose store isn't used.

About the veteran/novice users: a side product of our model will be the user properties and their behaviors. Those values can be used in a future iteration. About the subjectivity concern: since the model is trained on the accumulated trade data, it will be as objective as possible. Subjectivity will always be a factor, but it is a factor in the real world too, and we should still be able to get an objective value as a common ground. EDIT: this will also pre-emptively eliminate a lot of trades which are destined to be declined, so it will save a lot of annoyance for both new and veteran traders.

If you want to have a "value tracker", it should be at worst opt-in, or preferably via a userscript.

I work as a backend developer, so opt-in options and userscripts are not areas I am an expert in. But in my personal opinion, the more common this feature is, the more useful it will be, because the main reason I made this proposal is to get a common ground for all users to evaluate games.

It doesn't matter how "objective" your trade metric is. How each individual person values each individual item is sentimental, subjective and mutable. I can guarantee you that how you value your copy of Pesterquest is entirely different to how I or anyone else values theirs.

Re: @antigravities I think you should see it as a tool. It's equivalent to asking a third party trader what (s)he thinks of the trade. In this case, it's the ML model's opinion.

If it's built into the site, it will be seen by new users as the "objective" and "only" way to determine game value, regardless of what the intention is.

About the objectivity/subjectivity concern, I definitely agree with you that the value of a game will differ for every person. But in my opinion, the same thing applies in the real world as well. For example, the perceived value of an apple changes based on whether someone is vegan, how hungry they are, whether they are allergic to apples, and so on... But that doesn't make having a market price for apples any less useful.

Revadike commented 3 years ago

If it's built into the site, it will be seen by new users as the "objective" and "only" way to determine game value, regardless of what the intention is.

This just depends on how it's implemented. It sure can be implemented that way, but it can also be implemented in such a way that it's only suggestive.

Revadike commented 3 years ago

Relevant steam discussion: https://steamcommunity.com/groups/bartervg/discussions/0/405692758726921809

osmanyucel commented 3 years ago

I started working on a simple model (even if it ends up being thrown away, I am fine with it), but the data collection is quite slow since I have to hit the API for every offer. Is there a bulk API that I can get the offer details from?

Revadike commented 3 years ago

We ran into this problem too. There are only these (using the cdn domain):

  • Most recent global trades: https://bartervg.com/o/json/
  • User trade list: https://bartervg.com/u/<USER_HEX_ID>/o/json/
  • User trade: https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/

bartervg commented 3 years ago

Which offers would you like, completed (312760) and declined (1367632) offers? Or declined due to a specific reason, such as not worth it to me (299169)?

Revadike commented 3 years ago

Do declined include countered?

bartervg commented 3 years ago

Yes, remember #251? There are 130333 declined offers with the reason countered.

Revadike commented 3 years ago

Good. A data dump for accepted and declined trades would be nice, with the included games' data.

bartervg commented 3 years ago

accepted and declined trades

Accepted is a temporary status. It should either be completed or failed (or expired). I'm not sure how much of a difference it would make, but not all accepted offers lead to completed offers.

If I understand correctly, most declined offers should not be used as inputs. There isn't enough information if someone declines without a reason. There is wrong information if someone declines due to already own or no longer have. These declines do not reflect the offer recipient's sense of value.

Silly example: you offer Cyberpunk 2077 in exchange for my The Haunted Island, a Frog Detective Game. I decline because, whoops, I forgot to update my tradable collection and no longer have The Haunted Island, a Frog Detective Game to trade. If the model uses this offer, it will naively compute that the value of The Haunted Island, a Frog Detective Game is greater than that of Cyberpunk 2077 (either to me or, more dangerously, in general).

Revadike commented 3 years ago

I had a similar idea: using machine learning, predict how likely a proposed trade is going to be accepted. The problem with that, though, is that there is no historical game data for past trades, meaning the value/properties of a game in a past trade may have changed by now.

Do you have anything to address this? I asked you, I think about a year ago, to start logging this historical data. I assume that still has not happened yet?

bartervg commented 3 years ago

https://github.com/bartervg/barter.vg/issues/128

osmanyucel commented 3 years ago

We ran into this problem too. There are only these (using the cdn domain):

  • Most recent global trades: https://bartervg.com/o/json/
  • User trade list: https://bartervg.com/u/<USER_HEX_ID>/o/json/
  • User trade: https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/

I am currently going down this path: -> Collect all users from https://barter.vg/u/json -> Collect all offer IDs from https://bartervg.com/u/<USER_HEX_ID>/o/json/ -> Collect all offer details from https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/
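For reference, a minimal sketch of that collection loop; the response key names (`"offers"`, a dict of users keyed by hex ID, etc.) are guesses about the JSON shapes, not the documented API:

```python
import requests

BASE = "https://bartervg.com"

def get_json(path):
    resp = requests.get(f"{BASE}{path}", timeout=30)
    resp.raise_for_status()
    return resp.json()

users = get_json("/u/json/")                    # step 1: all users
offer_details = {}
for user_hex_id in users:                       # assumes a dict keyed by hex ID
    user_offers = get_json(f"/u/{user_hex_id}/o/json/")  # step 2: offer IDs
    for offer_id in user_offers.get("offers", []):
        if offer_id not in offer_details:       # skip the other side's duplicate
            offer_details[offer_id] = get_json( # step 3: offer details
                f"/u/{user_hex_id}/o/{offer_id}/json/")
```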

Which offers would you like, completed (312760) and declined (1367632) offers? Or declined due to a specific reason, such as not worth it to me (299169)?

I am marking `Completed` and `Accepted` as positive samples, and `Declined` as negative samples. I guess I am at the "the more data, the better" point.

accepted and declined trades

Accepted is a temporary status. It should either be completed or failed (or expired). I'm not sure how much of a difference it would make, but not all accepted offers lead to completed offers.

If I understand correctly, most declined offers should not be used as inputs. There isn't enough information if someone declines without a reason. There is wrong information if someone declines due to already own or no longer have. These declines do not reflect the offer recipient's sense of value.

Silly example: you offer Cyberpunk 2077 in exchange for my The Haunted Island, a Frog Detective Game. I decline because, whoops, I forgot to update my tradable collection and no longer have The Haunted Island, a Frog Detective Game to trade. If the model uses this offer, it will naively compute that the value of The Haunted Island, a Frog Detective Game is greater than that of Cyberpunk 2077 (either to me or, more dangerously, in general).

I believe most people just don't set a reason because they are lazy. Even the declined offers without reasons give us a lot of information, and they should be used. Of course, what I believe is not very important; what we should do is try a few different methods (including and excluding the no-reason declines) to see how accurate our model gets, and make this call based on what the output of the model says. I agree that there can be some cases where the decline doesn't make any sense (such as your example), but the ML algorithms should be strong enough not to be fooled by those, as long as we have enough data to recognize that it was an outlier.

I had a similar idea: using machine learning, predict how likely a proposed trade is going to be accepted. The problem with that, though, is that there is no historical game data for past trades, meaning the value/properties of a game in a past trade may have changed by now.

Do you have anything to address this? I asked you, I think about a year ago, to start logging this historical data. I assume that still has not happened yet?

I am seeing some of the game details in the offer detail API, but I didn't get to check whether that data comes from the historical state of the game or whether the offer data is joined with the current state of the game data.

Revadike commented 3 years ago

-> Collect all users from https://barter.vg/u/json -> Collect all offer IDs from https://bartervg.com/u/<USER_HEX_ID>/o/json/ -> Collect all offer details from https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/

I think you can minimize the data collection with logbase2 by skipping the same trades from the other side's perspective.

Revadike commented 3 years ago

I agree that there can be some cases where the decline doesn't make any sense (such as your example), but the ML algorithms should be strong enough not to be fooled by those, as long as we have enough data to recognize that it was an outlier.

Yes, barter had a point. There are some 'invalid' decline reasons you should filter out, like the no longer have reason. Sure, ML algorithms may be resilient to outliers, but it's best to clean the data to avoid any (slight) biases.

Revadike commented 3 years ago

I am seeing some of the game details in the offer detail API, but I didn't get to check whether that data comes from the historical state of the game or whether the offer data is joined with the current state of the game data.

A trade-off has to be made: either use less but more recent/accurate data, or more data that is more outdated.

osmanyucel commented 3 years ago

-> Collect all users from https://barter.vg/u/json -> Collect all offer IDs from https://bartervg.com/u/<USER_HEX_ID>/o/json/ -> Collect all offer details from https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/

I think you can minimize the data collection with logbase2 by skipping the same trades from the other side's perspective.

Yes, I am doing that, but I think it is not log2; it just halves the data collection.

I agree that there can be some cases where the decline doesn't make any sense (such as your example), but the ML algorithms should be strong enough not to be fooled by those, as long as we have enough data to recognize that it was an outlier.

Yes, barter had a point. There are some 'invalid' decline reasons you should filter out, like the no longer have reason. Sure, ML algorithms may be resilient to outliers, but it's best to clean the data to avoid any (slight) biases.

That's right, but I still don't think that should be part of the data dump. It should be a data preprocessing step.

In my opinion, even the offers which were rejected with the reason "no longer have" carry some information. If I offer you Game A for Game B and you reject because of "no longer have", by making the offer I am still creating a row in the dataset saying B is more valuable than A. Ignoring your rejection is probably a good idea, but we can still use the fact that I made the offer.

Revadike commented 3 years ago

I don't get how that makes B more valuable than A. With "no longer have", you don't know if the trade offer would have been declined or accepted.

osmanyucel commented 3 years ago

By making an offer to you and saying "give me B and I will give you A", I am implicitly claiming B is more valuable than A (for me). When you reject because of "no longer have", we don't get any information from you that helps to compare the values of A and B.

EDIT: If you check the equations from my initial proposal, for every offer we create 2 rows: one for the offerer's evaluation and one for the receiver's evaluation. In this case we still have the offerer's evaluation, but we won't be able to create the second equation, which is for the receiver's evaluation.
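In code, that asymmetry could look like this (a sketch building on the row construction from the proposal; `decline_reason` and the other field names are hypothetical):

```python
def offer_to_rows_filtered(offer):
    """Always keep the offerer's row; drop the receiver's row when the decline
    reason says nothing about perceived value ("no longer have", "already own")."""
    sender_side = [*offer["sender"], *offer["given"]]
    receiver_side = [*offer["receiver"], *offer["wanted"]]
    rows = [(sender_side + receiver_side, 1)]  # making the offer implies approval
    if offer.get("decline_reason") not in ("no longer have", "already own"):
        rows.append((receiver_side + sender_side, 1 if offer["accepted"] else 0))
    return rows
```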

bartervg commented 3 years ago

By making an offer

Excellent point. This would mean that even expired offers could provide some information.

osmanyucel commented 3 years ago

By making an offer

Excellent point. This would mean that even expired offers could provide some information.

Yes, though I have to admit that I didn't consider expired offers; that is a good catch. That is why I still think the data dump has to be as comprehensive as possible.

Also I have been running the data collection for over 16 hours now and I still haven't reached 100k offers. So it would be great to know if we will have some data dump soon. Otherwise getting all the offers will probably take weeks/months for me.

bartervg commented 3 years ago

Also I have been running the data collection for over 16 hours

There was a massive traffic spike around 10 hours ago. Right now though, no problems.

if we will have some data dump soon

What would that look like? There are ~3M offers. However, ~1M are cancelled, and I assume cancelled offers have no value and can be excluded. That leaves ~2M offers in this https://bartervg.com/u/<USER_HEX_ID>/o/<TRADE_OFFER_ID>/json/ format? Combined into one big file to download?

osmanyucel commented 3 years ago

I guess we can ignore cancelled. It can be one file, or if that is too large to handle, some partitioning would also work. The offers I have downloaded so far are about ~5 KB per offer, so I assume the dump will be about 10 GB.

P.S.: I just introduced some parallelization to the code, and now it is going way, way faster. I will let it run for an hour and recalculate how long it would take to download everything. So if creating the data dump will take much time, don't bother at the moment.
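For the curious, the parallelization can be as simple as a thread pool over the per-offer requests (a sketch reusing the hypothetical `get_json` helper from the collection sketch above; `pairs` is assumed to be a list of (user hex ID, offer ID) tuples):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_offer(pair):
    user_hex_id, offer_id = pair
    return offer_id, get_json(f"/u/{user_hex_id}/o/{offer_id}/json/")

# A modest worker count keeps the request rate polite to the server.
with ThreadPoolExecutor(max_workers=8) as pool:
    offer_details = dict(pool.map(fetch_offer, pairs))
```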

osmanyucel commented 3 years ago

The parallelized version of the data collector downloads ~100K offers an hour, so I should have the data ready by tomorrow (hopefully). I hope I am not creating spikes on your servers.

osmanyucel commented 3 years ago

I am done downloading the offers (~1.8 million). Now I can work on them.

bartervg commented 3 years ago

I am done downloading the offers (~1.8 million). Now I can work on them.

That should be all of the non-cancelled offers. Remember to set aside enough offers for the validation dataset, since there isn't another batch of a million offers, at least not any time soon.

osmanyucel commented 3 years ago

Alright, here are the results for my initial, very weak model. I can explain the model in detail, but for now let me briefly explain how I made the model weak:

Even with all these improvement opportunities, the results were not too bad.

Results

I only used the numerical fields which come from the offer API, and I may have filtered out some which could be useful. The accuracy of the model was 74%, which is not very impressive considering this is a binary label. I believe that by solving the problems above we can significantly increase this.

EDIT: Note that the values are scaled with a min-max scaler, so all the values are mapped to the [0, 1] range.

The weights found by the model are as follows, in order of positive contribution to the value:

| Field | Weight | Interpretation |
| --- | --- | --- |
| price_high | 12.965037 | This makes sense, since more expensive games are more valuable |
| wishlist | 11.5525185 | This makes sense, since games which are wanted by a lot of people are more valuable |
| user_reviews_total | 8.726534 | This makes sense, since this is an indicator of worldwide popularity, and popular games are more valuable |
| library | 0.7305585 | Here the weights drop significantly. That makes sense, since library gives us the popularity among Barter.VG users, and popularity is already covered by the previous variable |
| price | 0.6202095 | price and highest price have a very high correlation. Since this value is covered by price_high, the weight of price is not very high |
| user_reviews_positive | 0.0379135 | this is surprising for me. Apparently people don't care about the Steam rating too much (maybe it has a correlation with the review count?) |
| is_free | -0.050759 | Apparently having a game marked as 'given away' at some point doesn't have too much negative effect on the value (maybe the data was not very representative on that). This is surprising |
| cards | -0.19254 | I don't even know what cards mean :) but apparently having cards reduces a game's value |
| tradeable | -0.6832775 | This makes sense, since having more supply will drop the value of the game |
| achievements | -1.381961 | This is an interesting one |

Finally, I calculated values of 3 games.

(I didn't calculate the values for all yet, but it shouldn't take too long).

(Just for the fun of it I assumed our currency is ਓ)

| Game | Value |
| --- | --- |
| Cyberpunk 2077 | 2.74531363 ਓ |
| Red Dead Redemption 2 | 2.28645338 ਓ |
| The Haunted Island, a Frog Detective Game | 0.02537145 ਓ |
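For anyone who wants to sanity-check a value: it is just a dot product between the weights above and a game's min-max-scaled features. A small sketch (the scaled feature values below are invented purely for illustration):

```python
# Weights from the table above, in the same field order.
weights = {
    "price_high": 12.965037, "wishlist": 11.5525185, "user_reviews_total": 8.726534,
    "library": 0.7305585, "price": 0.6202095, "user_reviews_positive": 0.0379135,
    "is_free": -0.050759, "cards": -0.19254, "tradeable": -0.6832775,
    "achievements": -1.381961,
}

def barter_value(scaled):
    """scaled: field -> min-max-scaled feature value in [0, 1]."""
    return sum(weights[field] * value for field, value in scaled.items())

# Invented feature values for an expensive, highly wishlisted game.
print(barter_value({
    "price_high": 0.9, "wishlist": 0.8, "user_reviews_total": 0.7,
    "library": 0.5, "price": 0.8, "user_reviews_positive": 0.6,
    "is_free": 0.0, "cards": 1.0, "tradeable": 0.2, "achievements": 0.1,
}))
```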

What do you guys think?

Tecfan commented 3 years ago

My theory for cards and achievements:

Due to the new Steam Learning system, mostly older games will have trading cards, and these games were severely overbundled in the heyday of bundles, between 2015 and 2018 I guess. The same might be said for achievements, with several achievement printers being readily available and practically worthless because they are banned and/or very bad. Also, people, myself included, trade several cheap games with cards for unbundled games, as some sellers farm the cards on their alt accounts and sell them for Steam wallet funds.

Revadike commented 3 years ago

Great work so far! I'm really impressed and excited about what this can do.

I think the library field is very conflicting. Yes, it can indicate popularity, but that also has a lot to do with accessibility, like low prices and bundles. Also, a low library value makes a game more rare, and so more valuable too (for game collectors that already have the most popular games), in that sense.

The weight for is_free is also surprising to me. In my experience, when I send a lot of offers with free games, I get a very low acceptance rate. Even when highly wanted games get given away, they're valued a lot less.

About correlations, you may find some results here: http://digital-game-trading.glitch.me/

bartervg commented 3 years ago

Game Value:
Cyberpunk 2077 2.74531363 ਓ
Red Dead Redemption 2 2.28645338 ਓ
The Haunted Island, a Frog Detective Game 0.02537145 ਓ

What do you guys think?

[H] 109x The Haunted Island, a Frog Detective Game [W] Cyberpunk 2077 \ scientifically proven fair offer 😁

bartervg commented 3 years ago

The accuracy of the model was 74%, which is not very impressive considering this is a binary label.

The label represents accept or decline? Would it affect accuracy if the distribution is not 50/50, but instead 80/20? 🤦‍♂️ Flagged for spreading statistical fallacies: ~~My naive model would be a random number generator that predicted decline 80% of the time.~~ If the distribution were 80% declines, then a model that predicted declined every time would be 80% accurate.

having a game marked as 'given away'

Where is is_free from? On the Steam API and (if correctly implemented) on Barter.vg, this means that the game is free on demand or no cost and this does not reflect key giveaways. It's TODO on the API wiki https://github.com/bartervg/barter.vg/wiki/Get-Item-(v1)

Revadike commented 3 years ago

Accuracy in this case means 74% of unseen cases (new data) has its value correctly predicted, I believe.

Revadike commented 3 years ago

Where is is_free from? On the Steam API and (if correctly implemented) on Barter.vg, this means that the game is free on demand or no cost and this does not reflect key giveaways. It's TODO on the API wiki https://github.com/bartervg/barter.vg/wiki/Get-Item-(v1)

Wouldn't it make more sense to just set the price attribute as 0 then? The given away metric is certainly an important metric to look at.

bartervg commented 3 years ago

price

I don't know if it would make much of a difference, but I would have used price_low instead. Price is the current Steam price at whatever day you're checking the API. price_low is less dependent on when you check it, although as Revadike has reminded me often, this becomes less applicable the older the offer.

library ... popularity among Barter.VG users

As Revadike noted about "accessibility", given the userbase of the site, this is less a measure of "popularity" and more a measure of availability: how common the bundles and/or giveaways were that included the given game.

user_reviews_positive ... this is surprising for me

I think you could use something like what SteamDB uses to modify the review score based on the number of reviews. A 100% positive rating with 10 reviews means a lot less than 92% with 1000 reviews.
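For reference, the rating formula SteamDB published pulls the raw score toward 50% when there are few reviews, which does exactly this; a quick sketch:

```python
import math

def steamdb_rating(positive, total):
    """Raw positive ratio, pulled toward 0.5 for low review counts,
    per the formula SteamDB published for their rating algorithm."""
    score = positive / total
    return score - (score - 0.5) * 2 ** (-math.log10(total + 1))

print(steamdb_rating(10, 10))     # 100% with 10 reviews   -> ~0.76
print(steamdb_rating(920, 1000))  # 92% with 1000 reviews  -> ~0.87
```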

but apparently having cards reduces a game's value

As Tecfan noted, this is highly likely correlational, not causal. Adding cards to a game does not decrease the game's value.

achievements

Apologies if I missed the explanation of this, but is it a boolean, or do you use the number of achievements? If the latter, Tecfan's explanation makes even more sense. Decent quality games don't have 12k achievements. The average number of achievements, for games that have them, is 74.

bartervg commented 3 years ago

Where is is_free from? On the Steam API and (if correctly implemented) on Barter.vg, this means that the game is free on demand or no cost and this does not reflect key giveaways. It's TODO on the API wiki https://github.com/bartervg/barter.vg/wiki/Get-Item-(v1)

Wouldn't it make more sense to just set the price attribute as 0 then? The given away metric is certainly an important metric to look at.

https://github.com/bartervg/barter.vg/issues/238 f2p non-0 price issue

https://github.com/bartervg/barter.vg/wiki/Get-Item-(v1) giveaway_count or could even use steamgifts_cv (or is it steamgifts_cv_points)

bartervg commented 3 years ago

Accuracy in this case means 74% of unseen cases (new data) has its value correctly predicted, I believe.

Right after I posted, I had a feeling there was a fallacy. I edited my post and I hope it's clearer.

74% accuracy (if I understand this correctly) seems good if the distribution were 50% accepted and 50% declined. Being able to predict a coin flip scenario with better than 50% accuracy is better than chance. However, if the distribution were 20% accepted and 80% declined (which is a good approximation for non-cancelled offers), then a model that merely spit out declined 100% of the time would be 80% accurate. Yes?

Revadike commented 3 years ago

if the distribution were 50% accepted and 50% declined

Yes, this is called unbalanced data (or balanced data, depending on the case).

osmanyucel commented 3 years ago

The accuracy of the model was 74%, which is not very impressive considering this is a binary label.

The label represents accept or decline? Would it affect accuracy if the distribution is not 50/50, but instead 80/20? My naive model would be a random number generator that predicted decline 80% of the time.

The label is an augmented version of Accept/Decline. I used the method I described in the initial proposal: for every offer I create one row and check whether it was accepted/declined, and I create one inverted row that is always marked accepted.

I agree that the distribution also has an effect; my data set is already 69% accept, so getting it up to 74% is still not too impressive.

Just to be pedantic: if the distribution were 80-20%, your naive model shouldn't be a random number generator that predicts decline 80% of the time (this model would have 0.8 × 0.8 + 0.2 × 0.2 = 68% accuracy); it should predict everything as decline, getting 80% accuracy.

Accuracy in this case means 74% of unseen cases (new data) has its value correctly predicted, I believe.

Yes, it is predicting 74% of unseen data correctly (using the augmented label I mentioned above).

having a game marked as 'given away'

Where is is_free from? On the Steam API and (if correctly implemented) on Barter.vg, this means that the game is free on demand or no cost and this does not reflect key giveaways. It's TODO on the API wiki https://github.com/bartervg/barter.vg/wiki/Get-Item-(v1)

Wouldn't it make more sense to just set the price attribute as 0 then? The given away metric is certainly an important metric to look at.

I got is_free from the offer API of barter.vg. I imagined it represents the $0 tag we see on the website. I didn't focus too much on the meaning of the data. But now that we have a promising start, we can start focusing on making the data richer/cleaner/better/etc.

price

I don't know if it would make much of a difference, but I would have used price_low instead. Price is the current Steam price at whatever day you're checking the API. price_low is less dependent on when you check it, although as Revadike has reminded me often, this becomes less applicable the older the offer.

Now my goal is creating some joins. At first I just used the fields which come with the offer records from the offer API. One thing to consider is that I assumed the values come from the point in time when the offer was made. If I join with the items table, we will be assuming a trade from 5 years ago happened with today's values.

library ... popularity among Barter.VG users

As Revadike noted about "accessibility", given the userbase of the site, this is less a measure of "popularity" and more a measure of availability: how common the bundles and/or giveaways were that included the given game.

library was not completely useless in the output, but I would still say popularity covers most of what library covers. Still, library was a little bit important, even if just by explaining the remaining variance. I think I will check some correlation metrics between the variables.

user_reviews_positive ... this is surprising for me

I think you could use something like what SteamDB uses to modify the review score based on the number of reviews. A 100% positive rating with 10 reviews means a lot less than 92% with 1000 reviews.

but apparently having cards reduces a game's value

As Tecfan noted, this is highly likely correlational, not causal. Adding cards to a game does not decrease the game's value.

Yes, I may have misstated my understanding. I meant the correlational outcome.

achievements

Apologies if I missed the explanation of this, but is it a boolean, or do you use the number of achievements? If the latter, Tecfan's explanation makes even more sense. Decent quality games don't have 12k achievements. The average number of achievements, for games that have them, is 74.

This makes sense. Then maybe some transformation on this field might be useful.

Accuracy in this case means 74% of unseen cases (new data) has its value correctly predicted, I believe.

Right after I posted, I had a feeling there was a fallacy. I edited my post and I hope it's clearer.

74% accuracy (if I understand this correctly) seems good if the distribution were 50% accepted and 50% declined. Being able to predict a coin flip scenario with better than 50% accuracy is better than chance. However, if the distribution were 20% accepted and 80% declined (which is a good approximation for non-cancelled offers), then a model that merely spit out declined 100% of the time would be 80% accurate. Yes?

Yes, you are right (I had pedantically explained it above :) ). I had started my explanation before I saw your last post with the correction.

bartervg commented 3 years ago

This is embarrassing in light of Revadike's effort to document the APIs: is_free is a timestamp or a boolean (where 1 = true).

I imagined it represents the $0 tag we see on the website

This is true, but there are other scenarios that may be more relevant. If the game is free on demand (is_free is true), then $0 appears on the offer page. In addition, $0 appears if keys were given away, and this would be reflected in giveaway_count or the steamgifts cv (where 0 means it was given away).

Revadike commented 3 years ago

It would be nice if you could make the model open source 😊

osmanyucel commented 3 years ago

It would be nice if you could make the model open source

Right now, I am ashamed of the state of the code :D but I will clean it up and put it on GitHub

osmanyucel commented 3 years ago

This is embarrassing in light of Revadike's effort to document the APIs: is_free is a timestamp or a boolean (where 1 = true).

I imagined it represents the $0 tag we see on the website

This is true, but there are other scenarios that may be more relevant. If the game is free on demand (is_free is true), then $0 appears on the offer page. In addition, $0 appears if keys were given away, and this would be reflected in giveaway_count or the steamgifts cv (where 0 means it was given away).

I think the most important (and most limiting) assumption I made is that the item information here https://barter.vg/u/a0/o/1789670/json/ is from the time of the offer. Is that correct? Otherwise I can just join the item ID from the offer with the current game data, which gives us more data, but less accurate data. (I will try this anyway.)

Revadike commented 3 years ago

That's an incorrect assumption and one I tried to address here

I had a similar idea: using machine learning, predict how likely a proposed trade is going to be accepted. The problem with that, though, is that there is no historical game data for past trades, meaning the value/properties of a game in a past trade may have changed by now.

Also made an issue about it a long time ago: https://github.com/bartervg/barter.vg/issues/128

osmanyucel commented 3 years ago

That's an incorrect assumption and one I tried to address here

I had a similar idea: using machine learning, predict how likely a proposed trade is going to be accepted. The problem with that, though, is that there is no historical game data for past trades, meaning the value/properties of a game in a past trade may have changed by now.

Also made an issue about it a long time ago: #128

Hmm, when we discussed this, I assumed it was accurate but limited. Now I understand it is equivalent to me just getting the IDs and making the join myself. This is both good and bad news. It is bad news because it is not perfectly accurate. It is good news because now I can introduce much more data without feeling bad about destroying the time-based accuracy of the data.

osmanyucel commented 3 years ago

I am embarrassed to say that I found I had made a mistake in the final value calculation. The actual values are supposed to be as follows:

| Game | Value |
| --- | --- |
| Red Dead Redemption 2 | 13.603197 ਓ |
| Cyberpunk 2077 | 14.942009 ਓ |
| The Haunted Island, a Frog Detective Game | 0.012286 ਓ |

[H] 109x The Haunted Island, a Frog Detective Game [W] Cyberpunk 2077 \ scientifically proven fair offer

So now it will require at least 1216 copies of Haunted Island. Then again, these values are subject to change, so don't finalize your trade yet.

Again I want to emphasize that I am still cleaning up the code, and it is quite possible that I will find some more bugs or places that need change. So, don't get mad at me if I come up with more changes.